Machine learning with TensorFlow — Fundamentals: Convolutional Neural Networks

Ryan
4 min read · Aug 17, 2021

Convolutional neural networks (CNNs) are well suited to image classification because they make use of spatial information. The architecture was inspired by receptive fields in the visual cortex.

Figure 1. Deep Learning with Keras by Antonio Gulli

There are three main ideas behind a CNN:

  • Local receptive field
  • Shared weights and biases
  • Pooling

Local receptive field

If you want to preserve spatial information from images or other spatially structured data, it is natural to store it as a matrix. Local structure is encoded by connecting a submatrix of neighboring input neurons to a single hidden neuron in the next layer. That hidden neuron represents one local receptive field; the operation is called “convolution”, and a neural network built from such operations is a convolutional neural network. Submatrices can also overlap, so that more information is encoded. We’ll see how this operates later with a matrix sliding-window function.

Shared weights and biases

By sharing weights and biases, each layer learns to detect a feature at different parts of the image. This means that all the neurons in a feature map detect exactly the same feature, just at different locations in the input. The weights that define a feature map are also called kernels or filters. Image recognition usually requires more than one feature map, so a convolutional layer consists of several different feature maps.
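Weight sharing shows up directly in the parameter count. A minimal sketch (the layer sizes here are illustrative, not from the original post): a 3x3 kernel with 32 filters costs only 32 × (3·3·1 + 1) = 320 parameters, no matter how large the image is.

    import tensorflow as tf

    # 32 filters of size 3x3 over a 1-channel input:
    # parameters = 32 * (3*3*1 weights + 1 bias) = 320,
    # independent of the 28x28 image size, because the same
    # weights slide over every spatial location.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, (3, 3)),
    ])
    model.summary()  # the Conv2D layer reports 320 trainable parameters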

Mathematical explanation

The best way to understand convolution is to see how a sliding-window function applied to a matrix works. In the next figure, an input matrix I is convolved with a kernel K to produce the output. The 3x3 kernel K (also known as a filter or feature detector) is multiplied elementwise with each 3x3 patch of the input, and the products are summed to fill one output cell.

Fig 2. Sliding window over a matrix

Here we decided to stop the sliding window when it hits the edge of I, so the output is 3x3. Alternatively, we can pad the input with zeros, in which case the output is 5x5. This choice is called padding. The kernel depth is always the same as the input depth (number of channels).

Also, there’s feature called “stride”, it is about how far sliding window is sliding. Bigger stride, output size is smaller, smaller stride, more output is made and contains more information.

Filter size, stride, and padding are hyper-parameters used to tune the network.

How to start with a CNN

To write a convolutional layer with 32 parallel feature maps and a 3x3 filter size in TensorFlow 2, we can start from something like this (a minimal sketch; the ReLU activation here is illustrative):
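    import tensorflow as tf

    # One convolutional layer: 32 feature maps, 3x3 filters,
    # over a 28x28 single-channel input image.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
                               input_shape=(28, 28, 1)),
    ])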

A single-channel 28x28 input image goes through the 3x3 convolution and comes out with 32 channels. This corresponds to the convolutional layer in Fig. 1.

Pooling

A pooling layer reduces the dimensions of the feature maps, which cuts down the number of parameters and the amount of computation performed in the network. A pooling layer operates on each feature map independently. Two types of pooling are commonly used:

  • Max pooling
  • Average pooling

Max pooling is the most common choice; it simply outputs the maximum activation in each window. In Keras, a 2x2 max-pooling layer looks like this:
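    from tensorflow.keras.layers import MaxPooling2D

    # Keep the maximum activation in each non-overlapping 2x2 window,
    # halving the height and width of every feature map.
    pool = MaxPooling2D(pool_size=(2, 2))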

Fig 3. Max pooling

Average pooling outputs the average activation in the given area.
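Its Keras counterpart mirrors the max-pooling layer:

    from tensorflow.keras.layers import AveragePooling2D

    # Average the activations in each 2x2 window instead of taking the max.
    pool = AveragePooling2D(pool_size=(2, 2))

Keras offers many more pooling variants; you can check them on the Keras official page.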

Summary

We’ve looked into basic concept of Convolution Neural Network. CNN operates convolution and pooling differently depending on dimension of information of input.

  • Audio and text data (time): 1-dimensional
  • Images (height × width): 2-dimensional
  • Video (height × width × time): 3-dimensional
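Keras provides a convolution layer for each case; a quick sketch (the shapes here are illustrative):

    import tensorflow as tf

    # 1D: audio/text, input shape (timesteps, channels)
    c1 = tf.keras.layers.Conv1D(16, 3, input_shape=(100, 1))
    # 2D: images, input shape (height, width, channels)
    c2 = tf.keras.layers.Conv2D(16, (3, 3), input_shape=(28, 28, 1))
    # 3D: video, input shape (frames, height, width, channels)
    c3 = tf.keras.layers.Conv3D(16, (3, 3, 3), input_shape=(16, 28, 28, 1))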

Example of a CNN: LeNet

Yann LeCun et al. proposed the idea of the convolutional neural network in 1989, trained on the MNIST digit dataset. Its main idea is to perform convolution with shared-weight local receptive fields in the lower layers and to condense the result with max pooling. On top sits a fully connected multilayer perceptron with hidden layers and a softmax output layer. We can try writing its code in a notebook:
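A minimal sketch in Keras (the filter counts and hidden-layer size follow the common LeNet-style example and are illustrative):

    import tensorflow as tf
    from tensorflow.keras import layers, models

    # LeNet-style CNN: convolution with shared weights plus max pooling,
    # followed by a fully connected classifier with a softmax output.
    model = models.Sequential([
        layers.Conv2D(20, (5, 5), activation='relu', padding='same',
                      input_shape=(28, 28, 1)),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(50, (5, 5), activation='relu', padding='same'),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(500, activation='relu'),    # hidden layer
        layers.Dense(10, activation='softmax'),  # one of 10 digits
    ])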

This part is where we define our CNN. You can see that it takes a 2D image of 28x28 and outputs one of 10 digits.
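Training might then look like this (the optimizer, batch size, and epoch count are illustrative):

    from tensorflow.keras.datasets import mnist
    from tensorflow.keras.utils import to_categorical

    # Load MNIST, scale pixels to [0, 1], one-hot encode the labels.
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
    x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
    y_train, y_test = to_categorical(y_train), to_categorical(y_test)

    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, batch_size=128, epochs=20,
              validation_split=0.2)
    model.evaluate(x_test, y_test)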

The outcome of our model was 99.99% accuracy on the training set, 96.84% on validation, and 97.2% on the test set.

There are many more ways to configure a CNN; we’ve just looked at one example, LeNet. We could improve our model by adding or removing layers. I’d recommend adding BatchNormalization or Dropout to see how the model performs.

You can find whole notebook here.


Ryan

iOS engineer & data science enthusiast. Into algorithmic trading. https://github.com/Rsych