To begin with machine learning, we can start from the very basics and optimize our neural network step by step. Keras offers good built-in datasets; here we'll be using the MNIST digits classification dataset. The Keras official page gives good examples of how to use it, which you may want to check out for future reference.
What is the MNIST dataset?
MNIST stands for the Modified National Institute of Standards and Technology dataset. It consists of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images.
Load the data
To use data in a neural network, we need to extract, transform, and load it (ETL). The code below shows what's needed for this dataset.
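Here's a minimal sketch of that step (the reshape to 784 and the one-hot encoding are my assumptions, chosen to match the model summary shown later):

```python
from tensorflow import keras

# Load MNIST: 60,000 training and 10,000 test images, each 28x28 grayscale
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Flatten each 28x28 image into a 784-long vector and scale pixels to [0, 1]
x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255

# One-hot encode the labels so categorical_crossentropy can be used later
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
```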
As the code above shows, each image is flattened so that every pixel feeds one input neuron of the network. Pixel values are normally normalized into the range `[0, 1]` (each pixel value is divided by 255).
The output is one class among 10 classes. The last layer uses the softmax activation function. Softmax normalizes the output of a network into a probability distribution over the predicted output classes: every value lands between 0 and 1, and they sum to 1. You can find more about activation functions here.
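To make that concrete, here's softmax computed by hand with NumPy (a quick illustrative sketch, not Keras's internal implementation):

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; the result is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))  # [0.659 0.242 0.099] -- values in (0, 1), summing to 1
```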
Build the model
Before building the model, I always create a few callback functions that make the notebook easier to work with. Here I'll show one example that creates a TensorBoard callback.
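Something like this (the log directory name is my choice, not the post's):

```python
import datetime
from tensorflow import keras

# Write TensorBoard logs into a timestamped subdirectory of logs/fit/
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
```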
If you're using a local machine, you should change the path for the logs.
Now let's build our first model.
I personally like to import the model and layers directly so that I can just type `model.add`; it's up to you how you write your code.
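A sketch of that first model (the layer name `dense_layer` and the 784-pixel input are taken from the model summary shown later):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# A single fully connected layer: 784 pixel inputs -> 10 class outputs (softmax)
model.add(Dense(10, input_shape=(784,), activation="softmax", name="dense_layer"))
```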
So the model is defined; before training we need to compile it, and a few things are needed:
- An optimizer (https://www.tensorflow.org/api_docs/python/tf/keras/optimizers)
- A loss function (https://www.tensorflow.org/api_docs/python/tf/keras/losses)
- Metrics to evaluate the trained model
A few of my favorite loss functions:
- MSE
- binary_crossentropy
- categorical_crossentropy
Some useful metrics are:
- Accuracy : Calculates how often predictions equal labels.
- Precision: Computes the precision of the predictions with respect to the labels.
- and many more (https://www.tensorflow.org/api_docs/python/tf/keras/metrics)
Note that metrics are only used to evaluate the model, not to train it: the loss function is what the optimizer minimizes, while the metrics measure how well the network performs.
Compile model
We've created our first neural network. Let's compile it and see what's inside.
The SGD (Stochastic Gradient Descent) optimizer is used to reduce the loss at each epoch. A few parameters worth knowing:
- `epochs`: the number of complete passes the model makes through the entire training dataset.
- `batch_size`: the number of training instances in a single batch.
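Putting it together, compiling might look like this (SGD and categorical cross-entropy follow the post; the exact call is my sketch):

```python
# Compile with SGD and categorical cross-entropy, tracking accuracy
model.compile(optimizer="sgd",                  # Stochastic Gradient Descent
              loss="categorical_crossentropy",  # labels are one-hot encoded
              metrics=["accuracy"])
model.summary()
```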
The code above compiles the model and prints a summary of how it is put together; so far we've added nothing more than a single dense layer.
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense_layer (Dense) (None, 10) 7850 ================================================================= Total params: 7,850 Trainable params: 7,850 Non-trainable params: 0 _________________________________________________________________
Training a model in TensorFlow is fairly easy: simply call `fit`.
Train the model
Now we've got our model ready to be trained, so let's train it. This process will take around 3~5 minutes, so go grab a coffee.
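A sketch of the training call (`epochs=200` comes from the output below; `batch_size=128` and `validation_split=0.2` are inferred from the 375 steps per epoch, since 48,000 / 128 = 375):

```python
model.fit(x_train, y_train,
          epochs=200,
          batch_size=128,         # 48,000 / 128 = 375 steps per epoch
          validation_split=0.2,   # hold out 20% of training data for validation
          callbacks=[tensorboard_callback])
```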
It will output something like this:
```
Epoch 195/200
375/375 [==============================] - 1s 2ms/step - loss: 0.2779 - accuracy: 0.9228 - val_loss: 0.2765 - val_accuracy: 0.9231
Epoch 196/200
375/375 [==============================] - 1s 2ms/step - loss: 0.2795 - accuracy: 0.9221 - val_loss: 0.2764 - val_accuracy: 0.9230
Epoch 197/200
375/375 [==============================] - 1s 2ms/step - loss: 0.2659 - accuracy: 0.9269 - val_loss: 0.2764 - val_accuracy: 0.9226
Epoch 198/200
375/375 [==============================] - 1s 2ms/step - loss: 0.2742 - accuracy: 0.9252 - val_loss: 0.2764 - val_accuracy: 0.9234
Epoch 199/200
375/375 [==============================] - 1s 2ms/step - loss: 0.2750 - accuracy: 0.9220 - val_loss: 0.2762 - val_accuracy: 0.9232
Epoch 200/200
375/375 [==============================] - 1s 2ms/step - loss: 0.2769 - accuracy: 0.9216 - val_loss: 0.2762 - val_accuracy: 0.9236
<keras.callbacks.History at 0x7feac26f2310>
```
You can see the accuracy increasing (and eventually plateauing) as the epochs pass.
Evaluate our model
Now that training is done, it's time to evaluate the model on the 10,000 images we left aside as test data.
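For example (a minimal sketch using the test split loaded earlier):

```python
# Evaluate on the held-out test set and report accuracy
test_loss, test_acc = model.evaluate(x_test, y_test)
print("Test accuracy:", test_acc)
```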
For mine, I got “Test accuracy: 0.9117000007629395”. This means roughly one incorrect classification in every 10 images.
Let's see how our model was trained over time. Simply call the `%tensorboard` magic.
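In a notebook, that looks like this (assuming the `logs/fit` directory used by the callback above):

```python
# Load the notebook extension, then point TensorBoard at the log directory
%load_ext tensorboard
%tensorboard --logdir logs/fit
```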
Red is the training curve and blue is the validation curve.
Conclusion
It's not bad, but we can improve this model. In the next post we'll add a hidden layer to the current model and see how it improves performance.
You can access the full notebook for this post here.