
Friday, January 15, 2021

CNN vs NN : FASHION MNIST

In this blog we will compare a plain Neural Network (NN) and a Convolutional Neural Network (CNN) on the Fashion MNIST dataset. We will compare the two on model convergence, model accuracy after N epochs, and inference on translated images, and we will also examine the translation-invariance property of CNNs.

You can read more about classification network training, transfer learning and inference in my previous blogs Image Classification in Keras and VGG-16 Inference with different image dimension.

Let's get started with this blog.

Dataset

The Fashion MNIST dataset has 60,000 training images and 10,000 test images. Each image is a 28x28 grayscale image, associated with a label from one of 10 classes.

You can use this Google Colab notebook to follow along with this blog.

Import

First, let's import the required libraries.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import cv2
 

Load Dataset

Load the Fashion MNIST dataset directly from TensorFlow's built-in datasets.
mnist = tf.keras.datasets.fashion_mnist
(training_images,training_labels),(test_images,test_labels)=mnist.load_data()

These are the 10 classes of Fashion MNIST.

classes =['T-shirt/top', 'Trouser', 'Pullover', 'Dress',
'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot' ]

After loading the training and test datasets, let's visualize a few training images.

fig = plt.figure(figsize=(8, 8))
row, col = 3, 4
for i in range(row * col):
  fig.add_subplot(row, col, i + 1).set_title(classes[training_labels[i]])
  plt.imshow(training_images[i])
plt.show()

Fig. Sample training images (28x28x1)
Data Preprocessing

For model training we generally want the data normalized to the -1 to 1 or 0 to 1 range, since normalized data helps the model converge. So let's normalize the images by dividing them by 255.

training_images, test_images = training_images/255.0, test_images/255.0

Network Definition

Now we will define the NN and CNN networks. The NN has two hidden Dense layers with 256 units each and one output Dense layer with 10 units, one per class.

def get_NN_model():
  model = tf.keras.models.Sequential()
  model.add(tf.keras.layers.Flatten(input_shape=(28,28)))
  model.add(tf.keras.layers.Dense(256, activation= 'relu'))
  model.add(tf.keras.layers.Dense(256, activation= 'relu'))
  model.add(tf.keras.layers.Dense(10, activation= 'softmax'))
  model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
  return model
  
The CNN has three Conv2D layers with 64, 128, and 128 kernels respectively; the first two are each followed by a MaxPooling layer, the last by a GlobalAveragePooling layer, and a single Dense layer produces the output.
def get_CNN_model():
  model = tf.keras.models.Sequential()
  model.add(tf.keras.layers.Conv2D(64,(3,3), input_shape=(28,28,1), padding = 'same', activation = 'relu'))
  model.add(tf.keras.layers.MaxPooling2D((2,2)))
  model.add(tf.keras.layers.Conv2D(128,(3,3), padding = 'same', activation = 'relu'))
  model.add(tf.keras.layers.MaxPooling2D((2,2)))
  model.add(tf.keras.layers.Conv2D(128,(3,3), padding = 'same', activation = 'relu'))
  model.add(tf.keras.layers.GlobalAveragePooling2D())
  # model.add(tf.keras.layers.Flatten())
  # model.add(tf.keras.layers.Dense(256, activation = 'relu'))
  model.add(tf.keras.layers.Dense(10, activation = 'softmax'))
  model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
  return model

We will train both networks for up to 30 epochs and then compare their accuracies. If either network reaches 99% training accuracy before 30 epochs, we will stop training early using the callback function below.

class epoch_callback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs={}):
    if(logs.get('accuracy') >= 0.99):
      self.model.stop_training = True
Let's create both networks and print their summaries.
nn_model = get_NN_model()
print(nn_model.summary())
cnn_model = get_CNN_model()
print(cnn_model.summary())

Model Summary

Note that the number of learnable parameters is in the same range for both networks, and that using a GlobalAveragePooling layer in the CNN significantly reduces the number of learnable parameters.
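As a quick sanity check on these counts: the first Dense layer of the NN has 784*256 weights + 256 biases = 200,960 parameters, while the first Conv2D layer of the CNN has only 3*3*1*64 weights + 64 biases = 640. Had we used Flatten followed by a 256-unit Dense layer instead of GlobalAveragePooling (the commented-out lines above), that Dense layer alone would add 7*7*128*256 + 256 = 1,605,888 parameters.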


#NN model summary
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 256)               200960    
_________________________________________________________________
dense_1 (Dense)              (None, 256)               65792     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                2570      
=================================================================
Total params: 269,322
Trainable params: 269,322
Non-trainable params: 0

#CNN model summary
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 28, 28, 64)        640       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 64)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 128)       73856     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 128)         0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 7, 7, 128)         147584    
_________________________________________________________________
global_average_pooling2d (Gl (None, 128)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                1290      
=================================================================
Total params: 223,370
Trainable params: 223,370
Non-trainable params: 0

Training NN & CNN

Let's train both networks for 30 epochs using the Keras fit function.

filepath = '/content/nn_model.h5'
callback = epoch_callback()
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath,
    monitor='val_loss', save_best_only=True, mode='auto')

history_nn = nn_model.fit(
    training_images,
    training_labels,
    validation_data=(test_images, test_labels),
    epochs=30,
    callbacks=[checkpoint,callback])

The above code trains the NN to around 95% training accuracy and 88% validation accuracy.

To train the CNN we have to expand the image dimensions to add channel information, since the Conv2D layer of the CNN above expects 28x28x1 images.
training_images_cnn = np.expand_dims(training_images, axis=3)
test_images_cnn = np.expand_dims(test_images, axis=3)
print(training_images_cnn.shape, test_images_cnn.shape)
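This should print (60000, 28, 28, 1) (10000, 28, 28, 1), confirming the added channel dimension.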
Next, train the CNN for 30 epochs.
filepath = '/content/cnn_model.h5'
callback = epoch_callback()
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath,
    monitor='val_loss', save_best_only=True, mode='auto')
history_cnn = cnn_model.fit(
    training_images_cnn,
    training_labels,
    validation_data=(test_images_cnn, test_labels),
    epochs=30,
    callbacks = [checkpoint,callback])
Plot the training history for both networks.
def plot_model_history(history, title):
  plt.plot(history.history['accuracy'])
  plt.plot(history.history['val_accuracy'])
  plt.title(title)
  plt.ylabel('accuracy')
  plt.xlabel('epoch')
  plt.legend(['train', 'val'], loc='upper left')
  plt.show()

plot_model_history(history_nn, 'NN')
plot_model_history(history_cnn, 'CNN')
From the plots below we can see that training accuracy keeps increasing for both networks, but validation accuracy for the NN saturates around 88%, whereas the CNN reaches 92% in the same number of epochs.
Fig. NN epoch vs accuracy
Fig. CNN epoch vs accuracy

Model Evaluation

Evaluate the best checkpoint of each network on the test data to verify accuracy and loss.
nn_model.load_weights('/content/nn_model.h5')
cnn_model.load_weights('/content/cnn_model.h5')
evaluation_nn = nn_model.evaluate(test_images, test_labels)
evaluation_cnn = cnn_model.evaluate(test_images_cnn, test_labels)

print('NN loss : ', evaluation_nn[0], ' accuracy: ', evaluation_nn[1], 'on test data.')
print('CNN loss : ', evaluation_cnn[0], ' accuracy: ', evaluation_cnn[1], 'on test data.')
313/313 [==============================] - 1s 2ms/step - loss: 0.3344 - accuracy: 0.8767
313/313 [==============================] - 1s 2ms/step - loss: 0.2258 - accuracy: 0.9212
NN loss :  0.3343818485736847  accuracy:  0.8766999840736389 on test data.
CNN loss :  0.22580283880233765  accuracy:  0.9211999773979187 on test data.
So far we have seen that the CNN converges faster than the NN and achieves better training and validation accuracy in the same number of epochs.

Inference

Now let's see how the CNN and NN perform on test images and on slightly translated, larger images. We will create a new 38x38 image, copy the original image into either its top-left or bottom-right corner, and then resize it back to 28x28 for inference.

Let's define a common prediction function. It takes a model and an image and returns the predicted class and its probability.
def predict(model, img_nn):
  pred = model.predict(img_nn)
  cls_id = np.argmax(pred)
  conf = pred[0][cls_id]
  cls_name = classes[cls_id]
  return cls_name, conf
Below is the preprocessing function for the original image. It returns images of different dimensions for the NN and the CNN, as at inference time the NN expects 1x28x28 input and the CNN expects 1x28x28x1 input.
def preprocess_img(img):
  img_nn = np.expand_dims(img, axis=0)
  img_cnn = np.expand_dims(img_nn, axis=3)
  return img_nn, img_cnn
Next, define a function that creates the translated version of a test image for both the NN and the CNN.
def translate_object(img):
  new_img = np.zeros((38,38))
  if(np.random.randint(2)):
    new_img[10:38,10:38] = img
  else:
    new_img[0:28,0:28] = img
 
  new_img = cv2.resize(new_img, (28,28))
  plt.imshow(new_img)
  plt.show()
  new_img =np.expand_dims(new_img, axis=0)
  img_nn = np.copy(new_img)
  img_cnn =np.expand_dims(new_img, axis=3)
  return img_nn, img_cnn
Finally, let's run a few test images and their translated versions through both networks and compare the predictions from the NN and the CNN.
for i in range(len(test_images)):
 
  img = test_images[i]
  img_nn, img_cnn = preprocess_img(img)
  plt.imshow(img)
  plt.show()
  print('GT-class: ', classes[test_labels[i]])
  print('NN on org Image ', predict(nn_model, img_nn))
  print('CNN on org Image ', predict(cnn_model, img_cnn))
 
  img_nn, img_cnn = translate_object(img)

  print('NN on translated Image ', predict(nn_model, img_nn))
  print('CNN on translated Image ', predict(cnn_model, img_cnn))

  print('\n**************************************************************\n')
  if(i == 10):
    break

Result 


For the image above, both the NN and the CNN predicted the correct class, but when the same image underwent a small spatial translation, the NN predicted the wrong class while the CNN still predicted correctly. However, the CNN's confidence for the class dropped from 0.99 to 0.42.

Conclusion 

  • The convolutional neural network converges faster than the plain neural network and achieves better accuracy on the test dataset.
  • The CNN performs better than the NN on translated versions of the original images. 
That's all for this blog. The code is available on GitHub for experimentation.
Thanks!!

Tuesday, January 12, 2021

Neural Network in TF2 : Fitting Linear and Quadratic Curve

In this blog we will see how to train a simple neural network to fit a linear function y = Mx + C, where M is 3 and C is 0.5, and then a quadratic function y = x².

First, we will train a neural network to learn the values of M and C, and at the end we will compare the model weights with M and C.

You can use this Google Colab Notebook in a web browser to follow this blog and train the given neural network for the linear function.

1. Import required packages
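Since the code for these steps lives in the notebook, each step below is accompanied by a minimal sketch of what it might look like; names and values in these sketches are illustrative, not the notebook's exact code.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt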

2. Define a linear function
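For example, with M = 3 and C = 0.5 as stated above:

def linear_function(x, M=3.0, C=0.5):
  return M * x + C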

3. Let's generate some data for the above function and split it into training and test sets. You can normalize the data for faster training.
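One possible way to generate and split the data (the range, sample count, and 80/20 split are assumptions):

# Sample 1,000 evenly spaced points from the function
x = np.linspace(-10, 10, 1000)
y = linear_function(x)

# Shuffle, then hold out 20% of the data for testing
idx = np.random.permutation(len(x))
split = int(0.8 * len(x))
x_train, y_train = x[idx[:split]], y[idx[:split]]
x_test, y_test = x[idx[split:]], y[idx[split:]]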

4. Visualize the training and test data using matplotlib.
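For instance:

plt.scatter(x_train, y_train, s=2, label='train')
plt.scatter(x_test, y_test, s=2, label='test')
plt.legend()
plt.show()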

5. Let's create a simple neural network to learn the above function. This network has only an input and an output layer, with no hidden layer, so the relation between input and output is y = w*x + b, where w is the weight and b is the bias. The model has only two learnable parameters, which you can verify from the model summary.
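A sketch of such a single-layer model; the choice of optimizer and loss here is an assumption (any regression loss such as mean squared error works):

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(1, input_shape=(1,)))  # learns y = w*x + b
model.compile(optimizer='adam', loss='mse')
print(model.summary())  # Total params: 2 (one weight, one bias)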

6. Now we will train the above network for 1500 epochs. If you add a hidden layer, the model will converge in fewer epochs.
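Something like:

history = model.fit(x_train, y_train, epochs=1500, verbose=0)
print('final training loss:', history.history['loss'][-1])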

The training loss becomes negligible.

7. Evaluate the model on the test data.
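For example:

loss = model.evaluate(x_test, y_test)
print('test loss:', loss)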

The loss on the test data is similar to the training loss, which means the model is not overfitting.

8. Check generalization of model. 

Let's check the model's predictions against the ground truth for some random data.
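For instance, querying points well outside the training range (the sample values below are illustrative):

for x_val in [25.0, -40.0, 100.0]:
  pred = model.predict(np.array([x_val]))[0][0]
  print(x_val, '-> prediction:', pred, ', ground truth:', linear_function(x_val))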

The model performs very well even on data outside of our training data range.

9. Visualize predictions on the test data.
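A minimal sketch:

y_pred = model.predict(x_test).flatten()
plt.scatter(x_test, y_test, s=2, label='ground truth')
plt.scatter(x_test, y_pred, s=2, label='prediction')
plt.legend()
plt.show()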

10. We have seen the model performs very well; now let's check its learned weights and compare them with the original linear function's coefficients.
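A sketch of how the weights might be inspected:

w, b = model.layers[0].get_weights()
print('learned M:', w[0][0], 'learned C:', b[0])  # should be close to 3 and 0.5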

As noted, this model has only two weights, and we can see the learned weights are almost equal to our M and C values.

It is quite easy to train a neural network for a linear function, but considerably harder for a quadratic one.

Here is the Notebook for the above code. Open it in Google Colab, modify the neural network and the data for the quadratic function, and try to train it. You will discover challenges like finding the optimal number of hidden layers and units per layer, and more.

Try training for the quadratic function yourself, or just check this Notebook for the same.

That's all for this blog. 

Thank you !!