
Friday, January 15, 2021

CNN vs NN : FASHION MNIST

In this blog we will compare a plain Neural Network (NN) and a Convolutional Neural Network (CNN) on the Fashion MNIST dataset. We will compare the two on model convergence, model accuracy after N epochs, and inference on translated images, and we will also examine the translation-invariance property of CNNs.

You can read more about classification network training, transfer learning and inference in my previous blogs Image Classification in Keras and VGG-16 Inference with different image dimension.

Let's get started with this blog.

Dataset

The Fashion MNIST dataset has 60,000 training images and 10,000 test images. Each image is a 28x28 grayscale image, associated with a label from one of 10 classes.

You can use this Google Colab notebook to follow along with this blog.

Import

First, let's import the required libraries.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import cv2
 

Load Dataset

Load the Fashion MNIST dataset directly from TensorFlow's built-in datasets.
mnist = tf.keras.datasets.fashion_mnist
(training_images,training_labels),(test_images,test_labels)=mnist.load_data()

These are the 10 classes of Fashion MNIST.

classes =['T-shirt/top', 'Trouser', 'Pullover', 'Dress',
'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot' ]

After loading the training and test datasets, let's visualize a few training images.

fig = plt.figure(figsize=(8, 8))
row, col = 3, 4
for i in range(row * col):
  fig.add_subplot(row, col, i + 1).set_title(classes[training_labels[i]])
  plt.imshow(training_images[i])
plt.show()

Fig. Sample training images (28x28x1)
Data Preprocessing

For model training we generally want the data normalized to the -1 to 1 or 0 to 1 range, since normalized data helps the model converge. So let's normalize the images by dividing them by 255.

training_images, test_images = training_images/255.0, test_images/255.0

Network Definition

Now we will define the NN and CNN networks. The NN has two hidden Dense layers with 256 units each and one output Dense layer with 10 units, one per class.

def get_NN_model():
  model = tf.keras.models.Sequential()
  model.add(tf.keras.layers.Flatten(input_shape=(28,28)))
  model.add(tf.keras.layers.Dense(256, activation= 'relu'))
  model.add(tf.keras.layers.Dense(256, activation= 'relu'))
  model.add(tf.keras.layers.Dense(10, activation= 'softmax'))
  model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
  return model
  
The CNN has three Conv2D layers with 64, 128, and 128 kernels respectively; the first two are each followed by a MaxPooling layer, the last by a GlobalAveragePooling layer, and a single Dense layer produces the output.
def get_CNN_model():
  model = tf.keras.models.Sequential()
  model.add(tf.keras.layers.Conv2D(64,(3,3), input_shape=(28,28,1), padding = 'same', activation = 'relu'))
  model.add(tf.keras.layers.MaxPooling2D((2,2)))
  model.add(tf.keras.layers.Conv2D(128,(3,3), padding = 'same', activation = 'relu'))
  model.add(tf.keras.layers.MaxPooling2D((2,2)))
  model.add(tf.keras.layers.Conv2D(128,(3,3), padding = 'same', activation = 'relu'))
  model.add(tf.keras.layers.GlobalAveragePooling2D())
  # model.add(tf.keras.layers.Flatten())
  # model.add(tf.keras.layers.Dense(256, activation = 'relu'))
  model.add(tf.keras.layers.Dense(10, activation = 'softmax'))
  model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
  return model

We will train both networks for up to 30 epochs and then compare their accuracies. If either network reaches 99% training accuracy before 30 epochs, we will stop training early using the callback function below.

class epoch_callback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs={}):
    if(logs.get('accuracy') >= 0.99):
      self.model.stop_training = True
Let's create both networks and print their summaries.
nn_model = get_NN_model()
print(nn_model.summary())
cnn_model = get_CNN_model()
print(cnn_model.summary())

Model Summary

Note that the number of learnable parameters is in the same range for both networks, and that using a GlobalAveragePooling layer in the CNN significantly reduces the number of learnable parameters.
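As a quick sanity check on these counts: the first Dense layer of the NN has 784*256 weights + 256 biases = 200,960 parameters, while the first Conv2D layer of the CNN has only 3*3*1*64 weights + 64 biases = 640. Had we used Flatten followed by a 256-unit Dense layer instead of GlobalAveragePooling (the commented-out lines above), that Dense layer alone would add 7*7*128*256 + 256 = 1,605,888 parameters.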


#NN model summary
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 256)               200960    
_________________________________________________________________
dense_1 (Dense)              (None, 256)               65792     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                2570      
=================================================================
Total params: 269,322
Trainable params: 269,322
Non-trainable params: 0

#CNN model summary
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 28, 28, 64)        640       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 64)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 128)       73856     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 128)         0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 7, 7, 128)         147584    
_________________________________________________________________
global_average_pooling2d (Gl (None, 128)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                1290      
=================================================================
Total params: 223,370
Trainable params: 223,370
Non-trainable params: 0

Training NN & CNN

Let's train both networks for 30 epochs using the Keras fit function.

filepath = '/content/nn_model.h5'
callback = epoch_callback()
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath,
    monitor='val_loss', save_best_only=True, mode='auto')

history_nn = nn_model.fit(
    training_images,
    training_labels,
    validation_data=(test_images, test_labels),
    epochs=30,
    callbacks=[checkpoint,callback])

The above code trains the NN to around 95% training accuracy and 88% validation accuracy.

To train the CNN we have to expand the image dimensions to add channel information, since the Conv2D layer of the CNN above expects 28x28x1 images.
training_images_cnn = np.expand_dims(training_images, axis=3)
test_images_cnn = np.expand_dims(test_images, axis=3)
print(training_images_cnn.shape, test_images_cnn.shape)
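This should print (60000, 28, 28, 1) (10000, 28, 28, 1), confirming the added channel dimension.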
Next, train the CNN for 30 epochs.
filepath = '/content/cnn_model.h5'
callback = epoch_callback()
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath,
    monitor='val_loss', save_best_only=True, mode='auto')
history_cnn = cnn_model.fit(
    training_images_cnn,
    training_labels,
    validation_data=(test_images_cnn, test_labels),
    epochs=30,
    callbacks = [checkpoint,callback])
Plot the training history for both networks.
def plot_model_history(history, title):
  plt.plot(history.history['accuracy'])
  plt.plot(history.history['val_accuracy'])
  plt.title(title)
  plt.ylabel('accuracy')
  plt.xlabel('epoch')
  plt.legend(['train', 'val'], loc='upper left')
  plt.show()

plot_model_history(history_nn, 'NN')
plot_model_history(history_cnn, 'CNN')
From the plots below we can see that training accuracy keeps increasing for both networks, but validation accuracy for the NN saturates around 88%, whereas the CNN reaches 92% in the same number of epochs.
Fig. NN epoch vs accuracy
Fig. CNN epoch vs accuracy

Model Evaluation

Evaluate the best checkpoint of each network on the test data to verify accuracy and loss.
nn_model.load_weights('/content/nn_model.h5')
cnn_model.load_weights('/content/cnn_model.h5')
evaluation_nn = nn_model.evaluate(test_images, test_labels)
evaluation_cnn = cnn_model.evaluate(test_images_cnn, test_labels)

print('NN loss : ', evaluation_nn[0], ' accuracy: ', evaluation_nn[1], 'on test data.')
print('CNN loss : ', evaluation_cnn[0], ' accuracy: ', evaluation_cnn[1], 'on test data.')
313/313 [==============================] - 1s 2ms/step - loss: 0.3344 - accuracy: 0.8767
313/313 [==============================] - 1s 2ms/step - loss: 0.2258 - accuracy: 0.9212
NN loss :  0.3343818485736847  accuracy:  0.8766999840736389 on test data.
CNN loss :  0.22580283880233765  accuracy:  0.9211999773979187 on test data.
So far we have seen that the CNN converges faster than the NN and achieves better training and validation accuracy in the same number of epochs.

Inference

Now let's see how the CNN and NN perform on test images and on slightly translated, larger images. We will create a new 38x38 image, copy the original image into either its top-left or bottom-right corner, and then resize it back to 28x28 for inference.

Let's define a common prediction function. It takes a model and an image and returns the predicted class and its probability.
def predict(model, img_nn):
  pred = model.predict(img_nn)
  cls_id = np.argmax(pred)
  conf = pred[0][cls_id]
  cls_name = classes[cls_id]
  return cls_name, conf
Below is the preprocessing function for the original image. It returns images of different dimensions for the NN and the CNN, as at inference time the NN expects 1x28x28 input and the CNN expects 1x28x28x1 input.
def preprocess_img(img):
  img_nn = np.expand_dims(img, axis=0)
  img_cnn = np.expand_dims(img_nn, axis=3)
  return img_nn, img_cnn
Next, define a function that creates the translated version of a test image for both the NN and the CNN.
def translate_object(img):
  new_img = np.zeros((38,38))
  if(np.random.randint(2)):
    new_img[10:38,10:38] = img
  else:
    new_img[0:28,0:28] = img
 
  new_img = cv2.resize(new_img, (28,28))
  plt.imshow(new_img)
  plt.show()
  new_img =np.expand_dims(new_img, axis=0)
  img_nn = np.copy(new_img)
  img_cnn =np.expand_dims(new_img, axis=3)
  return img_nn, img_cnn
Finally, let's run a few test images and their translated versions through both networks and compare the predictions from the NN and the CNN.
for i in range(len(test_images)):
 
  img = test_images[i]
  img_nn, img_cnn = preprocess_img(img)
  plt.imshow(img)
  plt.show()
  print('GT-class: ', classes[test_labels[i]])
  print('NN on org Image ', predict(nn_model, img_nn))
  print('CNN on org Image ', predict(cnn_model, img_cnn))
 
  img_nn, img_cnn = translate_object(img)

  print('NN on translated Image ', predict(nn_model, img_nn))
  print('CNN on translated Image ', predict(cnn_model, img_cnn))

  print('\n**************************************************************\n')
  if(i == 10):
    break

Result 


For the image above, both the NN and the CNN predicted the correct class, but when the same image underwent a small spatial translation, the NN predicted the wrong class while the CNN still predicted correctly. However, the CNN's confidence for the class dropped from 0.99 to 0.42.

Conclusion 

  • The convolutional neural network converges faster than the plain neural network and achieves better accuracy on the test dataset.
  • The CNN performs better than the NN on translated versions of the original images. 
That's all for this blog. The code is available on GitHub for experimentation.
Thanks!!

Tuesday, January 12, 2021

Neural Network in TF2 : Fitting Linear and Quadratic Curve

In this blog we will see how to train a simple neural network to fit a linear function y = Mx + C, where M is 3 and C is 0.5, and then a quadratic function y = x².

First, we will train a neural network to learn the values of M and C, and at the end we will compare the model weights with M and C.

You can use this Google Colab Notebook in a web browser to follow this blog and train the given neural network for the linear function.

1. Import required packages
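Since the code for these steps lives in the notebook, each step below is accompanied by a minimal sketch of what it might look like; names and values in these sketches are illustrative, not the notebook's exact code.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt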

2. Define a linear function
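For example, with M = 3 and C = 0.5 as stated above:

def linear_function(x, M=3.0, C=0.5):
  return M * x + C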

3. Let's generate some data for the above function and split it into training and test sets. You can normalize the data for faster training.
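One possible way to generate and split the data (the range, sample count, and 80/20 split are assumptions):

# Sample 1,000 evenly spaced points from the function
x = np.linspace(-10, 10, 1000)
y = linear_function(x)

# Shuffle, then hold out 20% of the data for testing
idx = np.random.permutation(len(x))
split = int(0.8 * len(x))
x_train, y_train = x[idx[:split]], y[idx[:split]]
x_test, y_test = x[idx[split:]], y[idx[split:]]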

4. Visualize the training and test data using matplotlib.
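For instance:

plt.scatter(x_train, y_train, s=2, label='train')
plt.scatter(x_test, y_test, s=2, label='test')
plt.legend()
plt.show()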

5. Let's create a simple neural network to learn the above function. This network has only an input and an output layer, with no hidden layer, so the relation between input and output is y = w*x + b, where w is the weight and b is the bias. The model has only two learnable parameters, which you can verify from the model summary.
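A sketch of such a single-layer model; the choice of optimizer and loss here is an assumption (any regression loss such as mean squared error works):

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(1, input_shape=(1,)))  # learns y = w*x + b
model.compile(optimizer='adam', loss='mse')
print(model.summary())  # Total params: 2 (one weight, one bias)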

6. Now we will train the above network for 1500 epochs. If you add a hidden layer, the model will converge in fewer epochs.
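Something like:

history = model.fit(x_train, y_train, epochs=1500, verbose=0)
print('final training loss:', history.history['loss'][-1])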

The training loss becomes negligible.

7. Evaluate the model on the test data.
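For example:

loss = model.evaluate(x_test, y_test)
print('test loss:', loss)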

The loss on the test data is similar to the training loss, which means the model is not overfitting.

8. Check generalization of model. 

Let's check the model's predictions against the ground truth for some random data.
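For instance, querying points well outside the training range (the sample values below are illustrative):

for x_val in [25.0, -40.0, 100.0]:
  pred = model.predict(np.array([x_val]))[0][0]
  print(x_val, '-> prediction:', pred, ', ground truth:', linear_function(x_val))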

The model performs very well even on data outside of our training data range.

9. Visualize predictions on the test data.
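A minimal sketch:

y_pred = model.predict(x_test).flatten()
plt.scatter(x_test, y_test, s=2, label='ground truth')
plt.scatter(x_test, y_pred, s=2, label='prediction')
plt.legend()
plt.show()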

10. We have seen the model performs very well; now let's check its learned weights and compare them with the original linear function's coefficients.
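A sketch of how the weights might be inspected:

w, b = model.layers[0].get_weights()
print('learned M:', w[0][0], 'learned C:', b[0])  # should be close to 3 and 0.5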

As noted, this model has only two weights, and we can see the learned weights are almost equal to our M and C values.

It is quite easy to train a neural network for a linear function, but considerably harder for a quadratic one.

Here is the Notebook for the above code. Open it in Google Colab, modify the neural network and the data for the quadratic function, and try to train it. You will discover challenges like finding the optimal number of hidden layers and units per layer, and more.

Try training for the quadratic function yourself, or just check this Notebook for the same.

That's all for this blog. 

Thank you !!