In this blog I will explain how to do image classification in Python using Keras. Image classification is a basic problem in deep learning: the task of assigning each image to its respective class.
[Image: image_1.jpg]
[Image: image_2.jpg]
Here is an example of image classification: a CNN (Convolutional Neural Network) model should predict image_1.jpg as cat and image_2.jpg as dog.
Keras provides many CNN models (VGG, ResNet, Inception, etc.) already trained on the ImageNet dataset (http://www.image-net.org/) for 1000 classes. We will use the VGG-16 model and fine-tune it on the dogs-vs-cats dataset (https://www.kaggle.com/c/dogs-vs-cats/data) for two classes.
Data Preparation
How to prepare the train, validation and test datasets?
The train dataset is the only dataset used for model training (learning weights and biases). The validation dataset is used to monitor the loss and accuracy of intermediate models on unseen data. Only the train and validation datasets are used during training.
After training, we can pick the top three models that perform well on the validation data and evaluate them on the test dataset to select the best one.
We will use 80% (8000) of the total data (10000) for training and 10% each (1000 and 1000) for validation and testing. Using 10% of the whole dataset for validation and testing is not a rule; when we have millions of images, 1-2% of the total is sometimes used instead.
We need three directories: train, validation and test. Each directory should contain one sub-directory per class, filled with the respective images.
-- train/
    -- cat/
    -- dog/
-- validation/
    -- cat/
    -- dog/
-- test/
    -- cat/
    -- dog/
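The directory layout above can be produced with a short script. The following is a minimal sketch, assuming the raw images sit in one folder per class (for example a hypothetical raw/cat and raw/dog). The function name split_dataset, the folder names and the fixed seed are illustrative choices and are not taken from the original code.

```python
import random
import shutil
from pathlib import Path


def split_dataset(src_dir, dst_dir, fractions=(0.8, 0.1, 0.1), seed=0):
    """Copy src_dir/<class>/* into dst_dir/{train,validation,test}/<class>/.

    Returns a dict mapping (split, class_name) to the number of copied files.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for class_dir in sorted(p for p in Path(src_dir).iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_train = int(len(files) * fractions[0])
        n_val = int(len(files) * fractions[1])
        parts = {
            "train": files[:n_train],
            "validation": files[n_train:n_train + n_val],
            # Whatever is left after train and validation goes to test.
            "test": files[n_train + n_val:],
        }
        for split, split_files in parts.items():
            target = Path(dst_dir) / split / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in split_files:
                shutil.copy2(f, target / f.name)
            counts[(split, class_dir.name)] = len(split_files)
    return counts
```

Calling split_dataset("raw", "data") would then create data/train, data/validation and data/test, each with a cat and a dog sub-directory. Splitting per class keeps the class balance the same in all three sets.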
Code and Explanation
In this section I will explain the code and the important points regarding the classification model.
First let's import all the required packages and modules.
The VGG-16 model was trained on 224x224x3 images, so we will use the same size for fine-tuning. It can also be fine-tuned for other sizes such as 100x100x3 or 448x448x3; I will cover fine-tuning for different image sizes in a future blog.
Since we want to adapt the VGG-16 model to our own dataset, we load only the convolution blocks, not the fully connected and output layers. Setting the “include_top” argument to False loads the VGG-16 model with ImageNet weights but without the fully connected and output layers.
The original VGG-16 has two fully connected layers with 4096 neurons each and an output layer for 1000 classes.
Cat and dog images are not very difficult to tell apart, so we will use only 256 neurons in the fully connected (FC) layer instead of the 4096 used in the original VGG-16 network. The final prediction layer will have only one neuron for binary classification.
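The model described above could be assembled roughly as follows. This is a sketch under assumptions, not the original code: it uses the tensorflow.keras import paths, and the Flatten layer, the ReLU activation and the sigmoid output are my choices (the text only fixes 256 FC neurons and a single output neuron). The helper flattened_features is a hypothetical name that shows how large the input to the FC layer becomes.

```python
VGG16_DOWNSAMPLING = 32    # five 2x2 max-pooling layers: 2**5
VGG16_LAST_CHANNELS = 512  # channels produced by the last convolution block


def flattened_features(height, width):
    """Length of the flattened VGG-16 convolutional output for one image."""
    return (
        (height // VGG16_DOWNSAMPLING)
        * (width // VGG16_DOWNSAMPLING)
        * VGG16_LAST_CHANNELS
    )


def build_model(input_shape=(224, 224, 3), fc_units=256, weights="imagenet"):
    """Return (model, base): VGG-16 convolution blocks plus a binary head."""
    # Imported here so the helper above can be used without TensorFlow installed.
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import Dense, Flatten
    from tensorflow.keras.models import Model

    # include_top=False drops the original FC layers and the 1000-class output.
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    x = Flatten()(base.output)
    x = Dense(fc_units, activation="relu")(x)    # 256-neuron FC layer
    output = Dense(1, activation="sigmoid")(x)   # one neuron: probability of class 1
    model = Model(inputs=base.input, outputs=output)
    return model, base
```

With model, base = build_model(), the Flatten layer for 224x224 inputs should report flattened_features(224, 224) = 25088 values, which is what the 256-neuron FC layer receives.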
You can see a summary of the network using the model.summary() function.
Freeze Layer
Freezing a layer means its weights and biases will not be modified during training. This is one of the important steps when fine-tuning a pre-trained model.
Why freeze layers?
Each layer in a network learns to detect some kind of feature, such as different kinds of edges, corners, patches, colors and template-like features. When fine-tuning a pre-trained model we can reuse some of these features (like edge and corner features) as they are, because they are useful for almost any problem. That's why we can freeze some layers and keep the features they have already learned.
Freezing some layers also means fewer layers are updated during training, so training and convergence should be faster.
When to freeze the layers of the base model?
One can freeze some layers when fine-tuning a pre-trained model in the following cases:
- You have few training images.
- The new classes are similar to the classes the model was originally trained on. If the new classes are totally different, as when fine-tuning a VGG model on medical images, more layers of the base network have to be trained.
Currently we are training only 30% of the layers of the base model. If we had more images we could train more layers.
Note: Layers should be frozen before the model is compiled.
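Freezing in Keras comes down to setting the trainable attribute of a layer to False. The sketch below assumes base_model is the VGG-16 convolutional base and keeps only the last 30% of its layers trainable. The function name and the rounding rule are my own; the original code may have counted the 30% differently.

```python
def freeze_base_layers(base_model, trainable_fraction=0.3):
    """Freeze the first layers of base_model, keep the last fraction trainable.

    Returns the number of frozen layers.
    """
    layers = base_model.layers
    n_trainable = round(len(layers) * trainable_fraction)
    n_frozen = len(layers) - n_trainable
    for index, layer in enumerate(layers):
        # Early layers hold generic edge/corner features, so they stay fixed;
        # only the later, more task-specific layers are updated.
        layer.trainable = index >= n_frozen
    return n_frozen
```

The call has to happen before model.compile(...), otherwise the change of the trainable flags is not picked up by the training step.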
Now let's compile the model with the SGD optimizer and binary cross-entropy loss. For more than two classes use categorical_crossentropy.
One can also use Adam or another optimizer: an optimizer that performs well on one dataset may not perform well, with the same parameters, on a different type of data. So it's always better to experiment with two or three optimizers and select the one that performs best on your data.
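A compile step along these lines might look as follows. The learning rate and momentum are placeholder values I picked for illustration and are not taken from the original code. The helper loss_for is a hypothetical name that only encodes the two-class versus multi-class rule stated above.

```python
def loss_for(num_classes):
    """Loss name matching the number of classes."""
    return "binary_crossentropy" if num_classes == 2 else "categorical_crossentropy"


def compile_model(model, num_classes=2, learning_rate=1e-4, momentum=0.9):
    """Compile the model with SGD; hyperparameter defaults are assumptions."""
    # Imported here so loss_for can be used without TensorFlow installed.
    from tensorflow.keras.optimizers import SGD

    model.compile(
        optimizer=SGD(learning_rate=learning_rate, momentum=momentum),
        loss=loss_for(num_classes),
        metrics=["accuracy"],
    )
    return model
```

Note that categorical_crossentropy also needs an output layer with one neuron per class and a softmax activation. Older standalone Keras releases spell the learning rate argument lr instead of learning_rate.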
Data Loading
Let's see how to load the data and perform augmentation using the ImageDataGenerator class of Keras.
At both the training and validation steps we want to normalize the data, and at training time we also want to augment the images to increase the effective size of the training dataset. Here we use shear, zoom and horizontal flip to generate additional training images.
flow_from_directory is the function that loads the images from a given directory; we have to pass the path of the directory that contains one sub-directory per class. Here we use the 'binary' class_mode because we have only two classes; for more than two classes use the 'categorical' class_mode.
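The data loading described above could be sketched like this, assuming a TensorFlow/Keras version that still ships ImageDataGenerator. The shear and zoom ranges and the batch size are illustrative values. expected_class_indices is a hypothetical helper that mirrors how flow_from_directory numbers the classes (alphanumeric order of the sub-directory names).

```python
from pathlib import Path


def expected_class_indices(directory):
    """Class-name to label mapping, in sorted sub-directory order."""
    names = sorted(p.name for p in Path(directory).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(names)}


def make_generators(train_dir, val_dir, image_size=(224, 224), batch_size=32):
    """Return (train_generator, validation_generator)."""
    # Imported here so the helper above can be used without TensorFlow installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Training data: normalize to [0, 1] and augment.
    train_datagen = ImageDataGenerator(
        rescale=1.0 / 255,
        shear_range=0.2,       # assumed value
        zoom_range=0.2,        # assumed value
        horizontal_flip=True,
    )
    # Validation data: normalize only, no augmentation.
    val_datagen = ImageDataGenerator(rescale=1.0 / 255)

    train_gen = train_datagen.flow_from_directory(
        train_dir, target_size=image_size, batch_size=batch_size, class_mode="binary"
    )
    val_gen = val_datagen.flow_from_directory(
        val_dir, target_size=image_size, batch_size=batch_size, class_mode="binary"
    )
    return train_gen, val_gen
```

With the cat and dog sub-directories used here, the mapping is cat = 0 and dog = 1; the generator exposes the same mapping as train_gen.class_indices.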
Callback
A callback is a set of functions applied at given stages of the training procedure. With callbacks we can monitor the loss, save weight files and plot training and validation loss graphs during training.
To keep things simple we will use only the ModelCheckpoint callback for now. With this callback we can save weights whenever the validation accuracy increases or the validation loss decreases, as controlled by the monitor and mode parameters.
With save_best_only enabled, a weight file is saved only if the new weights are better than the previously saved ones; with it disabled, a weight file is saved at every epoch.
save_weights_only is an important parameter. If it is True, only the weights of the network are saved, not the network architecture. To use these weights you first have to create the network, and then load the weights with the load_weights function.
If save_weights_only is False, the network architecture, training configuration and optimizer state are saved along with the weights, which allows you to resume training where you left off. You don't need to rebuild the network in this case; you can load both the network and the weights directly with the load_model function.
By default save_weights_only is False.
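A checkpoint callback with these settings might be created as in the sketch below. The file name vgg16_best.h5 and the helper names are hypothetical. The metric name val_accuracy assumes the model was compiled with metrics=["accuracy"] in tf.keras; older standalone Keras versions call it val_acc.

```python
def mode_for(monitor):
    """A loss should go down ('min'); accuracy-like metrics should go up ('max')."""
    return "min" if monitor.endswith("loss") else "max"


def make_checkpoint(filepath="vgg16_best.h5", monitor="val_accuracy"):
    """ModelCheckpoint that keeps only the best full model seen so far."""
    # Imported here so mode_for can be used without TensorFlow installed.
    from tensorflow.keras.callbacks import ModelCheckpoint

    return ModelCheckpoint(
        filepath,
        monitor=monitor,
        mode=mode_for(monitor),
        save_best_only=True,      # overwrite only when the monitored value improves
        save_weights_only=False,  # save architecture and optimizer state as well
        verbose=1,
    )
```

Because save_weights_only is False here, the saved file can later be restored in one step with load_model.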
Training
Now we will use the fit_generator function to start the training. (In recent Keras versions fit_generator is deprecated and the fit function accepts generators directly.)
There are a few terms that one should know.
Epoch - While training a deep learning model, the model is trained on the whole dataset many times, not just once. An epoch is defined as one pass over the entire dataset. A model trained for one epoch has seen all the training images once.
Batch Size - A set of N images. The samples in a batch are processed independently, in parallel. During training the weights are updated after each batch. Batch size can be 32, 64, 128 or 256; choose an appropriate value depending on the GPU and memory of your system. Batch size also affects the convergence of the model, but I will not go into the details here.
Steps per epoch - The number of batches processed in one epoch: steps_per_epoch * batch_size = total training samples
So steps per epoch should be equal to total_training_samples divided by batch_size. If you set it lower than this number, not all training images will be used during one epoch.
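As a small worked example of this relation, the helper below computes the number of steps for the dataset sizes used in this blog. Rounding up with math.ceil is my own choice, made so that a final, smaller batch is still counted when the sample count is not a multiple of the batch size.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


train_steps = steps_per_epoch(8000, 32)       # 8000 training images
validation_steps = steps_per_epoch(1000, 32)  # 1000 validation images
print(train_steps, validation_steps)          # 250 32
```

For 8000 training images and a batch size of 32 this gives exactly 250 steps. For the 1000 validation images it gives 32 steps, the last of which holds only 8 images.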
Let's train this network for 25 epochs.
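The training call could be sketched as below. Note the substitution: it uses model.fit rather than fit_generator, because fit accepts generators in current Keras releases; with an older Keras the same arguments can be passed to fit_generator. The function name train is hypothetical, and len(generator) relies on Keras directory iterators reporting their number of batches.

```python
def train(model, train_gen, val_gen, epochs=25, callbacks=None):
    """Fit the model on generators and return the training history."""
    return model.fit(
        train_gen,
        steps_per_epoch=len(train_gen),   # batches per pass over the train set
        validation_data=val_gen,
        validation_steps=len(val_gen),    # batches per pass over the validation set
        epochs=epochs,
        callbacks=callbacks or [],
    )
```

A ModelCheckpoint instance would be passed in through the callbacks list, for example train(model, train_gen, val_gen, callbacks=[checkpoint]).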
Here is the training summary up to the 8th epoch. The model already reached 98% accuracy in the 6th epoch.
Inference
Let's see how to do inference on test images.
Import the libraries needed to load the model and to read images.
Here we define the classes and load the model using the load_model function.


For inference we load the image using the PIL image processing library (I used PIL because the Keras data loader internally uses PIL to read images at training time) and resize it to (224,224,3), the size our model was trained on. We then add one more dimension to change the image shape from (224,224,3) to (1,224,224,3): the model accepts four-dimensional input, and the added dimension represents the batch size. Finally, we normalize the image by dividing it by 255.
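The preprocessing steps described above can be sketched as follows. The function name load_image_batch is hypothetical, and the conversion to RGB is an extra safeguard I added for grayscale or RGBA files; it is not mentioned in the text.

```python
TARGET_SIZE = (224, 224)  # image size the model was fine-tuned on


def load_image_batch(path, size=TARGET_SIZE):
    """Read an image and return a float32 array of shape (1, height, width, 3)."""
    # Imported here so the constant above is usable without NumPy/Pillow installed.
    import numpy as np
    from PIL import Image

    # PIL expects (width, height); for a square target both orders agree.
    image = Image.open(path).convert("RGB").resize(size)
    pixels = np.asarray(image, dtype="float32") / 255.0  # same scaling as in training
    return np.expand_dims(pixels, axis=0)                # add the batch dimension
```

The division by 255 has to match the rescale factor used by the training generator, otherwise the model sees inputs in a different value range than it was trained on.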
For binary classification the predict function returns a probability between 0 and 1. Apply a threshold to this probability: below the threshold the prediction is class 0, otherwise class 1. Here we use 0.5 as the threshold. If the model is biased toward one class, you might want to change the threshold value.
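The thresholding rule can be written as a small helper. The class order ("cat", "dog") is an assumption based on the alphabetical numbering of the class sub-directories, and both function names are hypothetical. predict_label works with any object whose predict method returns one probability per image, such as a model restored with load_model.

```python
CLASSES = ("cat", "dog")  # assumed order: class 0 = cat, class 1 = dog


def label_from_probability(probability, threshold=0.5, classes=CLASSES):
    """Class 0 below the threshold, class 1 otherwise."""
    return classes[0] if probability < threshold else classes[1]


def predict_label(model, batch, threshold=0.5, classes=CLASSES):
    """Run the model on a batch holding one image; return (label, probability)."""
    probability = float(model.predict(batch)[0][0])
    return label_from_probability(probability, threshold, classes), probability
```

Raising the threshold above 0.5 makes the "dog" prediction stricter, which is one way to compensate if the model leans toward class 1.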
So in this blog we have seen how to fine-tune the VGG-16 model on a new dataset and achieve good accuracy, and how to use the newly trained model for inference on images.
That's all for this blog; I hope you found it informative.
Thanks for reading!
The code, dataset and model links are below.
Full code github link - https://github.com/111surajmaurya/VGG-16
Dataset google drive link - https://drive.google.com/drive/folders/15Et639VosTWxEmPLRqx4hdTdGvzQupbg?usp=sharing