What is a Dropout Layer?
The Dropout layer randomly sets input units to 0 with a frequency of drop_rate at each step during training, which helps prevent overfitting.
The key idea of dropout is to randomly drop nodes, along with their connections, from the neural network during training.
The Dropout layer takes a single float value between 0 and 1 as input. In the Keras implementation this value denotes the drop probability of a unit. We will call it p_drop, so the keep probability of a unit is p_keep = 1 - p_drop.
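For reference, here is a minimal sketch of how the layer is typically used in Keras. The layer sizes, input shape, and the 0.2 rate are illustrative assumptions, not values from this post:

```python
import tensorflow as tf

# A small fully connected model with dropout after the hidden layers.
# Dropout(0.2) means p_drop = 0.2, i.e. each unit of the previous layer's
# output is zeroed with probability 0.2 during training only.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```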
Why do we need Dropout?
To solve the problem of Overfitting.
Overfitting means our model performs well on training data but not on test data (or new data).
One reason for overfitting is that our model is quite complex (it has a large number of parameters), so instead of just learning (generalizing) the patterns/features in the data, it also learns the noise present in the data. It adjusts its weights to perform well on the training data, or we could say it adjusts its weights to memorize the training data. Another reason for overfitting is that the training data is not a good representation of the overall (real) data.
Even if the training dataset is a good representation of the real data, having too little of it can also cause overfitting.
How does dropout solve the problem of overfitting?
There are multiple ways to look at this:
1. One way to look at it is that by randomly setting layer units to zero, dropout reduces model complexity, and a less complex model is less prone to overfitting.
2. During each training step, dropout drops units from the layer with probability p_drop and then trains a thinned network.
Because each training step trains a unique thinned network with fewer neurons, the neurons remaining in the network must learn the representations (features) required for a correct prediction on their own. This prevents neurons from co-adapting too much with each other.
This makes the network capable of better generalization and hence helps solve overfitting.
3. ""Overfitting can also be solved by training all possible neural network for a dataset and average the prediction form all model. But this is not possible.""
Let's see how we can interpret the above concept with dropout layer.
During each training step we sample one out of 2^n network and train, so during whole training process we train multiple thinned network.
So training a neural network with dropout can be seen as training a collection of 2^n thinned network with extensive weight sharing, where each thinned network get trained very rarely, if at all.
At test time, we can not take average of the prediction from all those networks. However simple approximate average method work well. So during inference time, idea is to use full network with all units with scaled-down version of weights.
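As a rough illustration of the "thinned network" idea, here is a small NumPy sketch. The layer width, weights, and mask sampling are illustrative assumptions; the point is that every training step samples a binary keep/drop mask, which effectively selects one of the 2^n sub-networks, all of which share the same weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

n_units = 5        # hidden layer width, so there are 2**n_units possible thinned networks
p_drop = 0.2
W = rng.normal(size=(n_units, n_units))   # weights shared by every thinned network

def thinned_forward(x):
    """One training-time forward pass through a randomly thinned layer."""
    keep_mask = rng.random(n_units) > p_drop   # 1 = keep unit, 0 = drop unit
    h = x @ W
    return h * keep_mask                       # dropped units contribute nothing

x = rng.normal(size=(1, n_units))
print(thinned_forward(x))   # a different sub-network is sampled on every call
```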
Dropout during training and inference time
Let's say we want to apply dropout to the input data d = {1, 2, 3, 4, 5} with p_drop = 0.2. During training, on average one unit of d will be set to zero, so d could become {1, 2, 3, 0, 5}. Another way to look at this is that we keep each node with probability p_keep = 0.8.
During inference we use all the units, because dropout does not remove units at inference time. But if we simply use all the units at inference, the expected output will differ from training time. To make sure the distribution of values after the transformation stays almost the same, we multiply the input by the keep probability p_keep (1 - p_drop) at inference time, so during inference the same d would become {0.8, 1.6, 2.4, 3.2, 4.0}.
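Here is a small NumPy sketch of this "scale at inference" variant that reproduces the numbers above. The mask is hard-coded to drop the fourth unit only to match the example; in practice it is sampled randomly:

```python
import numpy as np

d = np.array([1., 2., 3., 4., 5.])
p_drop = 0.2
p_keep = 1.0 - p_drop

# Training: drop units (the 4th unit is forced to be dropped to match the example)
mask = np.array([1., 1., 1., 0., 1.])
train_out = d * mask          # -> [1. 2. 3. 0. 5.]

# Inference: keep every unit but scale the input by p_keep
infer_out = d * p_keep        # -> [0.8 1.6 2.4 3.2 4. ]
print(train_out, infer_out)
```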
But in general we don't want the dropout layer to do anything at inference time, so instead we scale the kept values by 1/p_keep during training only.
So during training d could become {1.25, 2.5, 3.75, 0, 6.25}, and nothing has to be done to the input d during inference.
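A corresponding sketch of this training-time-scaling behavior, again with the fourth unit dropped on purpose to reproduce the numbers:

```python
import numpy as np

d = np.array([1., 2., 3., 4., 5.])
p_drop = 0.2
p_keep = 1.0 - p_drop

# Training: drop units and scale the surviving values by 1 / p_keep
mask = np.array([1., 1., 1., 0., 1.])
train_out = d * mask / p_keep   # -> [1.25 2.5  3.75 0.   6.25]

# Inference: the dropout layer is a no-op, the input passes through unchanged
infer_out = d                   # -> [1. 2. 3. 4. 5.]
print(train_out, infer_out)
```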
That is why, if you look at the Keras documentation for Dropout, it says that dropout first sets units to 0 with the given drop probability (p) and then scales the remaining values up by 1/(1 - p).
That's all about dropout, thanks for reading the blog!