Transfer Learning — Part 4.0: VGG-16 and VGG-19


In Part 3 of the Transfer Learning series we discussed the datasets on which these pre-trained models are trained for the annual ILSVRC competition, as well as their repositories and documentation for implementing the concept with two APIs, namely Keras and PyTorch. In this article we will discuss VGG-16 and VGG-19 theoretically, and in articles 4.2 and 4.3 we will go through the practical implementation with the Keras and PyTorch APIs respectively. The link to the notebook that accompanies this article is given below:

For the repositories and documentation, please follow the two links below:

Keras:

PyTorch:

1. History of the VGG network

AlexNet came out in 2012 and improved on traditional convolutional neural networks, so VGG can be understood as a successor of AlexNet. It was created by a group named the Visual Geometry Group at the University of Oxford (hence the name VGG), by K. Simonyan and A. Zisserman, in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. The model achieves 92.7% top-5 test accuracy on ImageNet, a dataset of over 14 million images belonging to 1000 classes. VGG carries over ideas from its predecessors, improves on them, and uses deeper stacks of convolutional layers to improve accuracy and other metrics. There are many variants of the VGG architecture, among them VGG-11, VGG-16 and VGG-19; the deepest, VGG-19, performs about 19.6 billion FLOPs per forward pass and was trained on NVIDIA Titan Black GPUs for weeks on the ImageNet dataset.

2. Architecture of the VGG network (VGG-16 and VGG-19)

Here we will take a deep look at the VGG architecture, along with its layers and activations. The image below shows the variants of the VGG network.

Fig. 1. Variants of the VGG network

Column A: Contains 8 convolutional layers, so 11 layers in total including the fully connected (FC) layers; internally it differs from the other columns only in the number of layers.

Column A-LRN: Similar to column A, but with one extra step of Local Response Normalization (LRN). LRN implements lateral inhibition within a layer: it amplifies significant peaks, creating local maxima, which sharpens responses in a way we might want in a CNN. For this specific case, i.e. ILSVRC, however, it did not increase accuracy and made the overall network take more time to train (a short code sketch of LRN follows this list).

Column B: Adds two extra convolutional layers to column A, for 13 layers in total.

Column C: Contains 13 convolutional layers, 16 including the FC layers. In this configuration the authors used (1 × 1) convolution filters to introduce extra non-linearity, and thus better discrimination.

Column D: Replaces the (1 × 1) filters of column C with (3 × 3) ones, keeping 16 layers in total; this is the VGG-16 configuration.

Column E: Adds three more (3 × 3) convolutional layers, for 19 layers in total; this is the VGG-19 configuration.
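For the curious, here is a minimal sketch of the LRN step from column A-LRN, using TensorFlow’s built-in op; the hyper-parameter values below are illustrative (AlexNet-style) rather than taken from the VGG paper:

```python
# Sketch of the LRN step used in column A-LRN, via TensorFlow's built-in op.
# Hyper-parameters are illustrative, not necessarily those of the VGG paper.
import tensorflow as tf

x = tf.random.normal((1, 56, 56, 64))  # stand-in feature map (NHWC)
y = tf.nn.local_response_normalization(
    x, depth_radius=2, bias=2.0, alpha=1e-4, beta=0.75
)
print(y.shape)  # same shape as the input: (1, 56, 56, 64)
```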

Big Data Jobs

2.1 VGG-16

In this section we will discuss the architecture of the VGG-16 network. As its name suggests, it is composed of 16 weight layers: 13 convolutional layers and 3 fully connected layers. The diagram below explains the architecture of the VGG-16 network.

Fig. 2. VGG-16 architecture

As we can see, the above diagram depicts the VGG-16 architecture. The architecture is basically composed of three types of layers: convolution layers, which extract features from the image by employing different numbers and types of filters; max-pooling layers, which decrease the spatial size of the feature maps produced by those filters; and a flatten step that turns the batch of feature maps into a 1D tensor, followed by 3 fully connected layers, where the first two have 4096 units each and the final classification layer performs the 1000-way ILSVRC classification and thus contains 1000 channels (one for each class). The final layer is the soft-max layer. The configuration of the fully connected layers is the same in all networks. All layers use the ReLU activation except for the final classification layer, which uses softmax activation to predict the probability of each class. It is also worth noting that none of the networks (except for one) contain Local Response Normalisation (LRN): such normalization does not improve performance on the ILSVRC dataset, but leads to increased memory consumption and computation time.
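As a quick preview of the practical articles, here is a minimal sketch (assuming TensorFlow 2.x with Keras is installed) that loads this exact pre-trained architecture and prints its layer-by-layer summary:

```python
# Minimal sketch: load the pre-trained VGG-16 that ships with Keras.
# The ImageNet weights are downloaded automatically on first use.
from tensorflow.keras.applications import VGG16

model = VGG16(weights="imagenet", include_top=True)
model.summary()  # 13 conv + 3 FC weight layers, ~138M parameters in total
```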

The input to the conv1 layer is a fixed-size 224 × 224 RGB image. The image is passed through a stack of convolutional (conv.) layers with filters that have a very small receptive field: 3 × 3 (the smallest size that captures the notion of left/right, up/down, and center). One of the configurations also utilizes 1 × 1 convolution filters, which can be seen as a linear transformation of the input channels (followed by a non-linearity). The convolution stride is fixed at 1 pixel, and the spatial padding of the conv. layer input is chosen so that the spatial resolution is preserved after convolution, i.e. the padding is 1 pixel for 3 × 3 conv. layers. Spatial pooling is carried out by five max-pooling layers, which follow some of the conv. layers (not all conv. layers are followed by max-pooling). Max-pooling is performed over a 2 × 2 pixel window, with stride 2. Column D in Fig. 1 represents the VGG-16 architecture.
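To make these layer hyper-parameters concrete, here is an illustrative sketch (assuming TensorFlow/Keras) of just the first VGG block, not the pre-trained model itself:

```python
# First VGG block built from the hyper-parameters above: 3x3 kernels,
# stride 1, 'same' padding, then 2x2 max-pooling with stride 2.
import tensorflow as tf
from tensorflow.keras import layers

block1 = tf.keras.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(64, (3, 3), strides=1, padding="same", activation="relu"),
    layers.Conv2D(64, (3, 3), strides=1, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),
])
print(block1.output_shape)  # (None, 112, 112, 64): resolution halved by pooling
```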

2.2 VGG-19

In this section we will discuss the architecture of the VGG-19 network. As its name suggests, it is composed of 19 weight layers: 16 convolutional layers and 3 fully connected layers. The diagram below explains the architecture of the VGG-19 network.

Fig. 3. VGG-19 architecture

As we can see, the above diagram depicts the VGG-19 architecture. Like VGG-16, it is composed of three types of layers: convolution layers that extract features from the image, max-pooling layers that decrease the spatial size of the feature maps, and a flatten step followed by 3 fully connected layers, where the first two have 4096 units each and the final classification layer performs the 1000-way ILSVRC classification and thus contains 1000 channels (one for each class). The final layer is the soft-max layer, which uses softmax activation to predict the probability of each class. The configuration of the fully connected layers is the same in all networks.
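And the PyTorch counterpart: a minimal sketch (assuming torchvision is installed) that loads the pre-trained VGG-19 and runs a dummy forward pass:

```python
# Minimal sketch: load the pre-trained VGG-19 from torchvision.
# Weights are downloaded automatically on first use.
import torch
import torchvision.models as models

vgg19 = models.vgg19(pretrained=True)  # 16 conv + 3 FC weight layers
vgg19.eval()                           # inference mode

x = torch.zeros(1, 3, 224, 224)        # stand-in image batch (NCHW)
with torch.no_grad():
    logits = vgg19(x)
print(logits.shape)                    # torch.Size([1, 1000]): one score per class
```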


A fixed-size (224 × 224) RGB image is given as input to this network, which means the input matrix has shape (224, 224, 3). The only preprocessing performed is subtracting the mean RGB value, computed over the whole training set, from each pixel. The network uses kernels of size (3 × 3) with a stride of 1 pixel, which enables it to cover the whole notion of the image, and spatial padding is used to preserve the spatial resolution. Max pooling is performed over a 2 × 2 pixel window with stride 2. The hidden layers use the Rectified Linear Unit (ReLU) to introduce non-linearity, which helps the model classify better and trains faster than the tanh or sigmoid functions used by previous models. There are three fully connected layers, of which the first two have size 4096, followed by a layer with 1000 channels for the 1000-way ILSVRC classification; the final layer is a softmax function. Column E in Fig. 1 represents the VGG-19 architecture.
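For reference, Keras bundles exactly this mean-subtraction step; a small sketch (the random image below is only a stand-in):

```python
# Sketch of the preprocessing described above: Keras's vgg19.preprocess_input
# subtracts the per-channel ImageNet mean from each pixel (note: it also
# reorders channels from RGB to BGR, matching the original Caffe weights).
import numpy as np
from tensorflow.keras.applications.vgg19 import preprocess_input

image = np.random.uniform(0, 255, size=(1, 224, 224, 3)).astype("float32")
x = preprocess_input(image)  # mean-subtracted, ready for the network
```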

3. Parameters in VGG network

In this section, we will discuss the parameters in the VGG networks. Since the VGG architecture was discussed in the section above, it will be helpful to read that section first in order to have more clarity.

3.1 VGG-16

It consists of 16 weight layers. The parameters of each layer can be seen in the image below:

Fig. 4. VGG-16 architecture parameters.
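As a sanity check on the figure, the per-layer counts can be recomputed by hand: a convolutional layer with k × k kernels, C_in input channels and C_out filters holds (k·k·C_in + 1)·C_out parameters (the +1 is the per-filter bias), and a fully connected layer holds (N_in + 1)·N_out. A small sketch:

```python
# Sanity-check sketch: recompute a few VGG-16 parameter counts by hand.
def conv_params(k, c_in, c_out):
    # (k*k*c_in weights + 1 bias) per output filter
    return (k * k * c_in + 1) * c_out

def fc_params(n_in, n_out):
    return (n_in + 1) * n_out

print(conv_params(3, 3, 64))         # first conv layer: 1,792
print(conv_params(3, 64, 64))        # second conv layer: 36,928
print(fc_params(7 * 7 * 512, 4096))  # first FC layer: 102,764,544
```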

3.2 VGG-19

It consists of 19 weight layers. The parameters of each layer can be seen in the image below:

Fig. 5. VGG-19 architecture parameters.

4. Usage of VGG Network

The main purpose for which the VGG net was designed was to win the ILSVRC, but it has since been used in many other ways:

  1. As a good general-purpose classification architecture for many other datasets; since the authors made the models publicly available, they can be used as-is or with modifications for other, similar tasks.
  2. Transfer learning: it can be used for facial recognition tasks as well.
  3. Weights are easily available in frameworks such as Keras, so they can be tinkered with and reused as one wants.
  4. Creating Neural Art.
  5. Feature extraction (see the sketch below).
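As a small sketch of the feature-extraction use case (assuming TensorFlow/Keras): dropping the three FC layers leaves the convolutional stack, which turns each image into a compact feature vector:

```python
# Sketch of use-case 5: VGG-16 as a frozen feature extractor.
# include_top=False drops the FC layers; global average pooling then
# turns each image into a 512-dimensional feature vector.
import numpy as np
from tensorflow.keras.applications import VGG16

extractor = VGG16(weights="imagenet", include_top=False,
                  pooling="avg", input_shape=(224, 224, 3))
extractor.trainable = False  # freeze the conv weights for transfer learning

features = extractor.predict(np.zeros((1, 224, 224, 3)))
print(features.shape)  # (1, 512)
```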

In this article we have discussed the VGG architecture theoretically; in the next articles, i.e. 4.2 and 4.3, we will get hands-on experience with the Keras and PyTorch APIs.

Stay Tuned !!! Happy Learning :)

Need help ??? Consult with me on DDI :)

Special Thanks:

As we say, “a car is useless if it doesn’t have a good engine”; similarly, a student is lost without proper guidance and motivation. I would like to thank, from the bottom of my heart, my Guru as well as my idol, “Dr. P. Supraja”, and “A. Helen Victoria”, who guided me throughout the journey. As a Guru, she lit the best available path for me and motivated me whenever I encountered failure or a roadblock; without her support and motivation this would have been an impossible task for me.

References

Pytorch: Link

Keras: Link

Tensorflow: Link

If you have any query, feel free to contact me through any of the below-mentioned options:

YouTube : Link

Website: www.rstiwari.com

Medium: https://tiwari11-rst.medium.com

Github Pages: https://happyman11.github.io/

Articles: https://laptrinhx.com/author/ravi-shekhar-tiwari/

Google Form: https://forms.gle/mhDYQKQJKtAKP78V7

