CNN Simplified for Beginners, Part II

--

Part I of this blog series concluded by forming a compressed matrix for each feature of the object to be identified. In essence, the matrix so formed removed the bias stemming from magnification/reduction, rotation, and distortion of the image. The job is only half done: our algorithm still has not met its core objective of classifying the objects in a given image. In this blog, we move to the next step and build a neural network model that learns by combining the various input features and identifies the objects in an image with higher accuracy (refer Pic-1).

CNN applied on our base Image (Pic-1)

Our compressed matrix from the previous step (Pic-2)

Compressed Feature matrix (Pic-2)

Each number in the above matrix (e.g., 0.33, 1.0, etc.) is a neuron and forms the first layer of our neural network. The output layer will be the various objects that the model is being trained to learn and classify.
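To make this concrete, here is a minimal NumPy sketch (the values are placeholders, not the actual numbers from Pic-2) showing how a compressed feature matrix is flattened into the vector of neurons that forms the input layer:

```python
import numpy as np

# Hypothetical compressed (pooled) feature matrix from Part I;
# these values are placeholders, not the actual numbers from Pic-2.
compressed_features = np.array([
    [0.33, 1.00, 0.55],
    [0.20, 0.90, 0.45],
    [0.10, 0.75, 0.60],
])

# Flattening turns the 2-D matrix into a 1-D vector: each number
# becomes one neuron of the first (input) layer of the network.
input_layer = compressed_features.flatten()
print(input_layer.shape)   # (9,) -> nine input neurons
```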

For the time being, the intermediate hidden layers can be regarded as a black box that will carry us towards our goal (Pic-3).

Building a neural network (Pic-3)

Do we need a black box? Can we not directly code a function that maps a given combination of input numbers to a specific output? That is a typical programmer's mindset, but it doesn't work in the world of machine learning. To clarify, consider the image below of a set of animal eyes (Pic-4): can you identify the animals from it?

Eyes of Animals (Pic-4)

A few of us may manage it with some degree of difficulty, but for higher speed and greater accuracy our brain doesn't depend solely on one feature; rather, it combines a set of features to form a visual map. A neural network replicates exactly this model: it associates various features and classifies an image. This technique not only gives higher speed but also improved precision.

Our task is now reduced to unraveling the mystery around the so-called "Black Box". Technically, this box is formed by layers of neurons (hence the name neural network), each layer connected to the previous one by a set of weights. A sample neural network is shown below (Pic-5).

Sample Neural Network (Pic-5)

Each connecting arrow has a "weight" associated with it. To start with, we assign a random number to each weight, and we calculate the value of a neuron in the hidden layer as the weighted sum of all its incoming connections (forward propagation); refer to Pic-6.

Weight Calculation (Pic-6)

In Pic-6 we have an oversimplified neuron "Y" with only one input, so its value is simply the product of W and X0. If instead we had two inputs X0 and X1 with weights W1 and W2, the value of Y would be the sum of the products: Y = X0·W1 + X1·W2. Extending this technique, we can keep propagating forward, forming more hidden layers by combining the various neurons of the previous layer. If we take a step back and observe, this is essentially a combination of all possible input features leading to an output class. TensorFlow, an open-source library from Google, offers a no-strings-attached playground where you can try various configurations of this black box (increasing/reducing input features, hidden layers, weights, adding noise, etc.).
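As a rough illustration of that forward pass, here is a hedged NumPy sketch (all numbers are invented placeholders, not values from Pic-6) computing Y = X0·W1 + X1·W2 for a single neuron and then for a small hidden layer:

```python
import numpy as np

# Two inputs and their weights, as in the two-input example above
# (the numbers are arbitrary placeholders).
x = np.array([0.6, 0.3])          # X0, X1
w = np.array([0.4, 0.8])          # W1, W2 (randomly assigned to start)

# Value of neuron Y = X0*W1 + X1*W2
y = np.dot(x, w)

# The same idea for a whole hidden layer: each column of W_layer holds
# the incoming weights of one hidden neuron (three neurons here).
W_layer = np.random.rand(2, 3)
hidden = np.dot(x, W_layer)       # forward propagation for the layer
print(y, hidden)
```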

As we set up the hidden layers and assign the weights, the model attempts to classify an input image. The first few iterations will be futile, resulting in inaccurate classifications; this is part of the learning curve. The weights are continuously adjusted and tuned until the model comes closer to making good predictions. Though it sounds easy in theory, mathematically this step takes a significant portion of the effort in the whole program.

Let's go one level deeper to understand it. Following our previous example of an elephant image as input, if the model classifies the output as Zebra with 80% confidence and as Elephant with 50% confidence, we certainly know it is incorrect and the weights need to be tuned. But in which direction do we tune a weight: should we increase or decrease it? And how many weights should we keep adjusting? Let's resolve one question at a time. To answer the first one, the direction in which to change a weight, we embrace a concept called gradient descent (Pic-7).

Gradient Descent (Pic-7)

In the above picture, point X2 is the only location where the slope of the curve is zero; in simple terms, at X2 we get the most optimal result, while at every other point we have either a positive or a negative slope. Determining X2 is the goal of gradient descent. Continuing with our previous example, if changing weight W1 raises the confidence for Elephant from 50% to something greater than 50%, then we are traveling in the right direction. We then adopt a trial-and-error method to reach the point of zero gradient: we know the direction confidently, but we have no way of knowing the exact location of X2, so we keep iterating. If we take huge steps, we risk missing the zero-gradient point and jumping into a positive-gradient zone; on the other hand, if our steps are too small, we burn a lot of compute (budget overshoot) and slow down the whole program. The size of this step is technically called the "learning rate" and is a hyper-parameter.
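The trial-and-error search described above can be sketched as a toy one-dimensional gradient descent (the loss curve and numbers here are invented purely for illustration); note how the learning rate sets the size of each step:

```python
# Toy gradient descent on an invented loss curve L(w) = (w - 3)^2,
# whose zero-gradient point (the "X2" of Pic-7) sits at w = 3.
def loss(w):
    return (w - 3) ** 2

def gradient(w):            # derivative dL/dw = 2 * (w - 3)
    return 2 * (w - 3)

w = 0.0                     # random starting weight
learning_rate = 0.1         # the hyper-parameter discussed above

for step in range(50):
    w = w - learning_rate * gradient(w)   # step against the slope

print(round(w, 4), round(loss(w), 6))     # w approaches 3, loss approaches 0
# A much larger learning rate overshoots the minimum; a much smaller
# one needs many more iterations (more compute) to get there.
```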

For a CNN beginner, this is a good start on gradient descent. As we embark on larger ML programs, the complexity increases because the loss curve may have more than one low point, and variants such as mini-batch and stochastic gradient descent come into play (refer Pic-8), which further complicates the whole exercise and pushes us to adopt other complementing techniques (more on that later).

Mini Batch Gradient Descent (Pic-8)

Returning to our simple elephant model: we tune the weights and vary the number of hidden layers (a good recommendation is to keep the hidden layers fewer than 10, as more will create over-fitting, a phenomenon where the model aligns so closely to the training dataset that accuracy plummets on the test or real-world dataset), and, best of all, we supply more distinct input features while tuning the other hyper-parameters in parallel, until the model classifies the image with high accuracy (>95% confidence). This, in a nutshell, explains the broader contours of the CNN algorithm. As I mentioned earlier, no developer codes a CNN from scratch; rather, they use the implementations offered by various open-source libraries (YOLO, TensorFlow, etc.), as the sketch below suggests.
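For completeness, here is a hedged Keras/TensorFlow sketch of the kind of small CNN classifier these libraries let you assemble without writing the algorithm from scratch; the layer sizes, input shape, and five-class output are invented for illustration, not a recommendation:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# A small CNN assembled with Keras: the convolution/pooling layers
# extract the compressed features (Part I), the dense layers combine
# them and classify (this part).
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                       # compressed features -> input neurons
    layers.Dense(64, activation="relu"),    # hidden layer
    layers.Dense(5, activation="softmax"),  # e.g. 5 animal classes
])

# Gradient descent (here the Adam variant) tunes the weights against a
# loss that penalises confident but wrong predictions.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```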

Before we conclude, let's also note that CNN is not an elixir for all classification problems. CNN's magic works with images that adhere to a spatial pattern, in simple words, an image that follows a predefined structure. For example, the US map always has Florida on the east coast, and an image of a face has the eyes above the lips. For items where this spatial-pattern constraint doesn't apply, CNN is not our "go-to" algorithm.

US map follows a spatial pattern

What we have seen in this series touches only the surface of the nuances of the CNN algorithm; to fathom its complexity and depth we need to work on a few real-world examples. More about that in the final, concluding part of this blog series.

For a quicker grasp of CNN, please follow the video below.
