Neural Networks — the Rudiments and the Mathematics

--

Artificial Neural Networks (Source: VIASAT)

Artificial Neural Networks (ANNs) have gained immense traction in recent years owing to their ability to emulate the human brain and perform activities otherwise thought impossible. They are arguably the most intelligent piece of the whole game of Machine Learning as we know it today.

There are a number of frameworks and libraries, like PyTorch and TensorFlow, that have made building, training, and deploying Neural Networks easy, fast, and straightforward. It is nevertheless beneficial to understand the underlying concepts so that you can feel confident using these libraries and contribute to advancing the subject.

At the heart of it, neural networks are a collection of nodes working together in an interconnected, multi-layered structure. Each node loosely resembles a neuron in an animal brain: it takes an input, which can be image pixels, words, sentences, numbers, etc., processes it, and forwards the processed state to another node for further processing, until the network finally produces an output, or inference.

The whole life cycle is broadly divided into training and testing phases. The ANN is trained using historical data, and testing is done on a separate test set to validate the efficiency of the trained network. In the training data, a set of characteristics, also called features, say Xi, are chosen on which you want the predictions to be based. The features are passed through a set of hidden layers, each of which tries its part to best understand the input and get closer to predicting it correctly. The predicted result, also called a Label, say Y, is the guess the neural network has made to best identify your data.
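As a rough illustration, the features X and labels Y can be thought of as plain arrays, with a portion held out for testing. The array names, the made-up labelling rule, and the 80/20 split below are my own assumptions, not anything prescribed above:

```python
import numpy as np

# Toy dataset: each row of X holds one sample's features (Xi),
# and Y holds the corresponding actual labels.
X = np.random.rand(100, 3)                 # 100 samples, 3 features each
Y = (X.sum(axis=1) > 1.5).astype(float)    # a made-up rule producing 0/1 labels

# Hold out 20% of the samples for the testing phase.
split = int(0.8 * len(X))
X_train, Y_train = X[:split], Y[:split]
X_test,  Y_test  = X[split:], Y[split:]
```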

Believe it or not, it is still not fully understood how these hidden layers actually work, and the whole process is trial and error until the best possible configuration is found. It is also important to bear in mind that these hidden layers consume a lot of processing power, so it is crucial to strike a balance in how many layers you add in the middle and how many times you iterate before you reach that sweet spot.

Activation Function

If Xi is the input to a hidden layer, then Xi+1 can be the output from that layer. To every input or feature (Xi), a specific weight (Wi) and a bias (bi) are applied, which together produce a value. The weights and biases are initially chosen at random and later tuned so that each input is better positioned to predict the output. An activation function (f) is then applied, which is essentially based on a threshold value.
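A minimal sketch of that computation for a single node, assuming NumPy and randomly initialised values (the variable names are illustrative):

```python
import numpy as np

x = np.array([0.5, 0.1, 0.9])   # one input sample with three features (Xi)
W = np.random.randn(3)          # one weight per feature (Wi), chosen at random
b = np.random.randn()           # bias (bi), also chosen at random

z = np.dot(W, x) + b            # weighted sum of the inputs plus the bias
# z is then passed through the activation function f to produce the node's output
```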

One of the popular activation functions is the Sigmoid function. It squashes its input into a value between 0 and 1: inputs well above the threshold produce an output close to 1 and are effectively forwarded to the next layer, while inputs well below it produce an output close to 0 and are effectively switched off.

Activation Function (f)
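A sketch of the sigmoid in Python, using its standard 1 / (1 + e^(-x)) form; the threshold interpretation above corresponds to the output crossing 0.5 at x = 0:

```python
import numpy as np

def sigmoid(x):
    """Squash any real-valued input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(-5.0))  # ~0.007, effectively switched off
print(sigmoid(0.0))   # 0.5, the threshold point
print(sigmoid(5.0))   # ~0.993, effectively forwarded
```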

Forward Propagation

Every output of an activation function either triggers another neuron or gives out the final prediction, depending on its position in the multi-layer structure. Every layer takes the input from the previous layer and processes it, and the output is either discarded or passed on to the next layer. The output of the final layer (Y) is considered the predicted value, or Label, for the features supplied.
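A sketch of such a forward pass through two hidden layers and an output layer, reusing the sigmoid above; the layer sizes and random initialisation are assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Randomly initialised weights and biases for a 3 -> 4 -> 4 -> 1 network.
layers = [
    (np.random.randn(4, 3), np.random.randn(4)),  # hidden layer 1
    (np.random.randn(4, 4), np.random.randn(4)),  # hidden layer 2
    (np.random.randn(1, 4), np.random.randn(1)),  # output layer
]

def forward(x):
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)   # each layer processes the previous layer's output
    return a                     # the final layer's output Y, the predicted label

y_pred = forward(np.array([0.5, 0.1, 0.9]))
print(y_pred)
```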

Feature to Label Mapping

Back Propagation

After every iteration, the predicted value (Y) is compared to the actual value, say Ya. The idea is to minimize the difference between the predicted value and the actual value (Ya ~ Y). The weights and bias values are tuned accordingly in every iteration to make the prediction closer to reality. This process is called Back Propagation. However, it is important to exercise caution not to fit the training data too closely, using Generalization techniques. For the sake of simplicity, we will keep those out of scope for this blog.
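For a single sigmoid node with a squared-error loss, the chain rule gives the gradients that back propagation uses to tune the weights and bias. The sketch below shows one training step under those assumptions (the loss choice and learning rate are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.5, 0.1, 0.9])   # input features
y_actual = 1.0                  # actual value Ya
W = np.random.randn(3)
b = np.random.randn()
lr = 0.1                        # learning rate

# Forward pass
z = np.dot(W, x) + b
y_pred = sigmoid(z)             # predicted value Y

# Backward pass: chain rule for the squared error (y_pred - y_actual)^2
d_loss = 2.0 * (y_pred - y_actual)   # d(loss) / d(y_pred)
d_sig  = y_pred * (1.0 - y_pred)     # d(y_pred) / d(z), the sigmoid derivative
grad_W = d_loss * d_sig * x          # d(loss) / d(W)
grad_b = d_loss * d_sig              # d(loss) / d(b)

# Tune the weights and bias to bring the prediction closer to reality
W = W - lr * grad_W
b = b - lr * grad_b
```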

Cost Function

The predictions, after every iteration, are passed through a cost function to understand how different they are from the actual values. One of the popular cost functions is the Mean Squared Error (MSE) function. In MSE, the squared differences between the actual and predicted values are summed and divided by the number of predictions to get the average.

Mean Squared Error
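A minimal MSE computation, assuming the actual and predicted values are stored in NumPy arrays (the numbers are made up for illustration):

```python
import numpy as np

y_actual = np.array([1.0, 0.0, 1.0, 1.0])   # actual values
y_pred   = np.array([0.9, 0.2, 0.7, 0.4])   # predictions from the network

mse = np.mean((y_actual - y_pred) ** 2)     # average of the squared differences
print(mse)                                  # 0.125
```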

The intention is to minimize the cost to an optimum level such that the predictions, more or less, fall in line with the actual data.

Gradient Descent

The cost function output, also called the cost, is plotted against the corresponding weights as below:

Gradient Descent

The weight that gives the least cost is chosen to produce the optimal prediction. We also want to reach the lowest cost in the minimum number of steps by using a well-chosen learning rate (blue arrows).
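A sketch of gradient descent on a single weight for a tiny linear model; the dataset, learning rate, and step count are illustrative assumptions:

```python
import numpy as np

# Tiny dataset where the underlying relationship is y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y_actual = np.array([2.0, 4.0, 6.0, 8.0])

w = 0.0       # start from an arbitrary weight
lr = 0.05     # learning rate: the size of each step down the cost curve

for step in range(100):
    y_pred = w * x
    grad = np.mean(2 * (y_pred - y_actual) * x)   # d(MSE)/dw
    w -= lr * grad                                # step in the downhill direction

print(w)  # converges towards 2.0, the weight with the least cost
```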

The resultant network, after this whole process of training, takes in new data inputs and makes predictions on them. The prediction efficiency is monitored using metrics like Accuracy, Precision, and Recall (we will discuss these in later blogs). The input data are stored and used for training the ANN during the next training cycle for better efficiency and accuracy.

That is all there is to understanding the basics of Artificial Neural Networks. I hope the content above has been useful and helps you get interested in the field of Neural Networks and Machine Learning.
