Loss Functions in Neural Networks


Loss functions measure how far a model's predictions deviate from the actual (ground-truth) values. A machine learns by adjusting its parameters to decrease the loss, thereby moving its predictions closer to the ground truth. There are many loss functions to choose from, depending on the problem and on how the predicted and actual values are compared, and optimizers are then used to minimize the chosen loss and make predictions better. Let's discuss a few of these loss functions.


Cross Entropy Loss:
This is the most common loss function used for classification problems. It works in such a way that the loss decreases as the predicted probability converges towards the ground truth.

Here's the mathematical formula for cross entropy loss:

L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\right]

Here y_i is the original (actual) value and ŷ_i (y-hat) is the predicted output. This form is for binary classification, so it is known as binary cross entropy. Since it is a binary classifier, the actual label is either 0 or 1, where 0 refers to one class and 1 to the other, and ŷ_i is the predicted probability of class 1. For example, if the actual value is 0, the first term becomes zero and the loss reduces to the log of (1 minus the predicted value); similarly, if the actual value is 1, the second term disappears and the loss is just the log of the predicted value. The key property of this loss function is that it heavily penalizes predictions that are confident but wrong.
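To make this concrete, here's a minimal NumPy sketch of binary cross entropy (the function name, the example values, and the small clipping epsilon are my own illustrative choices, not from the original):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average binary cross entropy over a batch.

    y_true: array of 0/1 ground-truth labels
    y_pred: array of predicted probabilities in (0, 1)
    """
    # Clip predictions away from exactly 0 and 1 so log() stays finite.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.3])
print(binary_cross_entropy(y_true, y_pred))  # ~0.41

# A confident but wrong prediction is penalized heavily:
print(binary_cross_entropy(np.array([1]), np.array([0.01])))  # ~4.61
```

Notice how the confident-but-wrong prediction (probability 0.01 for a true label of 1) produces a loss roughly ten times larger than the mostly correct batch above it.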


If we have more than 2 classes, we use the categorical cross entropy loss function, which is computed over every class's predicted and actual value for every sample. For your reference, check the formula below:

L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\log(\hat{y}_{i,c})

where C is the number of classes and y_{i,c} is 1 if sample i belongs to class c and 0 otherwise.
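Here's a matching hedged sketch for the categorical case with one-hot labels (the shapes and names are assumptions for illustration):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average categorical cross entropy over a batch.

    y_true: (N, C) one-hot ground-truth labels
    y_pred: (N, C) predicted class probabilities (each row sums to 1)
    """
    y_pred = np.clip(y_pred, eps, 1.0)
    # Sum over the C classes for each sample, then average over N samples.
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# 2 samples, 3 classes
y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])
print(categorical_cross_entropy(y_true, y_pred))  # ~0.43
```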


Hinge Loss / Multi-Class SVM Loss:

For a good classifier, the score of the correct class must be greater than the score of every other class by some margin. Hinge loss encodes exactly this property, using a margin value of 1. Hence this loss function is used for maximum-margin classification, most notably in support vector machines.

Mathematical Formula:

L_i = \sum_{j \neq y_i} \max(0,\, s_j - s_{y_i} + 1)

where s_j is the predicted score for class j and y_i is the index of the correct class for sample i.

Let's look at an example for better understanding.

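Here's a minimal NumPy sketch with hypothetical class scores (the score values, function name, and margin default are illustrative assumptions):

```python
import numpy as np

def multiclass_hinge_loss(scores, correct_class, margin=1.0):
    """Multi-class SVM (hinge) loss for a single sample.

    scores: raw class scores for one sample
    correct_class: index of the ground-truth class
    """
    correct_score = scores[correct_class]
    # max(0, s_j - s_yi + margin) for every class j
    margins = np.maximum(0, scores - correct_score + margin)
    margins[correct_class] = 0  # the correct class contributes nothing
    return np.sum(margins)

# Hypothetical scores for 3 classes; the correct class is index 0.
scores = np.array([3.2, 5.1, -1.7])
print(multiclass_hinge_loss(scores, 0))
# max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = 2.9 + 0.0 = 2.9
```

Here the wrong class with score 5.1 beats the correct class's 3.2 by more than the margin, so it contributes 2.9 to the loss, while the class with score -1.7 is already far enough below the correct score and contributes nothing.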

So, if the loss is 0, the prediction respects the margin and is considered fine, and any value greater than zero denotes how far the prediction is from satisfying the margin. Hinge loss is not differentiable at the hinge point, but it is convex, which makes it easy to work with the convex optimizers used in machine learning.

Now let's see some loss functions for regression problems.

Mean Squared Error (MSE):

Mean squared error, as its name suggests, is the average of the squared differences between the predicted values and the actual values. Because it squares the differences, the output is concerned only with the magnitude of the error and not its direction, and predictions that are far from the actual values are penalized much more than less deviated ones. It also has convenient mathematical properties: being a simple average of squares, its gradients are easy to calculate in backpropagation. (We'll look into these terms in upcoming articles; follow me and stay tuned.)

Mathematical formula for mean squared error:

\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2
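A minimal NumPy sketch of MSE (the function name and example values are chosen for illustration):

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average of squared differences; squaring removes the sign,
    # so only the magnitude of the error matters.
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print(mean_squared_error(y_true, y_pred))  # 0.375
```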

Mean Absolute Error (MAE):

Mean absolute error is calculated in a way similar to MSE; the only difference is that it takes the absolute values of the differences between actual and predicted values before averaging. So instead of the squaring done in MSE to make all the values positive, here the absolute value is used. This makes MAE more robust to outliers, since large errors are not squared and amplified.

Mathematical formula for mean absolute error:

\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|
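And a matching sketch for MAE; the last few lines contrast MSE and MAE on predictions with one large outlier to show MAE's robustness (all values are made up for illustration):

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # Average of absolute differences; no squaring, so large
    # errors are not amplified.
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print(mean_absolute_error(y_true, y_pred))  # 0.5

# With one large outlier, MSE blows up while MAE grows only linearly:
y_out = np.array([2.5, 0.0, 2.0, 18.0])
print(np.mean((y_true - y_out) ** 2))      # 30.375
print(mean_absolute_error(y_true, y_out))  # 3.0
```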

So, these are the most commonly used loss functions for optimizing a machine learning model and making its predictions better.

That's it for this article. Thank you for reading!
Do follow me and stay tuned for more interesting stories :)

