How to evaluate the Machine Learning models? — Part 5


This is the fifth and final part of the metric series, where we will discuss metrics that are used mostly in ranking. Several other metrics also play a crucial role in evaluating models, and you should feel free to explore those as well; due to time constraints, I have covered the ones that are most widely used. This will be a short article in which we will look at the Gini Index, MRR (Mean Reciprocal Rank), and Cohen’s Kappa.

1. Gini Index

Notebook for Reference:

The Gini coefficient is an indicator of how much the model outperforms random (or mean) predictions. It is derived from the AUC-ROC curve, which we calculated in Part 3 of this series, via the relation Gini = 2 × AUC − 1. In other words, it measures how far the model exceeds random predictions in terms of the ROC.

Fig. 1 Gini Index
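
Since Gini = 2 × AUC − 1, computing it takes only a couple of lines on top of scikit-learn’s AUC. Below is a minimal sketch, assuming scikit-learn is installed; the labels and scores are made-up toy values:

```python
# A minimal sketch of deriving the Gini coefficient from AUC-ROC.
# Assumes scikit-learn is installed; the labels and scores are toy data.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                    # ground-truth binary labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.3]  # predicted probabilities

auc = roc_auc_score(y_true, y_score)  # area under the ROC curve
gini = 2 * auc - 1                    # Gini = 2 * AUC - 1
print(f"AUC: {auc:.3f}, Gini: {gini:.3f}")
```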

2. MRR (Mean Reciprocal Rank)

Mean Reciprocal Rank is a measure for evaluating systems that return a ranked list of answers to queries. It is one of the simplest ranking metrics: it tries to answer the question “Where is the first relevant item?”, which places it in the binary-relevance family of metrics. For a single query, the reciprocal rank is 1 / rank_i, where rank_i is the position of the highest-ranked relevant answer; if no correct answer is returned for the query, the reciprocal rank is 0. MRR is then the average of the reciprocal ranks over all queries.


This method is simple to compute and easy to interpret. It puts a high focus on the first relevant element of the list, so it is best suited for targeted searches such as a user asking for the “best item for me”, and it is good for known-item search such as navigational queries or looking up a fact. On the other hand, MRR does not evaluate the rest of the recommended list: it focuses on a single item and gives a list with a single relevant item just as much weight as a list with many relevant items. That is fine if it matches the target of the evaluation, but it might not be a good metric when users want to browse and compare multiple related items.

Fig. 2 MRR
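
Here is a minimal sketch of MRR computed from scratch; the queries, result lists, and relevant-item sets are hypothetical toy data:

```python
# A minimal sketch of Mean Reciprocal Rank (MRR).
# The result lists and relevant-item sets below are hypothetical toy data.

def mean_reciprocal_rank(results_per_query, relevant_per_query):
    """Average of 1/rank of the first relevant item per query (0 if none)."""
    reciprocal_ranks = []
    for results, relevant in zip(results_per_query, relevant_per_query):
        rr = 0.0
        for rank, item in enumerate(results, start=1):
            if item in relevant:
                rr = 1.0 / rank  # first relevant item found at this rank
                break
        reciprocal_ranks.append(rr)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)

# Three queries: first relevant items at ranks 2, 1, and none.
results = [["a", "b", "c"], ["x", "y"], ["p", "q", "r"]]
relevant = [{"b"}, {"x"}, {"z"}]
print(mean_reciprocal_rank(results, relevant))  # (1/2 + 1 + 0) / 3 = 0.5
```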

3. Cohen’s Kappa

Kappa is similar to the accuracy score, but it takes into account the accuracy that would have happened anyway through random predictions. It is defined as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed accuracy and p_e is the accuracy expected by chance; it can therefore be read as how far the model exceeds random predictions in terms of accuracy.
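
As a quick illustration, scikit-learn provides cohen_kappa_score. This minimal sketch uses toy labels:

```python
# A minimal sketch of Cohen's kappa, (p_o - p_e) / (1 - p_e),
# using scikit-learn; the labels below are toy data.
from sklearn.metrics import cohen_kappa_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

# Observed accuracy is 0.75, chance agreement is 0.5, so kappa = 0.5.
print(cohen_kappa_score(y_true, y_pred))
```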

This is the end of the metric series. Though I have not touched on metrics related to reinforcement learning, we will discuss those in a later article when we cover reinforcement learning itself.

Special Thanks:

As we say, “a car is useless if it doesn’t have a good engine”; similarly, a student is lost without proper guidance and motivation. From the bottom of my heart, I would like to thank my Guru as well as my idol, “Dr. P. Supraja”, who guided me throughout this journey. As a Guru, she has lit the best available path for me and motivated me whenever I encountered a failure or a roadblock; without her support and motivation, this would have been an impossible task for me.

Contact me:

If you have any queries, feel free to contact me through any of the options mentioned below:

Website: www.rstiwari.com

Medium: https://tiwari11-rst.medium.com

Google Form: https://forms.gle/mhDYQKQJKtAKP78V7


