Interpretable Machine Learning: An Overview

--

Despite their high predictive performance, many machine learning techniques remain black boxes: it is difficult to understand the role each feature plays and how it combines with the others to produce a prediction. Yet users need to understand and trust the decisions made by machine learning models, especially in sensitive fields such as medicine. For this reason, there is an increasing need for methods that can explain the individual predictions of a model, that is, a way to understand which features made the model give its prediction for a specific instance.

A typical machine learning setting is shown in the following picture:

Source: https://www.darpa.mil/program/explainable-artificial-intelligence

We have a neural network (the machine learning model) trained as an image classifier. This model gives a probability (let’s say 0.98) that a cat appears in the picture (an observation), so we could say that “our model predicts that this is a cat with a probability of 0.98”. But we wouldn’t really know the reasons behind that prediction. Wouldn’t it be better if we could say that “our model predicts that this is a cat with a probability of 0.98 because it has fur, whiskers, claws and ears with a certain shape”? With that information, we would understand why our neural network (correctly) predicted a cat and, also, we could decide whether or not to trust the prediction. This is the goal of interpretable machine learning (or explainable machine learning).

A popular approach to this explanation problem is to use an interpretable (or transparent) model, from which an explanation can be extracted by inspecting its components. For instance, a decision tree can be explained by the rules that lead from the root to the leaves, and a generalized linear model by its estimated coefficients. Interpretable models are a valid solution (and will provide meaningful insights) as long as they are accurate for the task, but limiting ourselves to this kind of model is too restrictive and can compromise accuracy, so it is preferable to be able to use models as flexible as the problem requires, without restrictions.
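
As a minimal sketch of this kind of “explanation by inspection”, the snippet below fits a small decision tree and a logistic regression on scikit-learn’s iris data (a stand-in dataset chosen for illustration, not one used in this post) and prints the tree’s rules and the linear model’s coefficients.

```python
# Transparent models can be explained by looking at their own components.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
feature_names = load_iris().feature_names

# A decision tree is explained by the rules that lead from root to leaf.
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=feature_names))

# A (generalized) linear model is explained by its estimated coefficients
# (here, the coefficients for the first class of the classifier).
glm = LogisticRegression(max_iter=1000).fit(X, y)
for name, coef in zip(feature_names, glm.coef_[0]):
    print(f"{name}: {coef:+.3f}")
```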

For less transparent models (i.e., black boxes), an alternative approach is to use model-specific (or model-dependent) explanations, that is, explanation methods designed for a particular type of model. For example, there are methods for explaining the decisions of random forests, and other methods for artificial neural networks. The problem with these methods is that the explanation method has to be replaced every time the model is replaced with one of a different type and, consequently, the end user needs additional time and effort to get used to the new explanation method.

Therefore, we need model-independent (or model-agnostic) explanations: a method that can explain the predictions of any model. With such explanations, we are not restricted to a specific type of model. Also, since such a method uses the same techniques and representations to explain the predictions of any model, it is easier to compare two candidate models for the task or to switch between models of different types.

A model-independent method requires only a trained model and one observation to be explained. Since it cannot make any assumptions about the model, the explanation has to be generated using only the predictions made by the model. As a result, it will highlight the features that most influence the model’s prediction, and it will typically present them visually to facilitate interpretability. For example, an explanation could be a set of weights (one for each feature) that are positive if the feature value supports the prediction and negative if it does not:

Example of an explanation
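
As a purely illustrative sketch (not any specific published method), the function below scores each feature of a single observation by how much the predicted probability changes when that feature is replaced by its training-set mean, using nothing but the model’s predictions. The names `model`, `x`, `X_train` and `class_idx` are assumptions: any fitted classifier exposing `predict_proba`, the observation to explain, the training matrix and the class of interest.

```python
import numpy as np

def per_feature_weights(model, x, X_train, class_idx):
    """One weight per feature: the change in predicted probability when the
    feature value is replaced by its training-set mean (an occlusion-style probe)."""
    x = np.asarray(x, dtype=float)
    baseline = X_train.mean(axis=0)        # "uninformative" reference values
    p_full = model.predict_proba(x[None, :])[0, class_idx]
    weights = np.zeros(x.size)
    for j in range(x.size):
        x_masked = x.copy()
        x_masked[j] = baseline[j]          # hide feature j
        p_masked = model.predict_proba(x_masked[None, :])[0, class_idx]
        weights[j] = p_full - p_masked     # > 0: the feature value supports the prediction
    return weights
```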

The three main model-independent methods are:

  • Local Interpretable Model-agnostic Explanations (LIME). This method explains a prediction through the components of an interpretable model (for instance, the coefficients of a linear regression) that approximates the black-box model locally around the point of interest and is trained on a new, interpretable data representation. (See the LIME sketch after this list.)
  • Explanation vectors. Intuitively, a feature has a lot of influence on the model’s decision if small variations in its value cause large variations in the model’s output, while it has little influence if big changes in that variable barely affect the output. This method therefore defines explanations as gradient vectors at the point of interest, which characterize how a data point has to be moved to change its prediction. (A finite-difference sketch follows the list.)
  • Interactions-based Method for Explanation (IME). Based on cooperative game theory, this method treats the features as players of a game in which the worth of a coalition is the change in the model’s prediction, with respect to the model’s expected output, produced by knowing the corresponding feature values of the observation being explained. IME divides the total change in prediction among the features in a way that is “fair” to their contributions across all possible subsets of features. (A sampling sketch follows the list.)
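
Below are brief, hedged sketches of each method in Python. They assume a fitted scikit-learn-style classifier `model` (exposing `predict_proba`), a training matrix `X_train`, lists `feature_names` and `class_names`, and a single observation `x` to explain; none of these names come from the original papers.

For LIME, a commonly used implementation is the `lime` package; the snippet below shows its tabular explainer, assuming the package is installed.

```python
# LIME on tabular data (requires the `lime` package: pip install lime).
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train,                      # data used to sample perturbations around x
    feature_names=feature_names,
    class_names=class_names,
    mode="classification",
)

# Fit an interpretable (sparse linear) surrogate locally around x and
# report its coefficients as the explanation.
exp = explainer.explain_instance(x, model.predict_proba, num_features=5)
print(exp.as_list())              # [(feature condition, weight), ...]
```

For explanation vectors, the explanation is a gradient at the point of interest. When the model’s gradient is not available analytically, a finite-difference approximation of the predicted probability can stand in for it (this only makes sense for continuous features):

```python
import numpy as np

def explanation_vector(predict_proba, x, class_idx, eps=1e-4):
    """Central finite-difference gradient of the predicted probability for one class."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for j in range(x.size):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[j] += eps
        x_minus[j] -= eps
        grad[j] = (predict_proba(x_plus[None, :])[0, class_idx]
                   - predict_proba(x_minus[None, :])[0, class_idx]) / (2 * eps)
    return grad  # large |grad[j]| means feature j is locally influential
```

For IME, the “fair” division it computes is the Shapley value of each feature, which in practice is approximated by sampling random feature orderings; the sketch below is one such Monte Carlo approximation:

```python
import numpy as np

def shapley_contributions(score, x, X_background, n_samples=200, seed=0):
    """Monte Carlo Shapley estimate of each feature's contribution to score(x),
    relative to the model's average output on background data.
    `score` maps a 2-D array of rows to a 1-D array of scalar outputs,
    e.g. lambda X: model.predict_proba(X)[:, class_idx]."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    phi = np.zeros(x.size)
    for _ in range(n_samples):
        order = rng.permutation(x.size)                  # random feature ordering
        z = X_background[rng.integers(len(X_background))].astype(float)
        prev = score(z[None, :])[0]                      # before revealing any feature of x
        for j in order:
            z[j] = x[j]                                  # reveal feature j of the explained instance
            curr = score(z[None, :])[0]
            phi[j] += curr - prev                        # marginal contribution of feature j
            prev = curr
    return phi / n_samples   # approximately sums to score(x) minus the average background score
```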

In future posts I will explain each of these methods in detail, along with examples of their application.

--

Data scientist with 3+ years of experience in web analytics as a consultant. MSc graduate (Statistics & O.R.). I love turning data into actionable insights.