Final Steps in the ML Life Cycle: From Validation to Deployment

Published in Becoming Human: Artificial Intelligence Magazine, May 6, 2021

Today, we’re going to dive into the final steps of our machine learning life cycle. This is where we face the reality check: How good is our current model? Does it already add value for our client? And is it ready to be deployed to production?

In our previous articles we covered data collection and data preparation, model evaluation and model training. Now we address validating our model’s performance, getting feedback from our client and deploying the model to production.

Model Validation

There is one central question: What benefits does our current model already offer to our client? We have to validate the model performance at this stage and find out where we stand now.

A brief update on the actual task to be solved: Our client is a dance federation and our task is to build an AI model capable of classifying images as aesthetic or unaesthetic. The dance federation wants to automatically identify aesthetic images to use them for marketing.

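The first step of model validation is to check that the validation accuracy and the test accuracy are close to each other. If the test accuracy is clearly lower, the model has probably been tuned too closely to the validation data and is unlikely to generalize well to unseen data.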
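A minimal sketch of this check in Python. The labels, the function names and the tolerance of 0.02 are illustrative assumptions, not values from our project; a sensible tolerance depends on the size of your validation and test sets.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def generalization_gap(val_acc, test_acc, tolerance=0.02):
    """Return (gap, ok): the absolute accuracy gap and whether it is within tolerance.

    The default tolerance is an arbitrary, illustrative choice.
    """
    gap = abs(val_acc - test_acc)
    return gap, gap <= tolerance


# Hypothetical labels: 1 = aesthetic, 0 = unaesthetic
val_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
val_pred = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
test_true = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
test_pred = [1, 0, 0, 1, 1, 0, 0, 0, 1, 1]

val_acc = accuracy(val_true, val_pred)
test_acc = accuracy(test_true, test_pred)
gap, ok = generalization_gap(val_acc, test_acc)
print(f"validation={val_acc:.2f} test={test_acc:.2f} gap={gap:.2f} within_tolerance={ok}")
```

In this made-up example the test accuracy is far below the validation accuracy, so the check fails and we would look at the data split and the training setup before going any further.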

Next, we have to do some further analysis to get a feeling for the weak points in the behavior of the model. This is important because the training data is only an approximation of the problem to be solved. In production, the model has to cope with very different data, and we want to find out how the model would perform on that.

Confusion Matrix

We can use a confusion matrix to gain more detailed insights into the performance of our model. A typical confusion matrix looks like this:

[Figure: example confusion matrix for the classes a, b, c and d]

Let’s say we have a multiclass classification problem with a, b, c and d as possible classes. The confusion matrix basically tells us two things:

We see the accuracy per class, that is, the share of each class’s inputs that the model predicted correctly. These values form the diagonal from upper-left to bottom-right in the matrix. In this example our model perfectly predicts classes c and d, but has some problems with a and b.
If the model makes false predictions, we can see whether there is a bias towards certain classes. Here, some class b inputs were wrongly predicted as class a, reducing the accuracy of class b to 0.67.
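The two readings described above can be sketched in plain Python. The labels below are hypothetical and were only chosen so that classes c and d are predicted perfectly and class b ends up at 0.67; the counts for class a are made up, and the function names are our own.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes (raw counts)."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def normalize_rows(matrix):
    """Divide each row by its sum so the diagonal holds the per-class accuracy."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


labels = ["a", "b", "c", "d"]
# Hypothetical ground truth and predictions
y_true = ["a", "a", "a", "a", "b", "b", "b", "c", "c", "d", "d", "d"]
y_pred = ["a", "a", "a", "c", "a", "b", "b", "c", "c", "d", "d", "d"]

cm = confusion_matrix(y_true, y_pred, labels)
norm = normalize_rows(cm)
per_class = {label: norm[i][i] for i, label in enumerate(labels)}

for label, row in zip(labels, norm):
    print(label, [round(v, 2) for v in row])
print({label: round(acc, 2) for label, acc in per_class.items()})
```

Reading a row from left to right shows where the inputs of that class ended up. Any large value off the diagonal points to the pair of classes the model confuses most.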

A confusion matrix is great for analyzing bias towards certain classes in multiclass prediction problems. It’s like a signpost pointing towards how the dataset can be improved to fix the misbehavior in the next training runs. In our case with the dance images we have to solve a binary classification problem (images are either aesthetic or not), so the matrix shrinks to two rows and two columns.

Comparison and Feedback

In this step we usually compare the current iteration and its progress with the previously deployed model (unless this is the first iteration).

We sum up our findings (accuracy, F1 score and the overall behavior observed during manual inspection) and develop countermeasures for wrong predictions: Which misbehavior can be attributed to which aspects of the data? How can we change this, e.g. by adding new images, removing old ones, augmenting images, or randomly varying the colours of the images?
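As a rough sketch, accuracy and F1 score for our binary task can be computed and compared with the previously deployed model as follows. All labels and predictions are hypothetical, and `binary_metrics` is a helper we define here for illustration, not part of any library.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 score for a binary classifier."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical test labels: 1 = aesthetic, 0 = unaesthetic
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
previous_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]  # previously deployed model
current_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]   # current iteration

previous = binary_metrics(y_true, previous_pred)
current = binary_metrics(y_true, current_pred)

for name in ("accuracy", "f1"):
    delta = current[name] - previous[name]
    print(f"{name}: previous={previous[name]:.2f} current={current[name]:.2f} delta={delta:+.2f}")
```

Both models have to be evaluated on the same test set, otherwise the difference between the numbers says nothing about progress. The numbers also do not replace the manual inspection, since they cannot tell us which kinds of images the model gets wrong.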

Then we present a non-technical overall status to our client, address the problems of the model and offer countermeasures. This is often one of the most difficult tasks in machine learning projects, as we have to translate technical insights into non-technical advice on how the model could be further improved. Eventually, we agree on the next steps and go into the next iteration to implement the changes as discussed.

Deployment

If the model already adds value for the client, it can be integrated as a prototype or even deployed to production. Generally, the model should be deployed as soon as possible, since we can then get valuable insights and feedback on how the model actually performs with real data. The deployed model also serves as the performance baseline for the following training iteration.

Once we have deployed the first model, we have gone through the whole machine learning life cycle, from conception to deployment. More iterations follow for various reasons, e.g. if users get unexpected results or if there is a model shift (new classes have to be added and others are no longer used).

Machine learning is a continuous cycle where the progress occurs in iterations.

We hope our series about the machine learning life cycle is helpful for you. Contact us if you need some help starting your own AI project.