Training and Prediction with Google Cloud Platform services — Quick overview

The blog post I wish I had read before trying Cloud ML Engine

--

My first machine learning exercises were done with scikit-learn, which makes it very simple to train models and get results just by running a local Python script. Then I decided to try Cloud Machine Learning Engine, and it took me some time to understand how its processes flow and what artifacts it produces.
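For context, this is roughly what that local scikit-learn workflow looks like — a minimal sketch, assuming scikit-learn is installed; the toy dataset and model choice are placeholders, not the one I actually used:

```python
# Minimal local training with scikit-learn: fit a model and predict,
# all in one local Python script (toy data for illustration).
from sklearn.linear_model import LogisticRegression

# Tiny toy dataset: one feature, two classes.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

model = LogisticRegression()
model.fit(X, y)

# Low feature values should map to class 0, high values to class 1.
print(model.predict([[0.2], [2.8]]))
```

No cloud services, no jobs, no buckets — which is exactly why the extra moving parts of ML Engine took some getting used to.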

First of all, we’ll be using Google Cloud Platform services such as Cloud Machine Learning Engine for training and prediction, and Cloud Storage to keep our datasets and the other files generated by the training and prediction processes in the cloud. Operations like training, storing files, or requesting a prediction can be executed with the gcloud commands provided by the Cloud SDK, or through a REST API. Both services require authentication in GCP.

The training process starts with creating the trainer package, which should include the training program. At this point we are introduced to TensorFlow, a Python library for numerical computation that also provides a visualization tool, TensorBoard. This tool helps us check the training results and some model metric values, such as accuracy and precision, which I will cover below.

We can train our model locally or in the cloud (by submitting ML Engine jobs), since our dataset can live either in the local environment or in a bucket in Cloud Storage. Training generates a set of files and values that TensorBoard can display in a friendly way, as mentioned before.

The model binaries are part of these files generated during training, and they are what the prediction process uses. In the prediction request we specify the path to the target JSON file, which can also be stored locally or in Cloud Storage. After that, we finally get the prediction results.
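For prediction with the gcloud tool, that target file is typically a text file with one JSON instance per line. Here is a minimal sketch of building such a file in Python — the feature names and values are made up for illustration, not taken from any real model:

```python
# Build a JSON-instances file for a prediction request:
# one JSON object per line, one line per sample.
import json

instances = [
    {"feature_a": 0.2, "feature_b": 1.5},
    {"feature_a": 2.8, "feature_b": 0.1},
]

with open("instances.json", "w") as f:
    for instance in instances:
        f.write(json.dumps(instance) + "\n")
```

The exact shape of each instance (field names, nesting) must match what the exported model's serving signature expects.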

Metrics: Accuracy and Precision

Training is an iterative process, and one way to improve our models is to understand the metrics each iteration gives us. If we think about the purpose of our prediction results, the goal is to get many more correct predictions than incorrect ones. But what do correct and incorrect predictions mean?

To simplify the exercise, let’s consider a binary classification problem where the prediction for each sample can only be Class 1 or Class 0. A correct prediction means the model predicts Class 1 and the sample actually is Class 1 (called a True Positive), or the model predicts Class 0 and we indeed have Class 0 (called a True Negative). We can also have two kinds of incorrect predictions: when the model predicts Class 1 but the sample is actually Class 0 (called a False Positive) and, finally, when Class 0 is predicted but the real result should be Class 1 (called a False Negative). Confused?

Confusion Matrix

Analysing these four possible results and how many of each we get, we can find, for example, the Accuracy and the Precision of our model (but there are many other metrics).
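Concretely, given the true labels and the model’s predictions, the four counts can be tallied like this (the label lists are made-up example values):

```python
# Tally the four confusion-matrix cells for a binary classifier.
y_true = [1, 0, 1, 1, 0]  # actual classes
y_pred = [1, 0, 0, 1, 1]  # model predictions

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

print(tp, tn, fp, fn)  # 2 1 1 1
```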

The accuracy of the model is about correct predictions: it is the ratio between the correct predictions and all predictions. A perfect scenario would have an accuracy of 1, meaning zero wrong predictions.

Accuracy

Precision, on the other hand, is about correct positive predictions. It is the ratio between true positives and the sum of true and false positives.

Precision
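Starting from the four confusion-matrix counts, both metrics reduce to short ratios (the counts below are made-up example values):

```python
# Accuracy and precision from confusion-matrix counts (example values).
tp, tn, fp, fn = 2, 1, 1, 1

# Accuracy: correct predictions over all predictions.
accuracy = (tp + tn) / (tp + tn + fp + fn)

# Precision: true positives over all positive predictions.
precision = tp / (tp + fp)

print(accuracy)   # 0.6
print(precision)  # 0.666...
```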

Each problem has its own requirements, so you should set different goals for these metric values in different situations.

Here are some readings that helped me understand these concepts:

I hope this was useful; share and clap if you liked it :)
