Microsoft Azure Machine Learning x Udacity — Lesson 6 Notes

--

Detailed notes on Lesson 6, Managed Services for Machine Learning, from the Machine Learning Foundation Course by Microsoft Azure & Udacity (2020).

In this lesson, you will learn how to enhance your ML processes with managed services. We’ll discuss computing resources, the modeling process, automation via pipelines, and more.

Intro to Managed Services Approach

Conventional Machine Learning:

  • Lengthy installation and setup process: for most users, setup involves installing several applications and libraries on the machine, configuring the environment settings, and then loading all the resources just to begin working within a notebook or integrated development environment (IDE).
  • Expertise to configure hardware: more specialized cases, such as deep learning, also require expertise to configure hardware-related aspects such as GPUs.
  • A fair amount of troubleshooting: all this setup takes time, and there is often a fair amount of troubleshooting involved in making sure you have a combination of software versions that are compatible with one another.

Managed Services Approach:

  • Very little setup: the environment is fully managed, i.e., it provides a ready-made environment that is pre-optimized for your machine learning development.
  • Easy configuration for any needed hardware: only a compute target needs to be specified, which is a compute resource where experiments are run and service deployments are hosted. The service also offers support for datastore and dataset management, a model registry, deployed services, endpoint management, etc.
  • Examples of Compute Resources: Training clusters, inferencing clusters, compute instances, attached compute, local compute.
  • Examples of Other Services: Notebooks gallery, Automated Machine Learning configurator, Pipeline designer, datasets and datastore managers, experiments manager, pipelines manager, model registry, endpoints manager.

Compute Resources

A compute target is a designated compute resource or environment where you run training scripts or host your service deployment. There are two different variations of compute targets: training compute targets and inferencing compute targets.

Training Compute:

Compute resources that can be used for model training.

For example:

  • Training Clusters: the primary choice for model training and batch inferencing. They can also be used for general purposes such as running machine learning Python code. They offer single-node or multi-node clusters, are fully managed, scale automatically each time a run is submitted, and provide automatic cluster management and job scheduling. They support both CPU and GPU resources to handle various types of workloads (see the provisioning sketch after this list).
  • Compute Instances: primarily intended to be used as notebook environments but can also be used for model training.
  • Local Compute: the compute resources of your own machine, used to train models.
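
As a rough sketch of how a training cluster might be provisioned with the Azure Machine Learning SDK for Python (the cluster name, VM size, and node counts below are illustrative assumptions, not values from the course):

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

# Connect to the Azure ML workspace (assumes a config.json downloaded from the portal)
ws = Workspace.from_config()

# Hypothetical cluster name and VM size; adjust to your subscription's quotas
cluster_config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_D2_V2",   # CPU VM; pick a GPU size for deep learning workloads
    min_nodes=0,                # scale down to zero nodes when idle
    max_nodes=4                 # scale up automatically as runs are submitted
)
training_cluster = ComputeTarget.create(ws, "cpu-cluster", cluster_config)
training_cluster.wait_for_completion(show_output=True)
```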

Inferencing Compute:

Once a model is trained, it is deployed for inferencing (or scoring). With batch inferencing, inferences are made on multiple rows of data at a time, called batches.

After the model is trained and ready to be put to work, it is deployed to a web hosting environment or an IoT device. When the model is used, it infers things about new data it is given, based on its training.

For example:

  • Inferencing Clusters: for real-time inferencing; an inference is made for each new row of data as it arrives (see the sketch below).
  • Batch Inferencing: to make inferences on multiple rows of data (batches) at a time; training clusters are the primary choice for this workload.
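
A minimal sketch of provisioning an inferencing (AKS) cluster with the SDK; the cluster name is an illustrative assumption, and the default provisioning configuration is used:

```python
from azureml.core import Workspace
from azureml.core.compute import AksCompute, ComputeTarget

ws = Workspace.from_config()

# Default AKS provisioning configuration; agent count and VM size can be customized
aks_config = AksCompute.provisioning_configuration()
inference_cluster = ComputeTarget.create(ws, "aks-cluster", aks_config)
inference_cluster.wait_for_completion(show_output=True)
```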

Managed Notebook Environments

Notebooks are made up of one or more cells that allow for the execution of code snippets or commands within those cells. They store both the commands and the results of running those commands. A managed notebook environment can be used to perform all of the primary stages of model development described in the rest of this lesson.

Basic Modeling

Experiments:

An experiment is a general context for handling runs. Think about it as a folder that organizes the artifacts used in your Model Training process. Once you have an experiment, you can create runs within that experiment.
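
A minimal sketch of creating an experiment with the Python SDK (the experiment name below is just an illustrative assumption):

```python
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()                          # connect to the workspace
exp = Experiment(workspace=ws, name="mslearn-demo")   # create (or get) an experiment by name
```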

Runs:

A run is a single execution of a training script to train a model. It contains all artifacts associated with the training process, like output files, metrics, logs, and a snapshot of the directory that contains your scripts.

Run Configurations: a set of instructions that defines how a script should be run in a specified compute target. A run configuration can be persisted into a file inside the directory that contains the training script, or it can be constructed as an in-memory object and used to submit a run.

It includes a wide set of behavior definitions, such as whether to use an existing Python environment or a Conda environment that is built from a specification. A run can have zero or more child runs.
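
A hedged sketch of configuring and submitting a run with ScriptRunConfig; the script name, Conda specification file, and cluster name are illustrative assumptions:

```python
from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace

ws = Workspace.from_config()
exp = Experiment(workspace=ws, name="mslearn-demo")

# Build the run's environment from a Conda specification file (assumed to exist)
env = Environment.from_conda_specification(name="train-env", file_path="conda_dependencies.yml")

# The run configuration: which script to run, where, and in which environment
src = ScriptRunConfig(
    source_directory=".",
    script="train.py",             # assumed training script
    compute_target="cpu-cluster",  # the training cluster created earlier
    environment=env,
)

run = exp.submit(src)              # a run is a single execution of the training script
run.wait_for_completion(show_output=True)
```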

Models:

A run is used to produce a model. Essentially, a model is a piece of code that takes input and produces output. To get a model, we start with a more general algorithm. By combining this algorithm with the training data — as well as by tuning the hyperparameters — we produce a more specific function that is optimized for the particular task we need to do. Put concisely:

Model = algorithm + data + hyperparameters
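
To make the formula concrete, here is a tiny scikit-learn sketch in which the algorithm (logistic regression), the training data, and a hyperparameter (the regularization strength C, an illustrative choice) combine to produce a fitted model:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)                     # data
algorithm = LogisticRegression(C=0.5, max_iter=200)   # algorithm + hyperparameters
model = algorithm.fit(X, y)                           # model = algorithm + data + hyperparameters
```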

Model Registry:

A registered model is a logical container for one or more files that make up the model. Once we have a trained model, we can turn to the model registry, which keeps track of all models in an Azure Machine Learning workspace. Note that models are either produced by a Run or originate from outside of Azure Machine Learning (and are made available via model registration).
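
A sketch of registering a model, either from a completed run or from a local file that originated outside Azure Machine Learning (the model name and file paths are illustrative assumptions):

```python
from azureml.core import Model

# Register a model file produced by a run (path is relative to the run's outputs)
registered = run.register_model(model_name="demo-model", model_path="outputs/model.pkl")

# Or register a model file trained outside Azure Machine Learning
registered = Model.register(workspace=ws, model_name="demo-model", model_path="./model.pkl")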

Advanced Modeling

Machine Learning Pipelines:

As the process of building your models becomes more complex, it becomes more important to get a handle on the steps to prepare your data and train your models in an organized way. In these scenarios, there can be many steps involved in the end-to-end process, including:

  • Data ingestion
  • Data preparation
  • Model building & training
  • Model deployment

These steps are organized into machine learning pipelines.

You use machine learning pipelines to create and manage workflows that stitch together the machine learning phases. Pipelines are cyclical and iterative in nature, and they facilitate continuous improvement of model performance, model deployment, and making inferences with the best-performing model to date.

Machine learning pipelines are made up of distinct, modular steps. Because the steps are modular, they can be reused across workflows, and when a pipeline is rerun, steps whose inputs and settings have not changed can reuse their previous outputs instead of being recomputed (see the sketch below).
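
A hedged sketch of a two-step pipeline with the Python SDK; the script names, directories, and compute target are illustrative assumptions, and allow_reuse=True enables the output-reuse behavior described above:

```python
from azureml.core import Experiment, Workspace
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# Intermediate data passed from the preparation step to the training step
prepped_data = PipelineData("prepped_data", datastore=ws.get_default_datastore())

prep_step = PythonScriptStep(
    name="prepare data",
    source_directory="prep",
    script_name="prep.py",
    compute_target="cpu-cluster",
    outputs=[prepped_data],
    arguments=["--output", prepped_data],
    allow_reuse=True,          # reuse the previous output if nothing has changed
)

train_step = PythonScriptStep(
    name="train model",
    source_directory="train",
    script_name="train.py",
    compute_target="cpu-cluster",
    inputs=[prepped_data],
    arguments=["--input", prepped_data],
    allow_reuse=True,
)

pipeline = Pipeline(workspace=ws, steps=[prep_step, train_step])
run = Experiment(ws, "demo-pipeline").submit(pipeline)
run.wait_for_completion(show_output=True)
```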

MLOps: Creating Automatic End-to-End Integrated Processes:

Instead of manual processes, we want to develop processes that use automated builds and deployments. The general term for this approach is DevOps; when applied to machine learning, we refer to the automation of machine learning pipelines as MLOps.

Important aspects of MLOps:

  • Automating the end-to-end ML life cycle, for example by publishing and scheduling pipelines (a sketch follows this list)
  • Monitoring ML solutions for both generic and ML-specific operational issues
  • Capturing all the data that is necessary for full traceability in the ML life cycle
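
One way this automation is commonly achieved is by publishing a pipeline and putting it on a recurring schedule. A hedged sketch with the SDK, where the names and the daily recurrence are illustrative assumptions and `pipeline` is the object built in the earlier sketch:

```python
from azureml.pipeline.core import Schedule, ScheduleRecurrence

# Publish the pipeline so it can be triggered via REST or on a schedule
published = pipeline.publish(name="demo-pipeline", description="prep + train")

# Run the published pipeline automatically once a day
recurrence = ScheduleRecurrence(frequency="Day", interval=1)
schedule = Schedule.create(
    ws,
    name="daily-training",
    pipeline_id=published.id,
    experiment_name="demo-pipeline",
    recurrence=recurrence,
)
```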

Operationalizing Models

Operationalization refers to the deployment of a machine learning model after it has been trained and evaluated to the point where it is ready to be used outside of a development or test environment.

Typical Model Deployment (a sketch of some of these steps follows the list):

  • Get the model file (any file format)
  • Create a scoring script (.py)
  • Optionally create a schema file describing the web service input (.json)
  • Create a real-time scoring web service
  • Call the web service from applications
  • Repeat the process each time the model is re-trained
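
A hedged sketch of the scoring-script and web-service steps above; the model name, file names, and environment specification are illustrative assumptions:

```python
# score.py -- entry script loaded by the web service
import json
import joblib
from azureml.core.model import Model

def init():
    global model
    # Resolve the path of the registered model inside the service container
    model_path = Model.get_model_path("demo-model")
    model = joblib.load(model_path)

def run(raw_data):
    data = json.loads(raw_data)["data"]
    predictions = model.predict(data)
    return predictions.tolist()
```

```python
# deploy.py -- create a real-time scoring web service from the registered model
from azureml.core import Environment, Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = Model(ws, name="demo-model")

env = Environment.from_conda_specification(name="score-env", file_path="conda_dependencies.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(ws, "demo-service", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```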

Real-time Inferencing:

The model training process can be very compute-intensive, with training times that can potentially span across many hours, days, or even weeks. A trained model, on the other hand, is used to make decisions on new data quickly. In other words, it infers things about new data it is given based on its training. Making these decisions on new data on-demand is called Real-time Inferencing.
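
A sketch of calling the deployed real-time endpoint from an application; the scoring URI and the input schema are illustrative placeholders:

```python
import json
import requests

# The scoring URI is printed at deployment time; AKS deployments also require an API key header
scoring_uri = "http://<your-service>.azurecontainer.io/score"   # placeholder
payload = json.dumps({"data": [[0.02, 0.05, 1.3, 4.1]]})        # hypothetical feature row

response = requests.post(scoring_uri, data=payload, headers={"Content-Type": "application/json"})
print(response.json())
```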

Batch Inferencing:

Unlike real-time inferencing, which makes predictions on data as it is received, Batch Inferencing is run on large quantities (batches) of existing data. Typically, batch inferencing is run on a recurring schedule against data stored in a database or other data store. The resulting predictions are then written to a data store for later use by applications, data scientists, developers, or end-users.

Batch scoring is typically used when predictions need to be generated over as much historical, time-based data as possible, at a certain level of granularity. It usually involves latency requirements of hours or days, so it does not require trained models deployed to RESTful web services, as is done for real-time inferencing. Use Batch Inferencing when (a simplified scoring sketch follows this list):

  • No need for real-time
  • Inferencing results can be persisted
  • Post-processing or analysis of the predictions is needed
  • Inferencing is complex
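
A simplified, framework-agnostic sketch of a batch scoring script that could be run on a recurring schedule (Azure Machine Learning also provides a dedicated ParallelRunStep for this, not shown); the file paths and chunk size are illustrative assumptions:

```python
# batch_score.py -- score a large file of existing records in batches
import joblib
import pandas as pd

CHUNK_SIZE = 10_000                      # number of rows scored per batch

model = joblib.load("model.pkl")         # previously trained model
write_header = True

# Stream the historical data in batches rather than loading it all at once;
# assumes every column in the file is a model feature
for chunk in pd.read_csv("historical_records.csv", chunksize=CHUNK_SIZE):
    chunk["prediction"] = model.predict(chunk)
    # Persist predictions for later use by applications, data scientists, or end-users
    chunk.to_csv("predictions.csv", mode="a", header=write_header, index=False)
    write_header = False
```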

Due to the scheduled nature of batch inferencing, predictions are not usually available for new data. However, in many common real-world scenarios, predictions are needed on both newly arriving and existing historical data. This is where Lambda Architecture comes in.

The gist of the Lambda architecture is that ingested data is processed at two different speeds: a hot path that makes predictions against the data in real time, and a cold path that makes predictions in a batch fashion, which might take days to complete.

Programmatically Accessing Managed Services

Data scientists and AI developers use the Azure Machine Learning SDK for Python to build and run machine learning workflows with the Azure Machine Learning service. You can interact with the service in any Python environment, including Jupyter Notebooks, Visual Studio Code, and your favorite Python IDE.

Key areas of the SDK include:

  • Manage datasets (see the sketch after this list)
  • Organize and monitor experiments
  • Model training
  • Automated Machine Learning
  • Model deployment
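
For instance, a hedged sketch of registering a tabular dataset from the workspace's default datastore; the file path and dataset name are illustrative assumptions:

```python
from azureml.core import Dataset, Workspace

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Create a tabular dataset from a CSV file already uploaded to the datastore
dataset = Dataset.Tabular.from_delimited_files(path=(datastore, "data/diabetes.csv"))

# Register it so it can be versioned and shared across experiments
dataset = dataset.register(workspace=ws, name="diabetes-dataset", create_new_version=True)
```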

Azure Machine Learning service supports many of the popular open-source machine learning and deep learning Python packages that we discussed earlier in the course, such as:

  • Scikit-learn
  • TensorFlow
  • PyTorch
  • Keras

Lesson Summary

In this lesson, you’ve learned about managed services for Machine Learning and how these services are used to enhance Machine Learning processes.

First, you learned about various types of computing resources made available through managed services, including:

  • Training compute
  • Inferencing compute
  • Notebook environments

Next, you studied the main concepts involved in the modeling process, including:

  • Basic modeling
  • How parts of the modeling process interact when used together
  • More advanced aspects of the modeling process, like automation via pipelines and end-to-end integrated processes (also known as DevOps for Machine Learning, or simply MLOps)
  • How to move the results of your modeling work to production environments and make them operational

Finally, you were introduced to the world of programming the managed services via the Azure Machine Learning SDK for Python.
