Microsoft Azure Machine Learning x Udacity — Lesson 4 Notes

--

Detailed notes for Lesson 4, Supervised & Unsupervised Learning, from the Machine Learning Foundation Course by Microsoft Azure & Udacity (2020).

This lesson covers two of Machine Learning’s fundamental approaches: supervised and unsupervised learning. You will learn about classification, regression, clustering, representation learning, and more.

Supervised Learning: Classification

In a classification problem, the outputs are categorical or discrete.

Some of the most common types of classification problems include:

  • Classification on tabular data: The data is available in the form of rows and columns, potentially originating from a wide variety of data sources.
  • Classification on image or sound data: The training data consists of images or sounds whose categories are already known.
  • Classification on text data: The training data consists of texts whose categories are already known.

Examples of Classification problems are:

  • Computer Vision
  • Speech Recognition
  • Biometric Identification
  • Document Classification
  • Sentiment Analysis
  • Credit Scoring
  • Anomaly Detection

Categories of Algorithms:

At a high level, there are three main categories of algorithms:

  • Two-Class (Binary) Classification: used when the prediction has to be made only between two categories, e.g. True/False, Yes/No.
  • Multi-Class Single-Label Classification: used when there are multiple categories to predict from but the output belongs to exactly one category, e.g. red, yellow, green, or blue.
  • Multi-Class Multi-Label Classification: used when there are multiple categories to predict from and the output can belong to several categories at once, e.g. both red and yellow.

Two-Class and Multi-Class Classification Algorithms:

Azure Machine Learning offers separate families of two-class (binary) and multi-class algorithms; the key multi-class algorithms and their hyperparameters are described below.

Multi-Class Algorithm Hyperparameters:

  • Multi-Class Logistic Regression: a well-known statistical method used to predict the probability of an outcome, popular in classification tasks. The two key parameters for configuring this algorithm are: 1) Optimization Tolerance: controls when to stop iterating; if the improvement between iterations is less than the specified threshold, the algorithm stops and returns the current model, and 2) Regularization Weight: regularization prevents overfitting by penalizing models with extreme coefficient values; the regularization weight controls how much the models are penalized at each iteration. (A rough scikit-learn sketch follows this list.)
  • Multi-Class Neural Network: a typical network consists of an input layer, a hidden layer, and an output layer. The relationship between the input and the output is learned by training the neural network on input data. The three key parameters for configuring the Multi-Class Neural Network are: 1) Number of Hidden Nodes: customizes the number of hidden nodes in the network, 2) Learning Rate: controls the size of the step taken at each iteration before the correction, and 3) Number of Learning Iterations: the maximum number of times the algorithm should process the training cases.
  • Multi-Class Decision Forest: an ensemble of decision trees. The algorithm works by building multiple decision trees and then voting on the most popular output class. The five key parameters for configuring the Multi-Class Decision Forest are: 1) Resampling Method: controls the method used to create the individual trees, 2) Number of Decision Trees: the maximum number of trees that can be created in the ensemble, 3) Maximum Depth of the Decision Trees: limits the depth of any tree, 4) Number of Random Splits per Node: the number of splits to use when building each node of the tree, and 5) Minimum Number of Samples per Leaf Node: the minimum number of cases required to create any terminal (leaf) node in a tree.
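
For a concrete feel of these knobs, here is a minimal scikit-learn sketch of multi-class logistic regression. This is an analogy to the Azure ML Studio module, not its API: in scikit-learn, `tol` plays the role of the Optimization Tolerance, and `C` is the inverse of the Regularization Weight (larger `C` means a weaker penalty).

```python
# Hedged scikit-learn sketch of multi-class logistic regression (not the Azure module API).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 3 classes: a multi-class, single-label problem
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(
    tol=1e-4,      # optimization tolerance: stop when improvement falls below this
    C=1.0,         # inverse regularization weight: penalizes extreme coefficients
    max_iter=200,  # cap on training iterations
)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```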

Supervised Learning: Regression

In a regression problem, the output is numerical or continuous.

Introduction to Regression

Common types of regression problems include:

  • Regression on tabular data: The data is available in the form of rows and columns, potentially originating from a wide variety of data sources.
  • Regression on image or sound data: Training data consists of images/sounds whose numerical scores are already known. Several steps need to be performed during the preparation phase to transform images/sounds into numerical vectors accepted by the algorithms.
  • Regression on text data: Training data consists of texts whose numerical scores are already known. Several steps need to be performed during the preparation phase to transform text into numerical vectors accepted by the algorithms.

Examples of Regression Problems:

  • Housing prices
  • Customer churn
  • Customer Lifetime Value
  • Forecasting (time series)
  • Anomaly detection

Categories of Algorithms

Common machine learning algorithms for regression problems include:

Linear Regression

  • A linear relationship between one or more independent variables and a numeric outcome (dependent variable)
  • Fast training
  • Two popular approaches to measuring error and fitting the regression line (a NumPy sketch contrasting both follows this list):
  1. Ordinary Least Squares: computes error as the sum of the squared distances from the actual values to the predicted line, and fits the model by minimizing this squared error. This method assumes a strong linear relationship between the independent and dependent variables.
  2. Gradient Descent: iteratively reduces the error at each step of the model training process.
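
To make the two approaches concrete, here is a minimal NumPy sketch that fits the same line both ways on synthetic data; the data, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

# Synthetic data: y = 2.0 + 0.5 * x plus noise
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(0, 10, 100)]  # column of ones models the intercept
true_w = np.array([2.0, 0.5])
y = X @ true_w + rng.normal(0, 0.5, 100)

# 1) Ordinary Least Squares: closed form, solve (X^T X) w = X^T y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# 2) Gradient Descent: repeatedly step against the gradient of the mean squared error
w_gd = np.zeros(2)
learning_rate = 0.01
for _ in range(5000):
    grad = 2 / len(y) * X.T @ (X @ w_gd - y)
    w_gd -= learning_rate * grad

print("OLS:", w_ols)  # both results should approximate [2.0, 0.5]
print("GD: ", w_gd)
```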

Decision Forest Regression

  • An ensemble learning method using multiple decision trees
  • Each tree outputs a distribution as a prediction
  • Aggregation is performed to find a distribution closest to the combined distribution
  • Accurate, fast training times
  • It supports some of the same hyperparameters as the Multi-Class Decision Forest Algorithm, e.g. Number of Trees, Max Depth, etc. (a scikit-learn analogue follows this list)
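
As a rough analogue (scikit-learn's random forest regressor rather than the Azure module), the hyperparameters named above map approximately as follows; the dataset here is synthetic.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

reg = RandomForestRegressor(
    n_estimators=100,    # "Number of Decision Trees"
    max_depth=16,        # "Maximum Depth of the Decision Trees"
    min_samples_leaf=2,  # "Minimum Number of Samples per Leaf Node"
    random_state=0,
)
reg.fit(X, y)
print("training R^2:", reg.score(X, y))
```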

Neural Net Regression

  • Label column must be a numerical data type
  • A fully connected Neural Network: Input layer + one Hidden layer + Output layer
  • Accurate, long training times
  • It supports some of the same hyperparameters as the Multi-Class Neural Network Algorithm, e.g. Number of Hidden Nodes, Learning Rate, Number of Iterations, etc. (a scikit-learn sketch follows this list)
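
A hedged scikit-learn sketch of such a fully connected regression network (input layer + one hidden layer + output layer); the sizes and rates below are illustrative, not the Azure module's defaults.

```python
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=8, noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)  # neural networks train better on scaled inputs

net = MLPRegressor(
    hidden_layer_sizes=(100,),  # "Number of Hidden Nodes": one hidden layer of 100 nodes
    learning_rate_init=0.001,   # "Learning Rate"
    max_iter=500,               # "Number of Learning Iterations"
    random_state=0,
)
net.fit(X, y)
print("training R^2:", net.score(X, y))
```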

Automate the Training of Regressors

Automated Machine Learning enables the automated exploration of the combinations needed to successfully produce a trained model. AutoML intelligently tests multiple combinations of algorithms and hyperparameters in parallel and returns the best one. It enables building Machine Learning models with high-scale efficiency and productivity, all while sustaining model quality. The resulting models can be:

  1. Deployed into production, or
  2. Further refined and customized

Beyond the primary metric, you can also review a comprehensive set of performance metrics and charts to further assess the model performance.
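
As a hedged illustration, an AutoML regression run can be configured with the Azure ML Python SDK (v1) along these lines; the workspace config, dataset name, and label column below are hypothetical, and parameter names can vary across SDK versions.

```python
from azureml.core import Workspace, Experiment
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()                  # assumes a local config.json for your workspace
train_ds = ws.datasets["my-regression-data"]  # hypothetical registered training dataset

automl_config = AutoMLConfig(
    task="regression",
    training_data=train_ds,
    label_column_name="price",                # hypothetical label column
    primary_metric="normalized_root_mean_squared_error",
    n_cross_validations=5,
    experiment_timeout_hours=1,
)

run = Experiment(ws, "automl-regression").submit(automl_config)
run.wait_for_completion(show_output=True)
best_run, fitted_model = run.get_output()     # best model found across all trials
```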

Unsupervised Learning

In unsupervised learning, algorithms learn from unlabeled data by looking for hidden structures in the data.

Obtaining unlabeled data is comparatively inexpensive and unsupervised learning can be used to uncover very useful information in such data.

Types of Unsupervised Machine Learning

Clustering: organizes entities from the input data into a finite number of subsets or clusters

Feature Learning: transforms sets of inputs into other inputs that are potentially more useful in solving a given problem

Anomaly Detection: identifies two major groups of entities: 1) Normal, 2) Abnormal (anomalies)

Some other types include Dimensionality Reduction, Feature Extraction, Neural Networks, Principal Component Analysis, and Matrix Factorization (a minimal PCA sketch follows).
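
As a minimal sketch of one of these techniques, here is Principal Component Analysis in scikit-learn, used for dimensionality reduction on the iris features (the labels are ignored, which is what makes it unsupervised).

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)     # labels discarded: unsupervised setting
pca = PCA(n_components=2)             # project 4 features down to 2
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                # (150, 2)
print(pca.explained_variance_ratio_)  # share of variance each component retains
```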

Semi-Supervised Learning

Semi-supervised learning combines the supervised and unsupervised approaches; typically it involves having small amounts of labeled data and large amounts of unlabeled data.

The problem:

  • Labeled data is difficult and expensive to acquire
  • Unlabeled data, by contrast, is abundant and usually inexpensive to acquire

The solution:

Uses a small amount of labeled data and a much larger amount of unlabeled data

  • Self-Training: train a model on the labeled data, then use it to make predictions on the unlabeled data. The output is a fully labeled dataset that can be used in a supervised learning approach (a scikit-learn sketch follows this list).
  • Multi-view Training: train multiple models on different views of the data, such as different feature selections, different subsets of the training data, or different model architectures.
  • Self-ensemble Training: similar to Multi-view Training, except that a single model is trained on the different views of the data.
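
Here is a hedged Self-Training sketch using scikit-learn's `SelfTrainingClassifier`, where unlabeled samples are marked with `-1` by convention; hiding 80% of the iris labels is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.8] = -1  # hide ~80% of the labels (-1 = unlabeled)

base = SVC(probability=True, gamma="auto")  # base model must expose class probabilities
model = SelfTrainingClassifier(base).fit(X, y_partial)
print("accuracy against all true labels:", model.score(X, y))
```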

Clustering

Clustering is the problem of organizing entities from the input data into a finite number of subsets or clusters; the goal is to maximize both intra-cluster similarity and inter-cluster differences.

Applications of Clustering Algorithms:

  • Personalization and target marketing
  • Document classification
  • Fraud Detection
  • Medical imaging
  • City Planning

Clustering Algorithms:

  • Centroid-Based Clustering: organizes data into clusters based on the distance of members from the centroid of the cluster, e.g. K-Means.
  • Density-based Clustering: clusters members that are closely packed together and it can learn clusters of arbitrary shapes.
  • Distribution-based Clustering: The underlying assumption is that the data has an inherent distribution type such as normal distribution. The algorithm clusters based on the probability of a member belonging to a particular distribution.
  • Hierarchical Clustering: builds a tree of clusters. This is best-suited for hierarchical data such as taxonomies.

K-Means Clustering:

K-means is a centroid-based unsupervised clustering algorithm.

It creates up to a target number (K) of clusters and groups similar members together within each cluster. The objective is to minimize intra-cluster distances (the squared error of the distance between the members of a cluster and its centroid).

K-Means Clustering Algorithm:

Steps:

  1. Initialize centroid locations.
  2. Assign each member to the cluster represented by the closest centroid.
  3. Compute new cluster centroids based on the current cluster membership.
  4. Check for convergence. There are different convergence criteria: 1) check how much the centroid locations changed as a result of the new cluster membership; if the total change in centroid location is less than a given tolerance, the algorithm assumes convergence and stops, or 2) stop after a fixed number of iterations. If the convergence criterion is not met, the algorithm iterates again starting from step 2 (a from-scratch sketch of these steps follows).
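
A from-scratch NumPy sketch that mirrors the four steps above; it uses random initialization and skips empty-cluster handling for brevity.

```python
import numpy as np

def kmeans(X, k, tol=1e-4, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialize centroids by picking k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each member to the cluster of the closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute centroids from the current cluster membership
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: converged if total centroid movement is below the tolerance
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids  # otherwise iterate again from step 2
    return centroids, labels

X = np.random.default_rng(1).normal(size=(200, 2))
centroids, labels = kmeans(X, k=3)
print(centroids)
```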

K-Means Module Configurations:

  • Number of Centroids: number of clusters you want the algorithm to begin with. The algorithm starts with this number of data points and iterates to find the optimal configuration.
  • Initialization approach: the selection of the initial centroids. The options for initialization are first n random or k-means++ algorithm.
  • Distance metric: default for this is the Euclidean distance
  • Normalize features: uses the Min-Max Normalizer to scale the numeric data point from zero to one
  • Assign label mode: used only if your dataset already has a label column. Optionally, the label values can be used to guide the selection of the clusters; the label column can also be used to fill in missing values.
  • Number of iterations: dictates the number of times the algorithm should iterate over the training data before it finalizes the selection of centroids (a scikit-learn analogue of these settings follows this list)
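
A hedged scikit-learn analogue of these module settings (scikit-learn's `KMeans`, not the Azure module API):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

X = np.random.default_rng(0).normal(size=(300, 4))
X = MinMaxScaler().fit_transform(X)  # "Normalize features": scale each feature to [0, 1]

km = KMeans(
    n_clusters=5,      # "Number of Centroids"
    init="k-means++",  # "Initialization approach"
    max_iter=300,      # "Number of iterations"
    tol=1e-4,          # convergence tolerance on centroid movement
    n_init=10,         # restarts from different seeds; the best run is kept
    random_state=0,
)
labels = km.fit_predict(X)  # Euclidean distance is the built-in metric
print(np.bincount(labels))  # cluster sizes
```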

Lesson Summary

This lesson covered two of Machine Learning’s fundamental approaches: supervised and unsupervised learning.

First, we learned about supervised learning. Specifically, we learned:

  • More about classification and regression, two of the most representative supervised learning tasks
  • Some of the major algorithms involved in supervised learning, as well as how to evaluate and compare their performance
  • How to use automated machine learning to automate the training and selection of classifiers and regressors

Next, the lesson focused on unsupervised learning, including:

  • Its most representative learning task, clustering
  • How unsupervised learning can address challenges like lack of labeled data, the curse of dimensionality, overfitting, feature engineering, and outliers
  • An introduction to representation learning

