Linear Regression from Scratch: Machine Learning Approach

The “Hello world!” of machine learning from scratch.

--

Hi! Ardi here. In my previous post I wrote about the statistical approach to solving the linear regression problem, which basically uses only a few formulas to create a best-fit straight line that estimates the value of the dependent variable y from the given training data x. Click the link below if you want to read that article.

Today I want to do a similar thing, but this time using a machine learning approach. In statistics we do not use an optimization algorithm to solve this task; in machine learning, however, such an algorithm is required. Here I decided to use the gradient descent optimization algorithm (the simplest one) to minimize the MSE (Mean Squared Error). Furthermore, the dataset I use in this project is exactly the same as the one in the previous post, and it can be downloaded from here.

Note: I put the full code at the end of this post.

Now let’s start to load the required modules first:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

As you can see, we do not import the Scikit-learn module since we are going to do all the calculations from scratch.

Regression line

Regression line. Source: https://en.wikipedia.org/wiki/Linear_regression.

Before we go deeper into the machine learning part, it's important to know that the linear regression line is basically just a linear function, hence the name linear regression. The equation can be written like this:

y_hat = m * x + b

Here x is used to represent all samples in the dataset. Notice that I use y_hat (instead of just y) since the line represents predicted values, not the actual target values. The main objective of linear regression is to figure out the values of m and b, which represent the slope and y-intercept respectively. In the statistical approach, we can directly apply a formula to compute those unknown values. In machine learning, however, we start by assigning random values to both variables and then search for the best values of m and b with the help of an error/loss function and an optimization algorithm. The idea is to use the optimizer to minimize the error value gradually.
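
Just to make the equation concrete, here is a tiny sketch with made-up values for m and b (m_example and b_example are hypothetical, not the values we will train later):

# Hypothetical slope and intercept, only for illustration
m_example, b_example = 9, 2

hours = 5                                # a student who studies 5 hours
y_hat = m_example * hours + b_example    # predicted score
print(y_hat)                             # 47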

Now let's load our dataset and check what the distribution looks like.

df = pd.read_csv('student_scores.csv')
df.head()
What our dataset looks like.

So our dataset consists of 2 columns, namely Hours and Scores. These columns show the students' studying duration and the score they obtained in the final exam respectively. The goal of this task is to predict students' final exam scores based on their studying duration. Hence, the values in the Hours column are going to be our independent variable x, while the numbers in the Scores column will be our dependent variable y. To make things more straightforward, here I assign the values in both columns to the arrays x and y.

x = df['Hours'].values
y = df['Scores'].values

Now we can use plt.scatter() to see what the distribution looks like.

plt.figure(figsize=(8,6))
plt.title('Data distribution')
plt.scatter(x, y, s=30)
plt.xlabel('hours')
plt.ylabel('score')
plt.show()
What the data distribution looks like.

Loss function: MSE (Mean Squared Error)

Before doing the error minimization process, we have to know what our error function looks like. Here I decided to use what is called the MSE.

MSE = (1/n) * Σ_i (y_i - y_hat_i)²

The function above is pretty simple. Here y, y_hat and n represent the actual y, the predicted y and the number of samples in our dataset respectively. Also, it's important to remember that i denotes the i-th sample. Next, since the prediction y_hat is essentially obtained from our regression line, we can substitute this variable with the linear function:

MSE = (1/n) * Σ_i (y_i - (m * x_i + b))²

Still, our problem here is that we do not yet have the values of m and b that minimize this error. So in the next step we are going to use the gradient descent algorithm to gradually update these m and b values.
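
To make the formula concrete, here is a minimal sketch that computes the MSE by hand for three made-up samples (the numbers below are hypothetical, not taken from our dataset):

y_true = np.array([20.0, 47.0, 75.0])   # hypothetical actual scores
y_pred = np.array([25.0, 45.0, 70.0])   # hypothetical predictions (m*x + b)
mse = np.mean((y_true - y_pred) ** 2)   # average of the squared differences
print(mse)                              # 18.0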

Gradient descent algorithm

There are several steps that we need to take to run this algorithm:

First: initialize the values of m and b. I mentioned earlier that these two variables should be initialized with random numbers. However, to make things simpler, I decided to assign 0 to both variables as the initial value.

m = 0
b = 0

If we try to plot our line with m=0 and b=0, we are going to see an output like this:

x_line = np.linspace(0,10,100)
y_line = m*x_line + b
plt.figure(figsize=(8,6))
plt.title('Data distribution')
plt.scatter(x, y, s=10)
plt.plot(x_line,y_line, c='r')

plt.xlabel('hours')
plt.ylabel('score')
plt.show()
The regression line at initial m and b.

Second: define the learning rate L and the number of epochs. In simple words, the learning rate defines how fast our gradient descent algorithm reduces the error value in each epoch (iteration). Generally, the learning rate is a very small number; here I decided to set it to 0.001. It's important to keep in mind that a small L value slows down the training process (we might need to increase the number of epochs), while a large learning rate may cause the gradient descent algorithm to fail to reach its minimum error.

L = 0.001
epochs = 100

Third: calculate the partial derivatives of our loss function with respect to m and b. Applying the chain rule to the squared term in the MSE gives the two expressions below. Later, I store these derivatives in dm and db.

∂MSE/∂m = (-2/n) * Σ_i x_i * (y_i - y_hat_i)
∂MSE/∂b = (-2/n) * Σ_i (y_i - y_hat_i)

Fourth: update the values of m and b by taking into account both derivatives and the learning rate. Note that the third and fourth steps are repeated iteratively.

m = m - L * ∂MSE/∂m
b = b - L * ∂MSE/∂b

Implementation

Now that we have got the idea of how the gradient descent algorithm works, we can start to implement it in code. All the code below is based on the mathematical notation we defined earlier.

# The number of samples in the dataset
n = float(x.shape[0])
# An empty list to store the error in each epoch
losses = []

for i in range(epochs):
    # Predictions with the current m and b
    yhat = m*x + b

    # Keeping track of the error decrease
    mse = (1/n) * np.sum((y - yhat)**2)
    losses.append(mse)

    # Derivatives of MSE with respect to m and b
    dm = (-2/n) * np.sum(x * (y - yhat))
    db = (-2/n) * np.sum(y - yhat)

    # Values update
    m = m - L*dm
    b = b - L*db

After the training process is done, we can print out the new values of m and b. We can see that those values have been updated thanks to the gradient descent algorithm.
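
One simple way to check them is to print both variables (the exact numbers you get depend on the learning rate and the number of epochs):

print('m:', m)
print('b:', b)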

New values of m and b after 100 epochs.

If we display the regression line with these updated values, we should get the following output:

x_line = np.linspace(0,10,100)
y_line = m*x_line + b
plt.figure(figsize=(8,6))
plt.title('Data distribution')
plt.plot(x_line, y_line, c='r')
plt.scatter(x, y, s=10)
plt.xlabel('hours')
plt.ylabel('score')
plt.show()
Regression line after being trained.

We can see that our algorithm works well, as it is now able to create a line which approximates the data points in our dataset. In other words, this regression line produces a much smaller error compared to our initial line where m = b = 0. Here's the code if you want to see how the error value decreases as the training process goes on.

plt.title('Loss values')
plt.plot(losses)
plt.ylabel('loss')
plt.xlabel('epoch')
plt.show()

print('Initial loss\t:', losses[0])
print('Final loss\t:', losses[-1])
Loss value decrease.
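
As an optional check, we can compare the trained values against the closed-form least-squares fit used in the statistical approach. This is a minimal sketch using NumPy's polyfit; the gradient descent values should land reasonably close to it, though they will not match exactly after only 100 epochs:

# Closed-form least-squares fit (degree-1 polynomial) for comparison
m_exact, b_exact = np.polyfit(x, y, 1)
print('Closed-form m, b\t:', m_exact, b_exact)
print('Gradient descent m, b\t:', m, b)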

That's all for this article! Please let me know if you spot any mistakes. Thanks for reading!

Note: here’s the code.

References

Linear Regression using Gradient Descent by Adarsh Menon. https://towardsdatascience.com/linear-regression-using-gradient-descent-97a6c8700931

