Machine Learning Series Day 5 (Support Vector Machines)

I promise it’s not just another “ML Article.”

--

Terminology:

  1. Vectors: A vector is composed of a magnitude and direction. Geometrically, a vector in a 2-Dimensional plane (x and y graph) is a line from the origin to its coordinates. For example, if we have the coordinates (3,4), we can sketch a line from the origin to (3,4) which is three on the x-axis, and four on the y-axis.
  2. Magnitude: To calculate the magnitude, we have to find the length between the origin (0,0) and (3,4). Using the Pythagorean Theorem, we compute the square root of 3² + 4², which is 5 (see the sketch after this list).
  3. Direction: For direction, we use the trigonometric functions: sin, cos, and tan. We will use the tan formula. Theta, the angle we are computing, satisfies tan(Theta) = 4/3. To find Theta, we invert the equation: Theta = arctan(4/3), which equals roughly a 53-degree angle.
  4. Hyperplane: A subspace whose dimension is one less than that of its ambient space.
  5. Decision Boundary: The hyperplane the model computes to separate the classes of the dependent variable.
  6. Support Vectors: Data points that lie near or on the Decision Boundary (a hyperplane).
  7. Dot Product: The dot product, also called the scalar product, of two vectors is a number (a scalar quantity) calculated by multiplying the corresponding components of the two vectors and summing the results.
  8. Unit Vector: A vector of length 1, sometimes also called a Direction Vector.
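
To make the magnitude and direction calculations concrete, here is a minimal Python sketch of the (3, 4) example above (the variable names are my own):

```python
import math

# The example vector from the terminology list: (3, 4)
x, y = 3, 4

# Magnitude: length from the origin, via the Pythagorean Theorem
magnitude = math.hypot(x, y)            # 5.0

# Direction: angle with the x-axis, via the inverse tangent
theta = math.degrees(math.atan2(y, x))  # ~53.13 degrees

print(magnitude, theta)
```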

Concept

The objective of a Support Vector Machine (SVM) is to find a hyperplane that best separates the two classes of a categorical target variable, which makes it a model for classification problems. If you are tasked to discover whether a person will get a loan or not, this model could do the job.

Intuitively, this model tries to find a boundary that can “best” split the data values based on the potential target value (e.g., approved or not approved).

What do you mean by best?

It’s a bit difficult to explain without a visualization (which we’ll do later in the article), but let me use a simple example. You’re going to play dodgeball with your friends, and there are two teams. As a Data Scientist, I would like to make the game as fair as possible. Thus, I draw a horizontal line that splits both teams by following the requirements below:

  • No player can be on the other team’s side.
  • The line best separates both teams, leaving the largest possible distance to each.

Details:

Supervised/Unsupervised: Supervised

Regression/Classification: Classification
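
For a concrete starting point, here is a minimal sketch of fitting a linear SVM classifier with scikit-learn; the loan-style features and labels are invented purely for illustration:

```python
from sklearn.svm import SVC

# Toy data: each row is [income_in_thousands, credit_score];
# labels are 1 (loan approved) or 0 (not approved). Values are made up.
X = [[40, 600], [52, 650], [61, 720], [75, 710], [30, 580], [48, 690]]
y = [0, 0, 1, 1, 0, 1]

model = SVC(kernel="linear")  # a linear hyperplane, as described above
model.fit(X, y)

print(model.predict([[58, 700]]))  # classify a new applicant
```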

Visual:

Geometrically, the concept is simple. Let’s look at the image below. The potential target variable is color-coded. For example, if it’s blue, the person likes movies, and if it’s red, the person does not like movies. The idea is to find the support vectors near the black line. The black line represents the boundary.

Realize that there are many ways to split the data in the image below. However, the algorithm chose the black line because it produces the largest margin between both classes.

  • The yellow line extends from the decision boundary to the margin; its length is the width of the margin divided by two.
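
If you want to inspect these quantities in code, a fitted linear SVC exposes both the support vectors and the coefficients needed to compute the margin; a small sketch with made-up points:

```python
import numpy as np
from sklearn.svm import SVC

# Two made-up, linearly separable clusters
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# The support vectors: the points closest to the decision boundary (black line)
print(clf.support_vectors_)

# For a linear SVM, the margin width is 2 / ||w||
w = clf.coef_[0]
print(2 / np.linalg.norm(w))
```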

Mathematics/Statistics:

If you’re asking how the algorithm maximizes the distance between the classes, then you’re thinking along the correct lines (no pun intended).

Before I try to provide a good explanation of how the model finds the Support Vectors, we need to get an overview of Linear Algebra.

1. Vector arithmetic: Subtraction.

We need to calculate the difference between the vectors on each of the black lines. For example, in the image above, we have a data point on each black line (the red and the blue dot). You should familiarize yourself with viewing (x, y) as a vector rather than as coordinates. It will help you better understand computation in higher dimensions.

Calculate the difference between both vectors. Luckily, this is probably a calculation you have been doing since middle school.

  • u = [3, 4]
  • v = [7, 2]
  • u - v = [-4, 2]
  • v - u = [4, -2]

Note that the order of the subtraction changes the geometric representation, but if you were able to pick up the calculated vector (either u - v or v - u), you could drop it between the tips of vector u and vector v.

Use the image above as a visual interpretation of vector arithmetic.
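
The same subtraction in code, using the u and v from the list above:

```python
import numpy as np

u = np.array([3, 4])
v = np.array([7, 2])

print(u - v)  # [-4  2]
print(v - u)  # [ 4 -2]
```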

2. Why do we care?

Fair point. Let’s look at the image with the blue and red data points. Let’s draw a line between the two support vectors (the blue and red dots): the green line.

However, this green line is not the distance between the black lines (where the support vectors sit). It is only the difference between the blue and red dots, whereas what we actually need is the distance between the black lines.

That is why we need a line through the blue or red dot that is orthogonal (perpendicular) to the black lines. To compute the length along that direction, we use the dot product.

3. Vector arithmetic: Dot Product.

Another useful arithmetic operation is the dot product. The dot product multiplies each component of one vector by the corresponding component of the second vector and then sums those products. For example:

  • Vector_a = [1, 4]
  • Vector_b = [9, 23]
  • Result = (1*9) + (4*23) = 101
  • Geometrically, 101 equals the length of the projection of Vector_a onto Vector_b multiplied by the length of Vector_b (and vice versa).

The dot product allows us to project one vector onto another. For example, in the image below, we can project the green line onto the light purple line. Since the purple line is perpendicular to the black lines, when we compute the dot product of the green line with a unit vector along the purple line, we are calculating the distance between the black lines!
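
Here is a minimal numerical sketch of that projection; the two support vectors and the perpendicular direction are invented values, used only to show the mechanics:

```python
import numpy as np

# Hypothetical support vectors, one on each black line (invented values)
blue = np.array([1.0, 3.0])
red = np.array([4.0, 1.0])

# The green line: the difference between the two support vectors
diff = blue - red

# A hypothetical direction perpendicular to the black lines (the purple line),
# scaled to a unit vector (length 1)
normal = np.array([2.0, 1.0])
unit_normal = normal / np.linalg.norm(normal)

# Projecting the green line onto the unit normal gives the distance
# between the black lines, i.e., the width of the margin
margin_width = abs(np.dot(diff, unit_normal))
print(margin_width)
```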

4. Maximizing the distance between the Support Vectors (points on the black lines).

That requires a lot of math! I suggest you watch Professor Wilson’s YouTube tutorial!
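
For reference, the equation described in the next paragraphs is almost certainly the standard soft-margin SVM objective; I am reconstructing it here from the usual textbook form, since the original image is not reproduced in the text:

$$
\min_{w,\, b} \;\; \lambda \lVert w \rVert^{2} \;+\; \frac{1}{n} \sum_{i=1}^{n} \max\!\left(0,\; 1 - y_i \left(w \cdot x_i + b\right)\right)
$$

Minimizing the first term is equivalent to maximizing the margin width 2 / ||w||, and the second term is the hinge loss over the training examples.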

Looks confusing?

Well, the first part of the equation controls what we are trying to maximize: the distance between both support vectors (the margin).

The second part of the equation represents what we are trying to minimize: how wrong our predictions are. The SVM minimizes the training error in an interesting manner. There are only two possible labels: 1 or -1. For example, 1 could mean that the image is a dog while -1 could signify that the image is a cat. If your model’s score for the image being a dog is 0.7, then the loss is 0.3 (1 - 0.7). However, if your model computes a score of 1.2 for the image being a dog, then the loss is 0.

In other words, the second part of the equation is seeking a boundary whose predictions are both confident and accurate.
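
A tiny sketch of that hinge-loss behavior, reusing the dog/cat numbers from the paragraph above (labels are +1 for dog and -1 for cat):

```python
def hinge_loss(label, score):
    # label is +1 (dog) or -1 (cat); score is the model's raw output
    return max(0.0, 1.0 - label * score)

print(hinge_loss(+1, 0.7))   # 0.3 -> correct side, but not confident enough
print(hinge_loss(+1, 1.2))   # 0.0 -> confident and correct, no penalty
print(hinge_loss(+1, -0.4))  # 1.4 -> wrong side, penalized even more
```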

5. Issues With Assumptions

One drawback of SVMs is that they assume a hyperplane can separate the data points. However, what if your graph looks like the left image below? We would no longer be able to separate the data labels cleanly with a line. Hence, we could transform our data into a higher dimension in which a hyperplane does separate both classes. SVMs do this with a kernel. Think of a kernel as an equation that is applied to all the data points.

Once we transform the data into a higher dimension, we can find a hyperplane that separates the classes there, and then project the boundary back to the original dimension, similar to the image below.
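
In scikit-learn this amounts to swapping the kernel argument; a sketch on a synthetic non-linear dataset (concentric circles stand in for the left image described above):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two classes arranged in concentric circles: no straight line separates them
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The RBF kernel implicitly maps the points into a higher-dimensional space
# where a separating hyperplane does exist
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)

print(clf.score(X, y))  # near-perfect accuracy on this toy data
```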

Final Thoughts:

A Support Vector Machine is a great model with an interesting mathematical background. Let me know the types of problems that SVMs have helped you solve!

WANT MORE…

If so, I suggest following my Instagram page. I post summaries and thoughts on the books I have read and am currently reading.

Instagram: Booktheories, Personal

Follow me on: Twitter, GitHub, and LinkedIn

AND if you liked this article, I’d appreciate it if you click on the like button below. THANKS!
