Spam Mail Detection Using a Support Vector Machine

In this blog, we are going to classify emails as spam or ham (not spam). I have used a Support Vector Machine (SVM) model for this.

All the source code and the dataset are in my GitHub repository. Links are available at the bottom of this blog.

So let's understand the dataset first.

The dataset has two columns:

  1. Label: ham or spam (the target we want to predict)
  2. EmailText: the actual text of the email

So basically, our model will learn patterns from the email text and predict whether a mail is spam or genuine (ham).

Algorithm used — SVM

About SVM

“Support Vector Machine” (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems. However, it is mostly used for classification. In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. Then we perform classification by finding the hyperplane that separates the two classes with the largest possible margin.
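
To make the hyperplane idea concrete, here is a minimal sketch on made-up 2-D points (not the spam dataset). It assumes a linear kernel so that the weights w and the intercept b of the hyperplane w . x + b = 0 can be read directly from the fitted model. The data and variable names are only for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data (made up for illustration): two well-separated clusters.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [6.0, 6.0], [6.5, 7.0], [7.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel makes the separating hyperplane easy to inspect.
clf = SVC(kernel="linear")
clf.fit(X, y)

# For a linear kernel the hyperplane is w . x + b = 0.
w = clf.coef_[0]
b = clf.intercept_[0]

# Points on opposite sides of the hyperplane get different labels.
new_points = np.array([[1.0, 2.0], [7.0, 7.0]])
side = new_points @ w + b          # signed score for each point
predictions = clf.predict(new_points)

print("w =", w, "b =", b)
print("scores:", side)
print("predictions:", predictions)
```

The sign of the score tells you which side of the hyperplane a point falls on, and that side decides the predicted class.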

So, let’s jump into the coding section.

Import Important Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn import svm

Load our Dataset

data = pd.read_csv('spam.csv')

Checking the information of the dataset

data.info()

Splitting our data into X and y.

X = data['EmailText'].values
y = data['Label'].values

Splitting our data into training and testing.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

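Converting text into word-count vectors using CountVectorizer()

# Turn each email into a vector of word counts.
# The vocabulary is learned from the training set only and reused for the test set.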
cv = CountVectorizer()
X_train = cv.fit_transform(X_train)
X_test = cv.transform(X_test)
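
If you are wondering what CountVectorizer actually produces, here is a small sketch on three made-up sentences (not taken from the dataset). It shows that fit_transform() learns a vocabulary and counts words, while transform() reuses that vocabulary and silently ignores words it has never seen.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two made-up "training" messages.
train_docs = ["Free prize now", "Call me now"]

cv_demo = CountVectorizer()
train_counts = cv_demo.fit_transform(train_docs)

# The vocabulary maps each lowercased word to a column index
# (columns are assigned in alphabetical order).
vocab = cv_demo.vocabulary_
print(vocab)                    # {'free': 1, 'prize': 4, 'now': 3, 'call': 0, 'me': 2}
print(train_counts.toarray())   # [[0 1 0 1 1], [1 0 1 1 0]]

# A new message: "gift" was never seen during fit, so it is dropped,
# and "free" is counted twice.
new_counts = cv_demo.transform(["free free gift"])
print(new_counts.toarray())     # [[0 2 0 0 0]]
```

This is why we call fit_transform() on X_train but only transform() on X_test: the test set must be encoded with the vocabulary learned from the training data.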

Applying SVM algorithm

from sklearn.svm import SVC
classifier = SVC(kernel='rbf', random_state=0)
classifier.fit(X_train, y_train)
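
The imports include GridSearchCV, although it is not used in the steps shown here. As a rough sketch of how it could be used to tune the SVM, the example below searches over the kernel and the C parameter. The tiny corpus, the step names "counts" and "svc", and the grid values are all my assumptions for illustration, not tuned settings for spam.csv.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Tiny made-up corpus, only to keep the sketch self-contained.
texts = [
    "win a free prize now", "free cash offer click now",
    "claim your free reward today", "urgent offer win cash now",
    "are we still meeting for lunch", "please send me the report",
    "see you at the office tomorrow", "thanks for the notes from class",
]
labels = ["spam"] * 4 + ["ham"] * 4

# Keeping the vectorizer inside the pipeline means each CV fold
# learns its vocabulary from that fold's training part only.
pipeline = Pipeline([
    ("counts", CountVectorizer()),
    ("svc", SVC(random_state=0)),
])

# Hypothetical grid: these values are illustrative, not tuned.
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

best_params = search.best_params_
best_score = search.best_score_
prediction = search.predict(["free prize offer"])

print("best parameters:", best_params)
print("best CV accuracy:", best_score)
print("prediction:", prediction)
```

On the real dataset you would pass the raw email text and labels to search.fit() and use a larger cv value, for example 5.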

Accuracy

print(classifier.score(X_test,y_test))
>> 0.9766816143497757
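
Accuracy is a single number, and it can hide mistakes on the smaller class when a dataset has far more ham than spam. Here is a small sketch with made-up labels (not the real predictions from this model) showing how a confusion matrix, precision, and recall give a closer look.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Made-up labels for illustration: 8 ham and 2 spam messages,
# where the model misses one of the two spam messages.
y_true = ["ham"] * 8 + ["spam"] * 2
y_pred = ["ham"] * 8 + ["ham", "spam"]

acc = accuracy_score(y_true, y_pred)

# Rows are the true class, columns the predicted class, in the order given.
cm = confusion_matrix(y_true, y_pred, labels=["ham", "spam"])

# Precision and recall for the spam class specifically.
spam_precision = precision_score(y_true, y_pred, pos_label="spam")
spam_recall = recall_score(y_true, y_pred, pos_label="spam")

print("accuracy:", acc)                    # 0.9
print(cm)                                  # [[8 0], [1 1]]
print("spam precision:", spam_precision)   # 1.0
print("spam recall:", spam_recall)         # 0.5
```

Here the accuracy is 90%, yet only half of the spam is caught. With the real model you could pass y_test and classifier.predict(X_test) to the same functions.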

Here we get an accuracy of about 97.67% on the test set, which is a good result for such a simple model. I also invite you to clone my repository, work further with this dataset, and share in the comments the accuracy you get with different classification models.

I hope you liked this blog. Feel free to share your thoughts in the comment section. You can also connect with me on:
Linkedin — https://www.linkedin.com/in/shreyak007/
Github — https://github.com/Shreyakkk
Twitter — https://twitter.com/Shreyakkkk
Instagram — https://www.instagram.com/shreyakkk/
Facebook — https://www.facebook.com/007shreyak
Thank You for reading.

Technology Enthusiastic Guy. I post blogs related to Data Science, Machine Learning, Python, Flutter and much more.