Spam Mail Detection Using Support Vector Machine.
In this blog post, we are going to classify emails as spam or ham (genuine mail). I have used a Support Vector Machine (SVM) model for this.
All the source code and the dataset are available in my GitHub repository. Links are at the bottom of this post.
Let's look at the dataset first.
The dataset has two columns:
- Label — Ham or Spam
- Email Text — Actual Email
The model learns patterns from the email text and predicts whether a given mail is spam or genuine.
Algorithm used — SVM
About SVM
A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression, although it is mostly used for classification problems. In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. Then we perform classification by finding the hyperplane that best separates the two classes, that is, the one with the largest margin to the nearest points of each class.
So, let's jump into the coding section.
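To make the hyperplane idea concrete, here is a minimal sketch on made-up 2-D points (not the email data). It assumes a linear kernel so that the learned hyperplane can be read directly from `coef_` and `intercept_`; the spam model later in this post uses an RBF kernel instead.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data, invented for illustration: two well-separated clusters in 2-D.
X_toy = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                  [6.0, 6.0], [6.5, 7.0], [7.0, 6.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel exposes the separating hyperplane w . x + b = 0.
clf = SVC(kernel="linear")
clf.fit(X_toy, y_toy)

w = clf.coef_[0]        # normal vector of the hyperplane
b = clf.intercept_[0]   # offset of the hyperplane

# The support vectors are the training points closest to the hyperplane.
print("w =", w, "b =", b)
print("support vectors:\n", clf.support_vectors_)

# New points are classified by which side of the hyperplane they fall on.
preds = clf.predict([[1.2, 1.5], [6.8, 6.2]])
print("predictions:", preds)
```

Only the support vectors determine where the hyperplane sits; moving the other points (without crossing the margin) would leave the model unchanged.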
Import Important Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn import svm
Load our Dataset
data = pd.read_csv('spam.csv')
Checking the information of the dataset
data.info()
Splitting our data into X and y.
X = data['EmailText'].values
y = data['Label'].values
Splitting our data into training and testing.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
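To see what `test_size=0.2` does, here is a small sketch on a made-up list of 20 messages. The `stratify` argument is not used in the code above; it is shown here only as an option that keeps the ham/spam ratio the same in both parts, which can help when spam is the minority class.

```python
from sklearn.model_selection import train_test_split

# Made-up data for illustration: 15 ham and 5 spam messages.
texts = [f"message {i}" for i in range(20)]
labels = ["ham"] * 15 + ["spam"] * 5

# 20% of the rows go to the test set; stratify keeps the class ratio equal.
tr_x, te_x, tr_y, te_y = train_test_split(
    texts, labels, test_size=0.2, random_state=0, stratify=labels
)

print(len(tr_x), "training rows,", len(te_x), "test rows")
print("spam in test set:", list(te_y).count("spam"))
```

Fixing `random_state` makes the split reproducible, so the reported accuracy does not change between runs.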
Converting the text into numeric word-count vectors using CountVectorizer()
# Learn the vocabulary from the training emails only, then reuse it for the test emails
cv = CountVectorizer()
X_train = cv.fit_transform(X_train)
X_test = cv.transform(X_test)
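The following sketch shows what `CountVectorizer` produces, using three invented sentences instead of the real emails. It assumes the default settings, under which single-character tokens such as "a" are dropped and words not seen during `fit_transform` are ignored by `transform`.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Invented example sentences, not taken from spam.csv.
train_docs = ["win a free prize now",
              "are we meeting for lunch",
              "free free prize"]
test_docs = ["free lunch offer"]

cv_demo = CountVectorizer()
train_counts = cv_demo.fit_transform(train_docs)  # learns the vocabulary
test_counts = cv_demo.transform(test_docs)        # reuses that vocabulary

# Each column is one vocabulary word; each row holds the counts for one document.
vocab = sorted(cv_demo.vocabulary_)
print(vocab)
print(train_counts.toarray())

# "offer" never appeared in the training documents, so it is not counted.
free_idx = cv_demo.vocabulary_["free"]
free_count_doc3 = train_counts.toarray()[2][free_idx]
print("count of 'free' in third document:", free_count_doc3)
print(test_counts.toarray())
```

This is why the test emails are passed through `transform` rather than `fit_transform`: the test matrix must have the same columns as the training matrix.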
Applying SVM algorithm
from sklearn.svm import SVC
classifier = SVC(kernel='rbf', random_state=0)
classifier.fit(X_train, y_train)
Accuracy
print(classifier.score(X_test,y_test))
>> 0.9766816143497757
Here we get an accuracy of about 97.67% on the test set, which is a strong result. Feel free to clone my repository (linked below), work further with this dataset, and share the accuracy you get with different classification models in the comments.
I hope you liked this post. Feel free to share your thoughts in the comment section. You can also connect with me on:
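As a starting point for trying other classifiers, the sketch below fits three models on a tiny invented corpus. The texts, labels, and the choice of `MultinomialNB` and `LogisticRegression` are assumptions made for illustration; to compare models on the real data, replace the lists with the split from `spam.csv`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Tiny invented corpus; far too small to give meaningful accuracy numbers.
train_texts = ["win a free prize now", "claim your free cash reward",
               "urgent offer click now", "free entry win cash",
               "are we meeting for lunch", "see you at the office tomorrow",
               "can you send me the report", "lunch at noon sounds good"]
train_labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]
test_texts = ["free cash prize", "send the report before lunch"]
test_labels = ["spam", "ham"]

models = {
    "SVC (rbf)": SVC(kernel="rbf", random_state=0),
    "MultinomialNB": MultinomialNB(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    # Each pipeline vectorizes the text and then fits the classifier.
    pipe = make_pipeline(CountVectorizer(), model)
    pipe.fit(train_texts, train_labels)
    scores[name] = pipe.score(test_texts, test_labels)
    print(f"{name}: accuracy = {scores[name]:.2f}")
```

Wrapping the vectorizer and the classifier in one pipeline keeps the vocabulary fitted on the training text only, the same rule followed in the main code.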
Linkedin — https://www.linkedin.com/in/shreyak007/
Github — https://github.com/Shreyakkk
Twitter — https://twitter.com/Shreyakkkk
Instagram — https://www.instagram.com/shreyakkk/
Facebook — https://www.facebook.com/007shreyak
Thank you for reading.