Customer-Churn Prediction Using Machine Learning

--

Predicting telecom customer churn.

Update: This article is part of a series. Check out the full series: Part 1

In this blog, let’s figure out how to predict customer churn with machine learning algorithms. ML tools allow you to predict and prevent churn before it happens. Churn prediction is used to identify areas for improvement and to keep customers happy.

DATA SET

The data set used in this article is available on Kaggle (WA_Fn-UseC_-Telco-Customer-Churn.csv). The raw data contains 7043 rows (customers) and 21 columns (features). Each row represents a customer, and each column contains a customer attribute described in the dataset’s Metadata.

Environment and tools

  1. VS Code
  2. Jupyter Notebook

I followed the general machine learning workflow step-by-step:

  1. Feature engineering and selection.
  2. Compare several machine learning models on a performance metric.
  3. Perform hyper-parameter tuning on the best model.
  4. Evaluate the best model on the testing set.
  5. Interpret the model results.
  6. Report the results.

Here, I used the following classification algorithms:

  1. Logistic Regression
  2. Decision Tree
  3. Random Forest
  4. K-Nearest Neighbors
  5. AdaBoost Classifier
  6. Gradient Boost Classifier
  7. Extra Tree Classifier

I started by importing all the necessary libraries:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt  # needed for the plt.title calls below
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.metrics import classification_report, confusion_matrix

Use pd.set_option("display.max_columns", None) to display all the columns. Then I loaded the CSV file containing 7043 rows (customers) and 21 columns (features).

df = pd.read_csv("Customer-Churn.csv")

Note: Exploratory data analysis, including data visualisation, is covered in a separate post, and I recommend reading it before diving into the machine learning here. Click here to read my Medium post on Customer-Churn-Analysis.

Let’s Start with Feature Engineering

Feature Engineering and Selection

Feature engineering is a machine learning technique that uses data to generate new variables that were not present in the training set. It has the potential to generate new features for both supervised and unsupervised learning, with the goal of simplifying and speeding up data transformations while also improving model accuracy.

The main feature engineering techniques that will be discussed are:

1. Missing data imputation (see the quick sketch after this list)

2. Categorical encoding

3. Variable transformation

4. Outlier engineering

5. Date and time engineering
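As a quick example of the first technique: in this dataset, the TotalCharges column is stored as text and contains blank strings for brand-new customers. Below is a minimal sketch of imputing it, assuming the DataFrame loaded above (the encoding steps later in this post take a different route and encode the raw column instead):

df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')  # blanks become NaN
df['TotalCharges'] = df['TotalCharges'].fillna(df['TotalCharges'].median())  # impute with the median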

Advantages of feature engineering

  • Improves Accuracy: With less misleading data, modelling accuracy improves.
  • Reduces Overfitting: Less unnecessary data means fewer opportunities to make noise-based decisions.
  • Reduces Training Time: Fewer features mean less computation, allowing algorithms to train more quickly.

No modification

The SeniorCitizen column is already binary and should not be changed.
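A quick check confirms this:

print(df['SeniorCitizen'].unique())  # expected output: [0 1]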

Ordinal Encoding

Ordinal encoding converts each label into integer values and the encoded data represents the sequence of labels.

cols = ['gender','Partner','Dependents','PhoneService',
'MultipleLines','InternetService','OnlineSecurity',
'OnlineBackup','DeviceProtection','TechSupport',
'StreamingTV','StreamingMovies','Contract','TotalCharges',
'PaperlessBilling','PaymentMethod']
encoder = OrdinalEncoder()  # named to avoid shadowing Python's built-in ord()
encoder.fit(df[cols])
df[cols] = encoder.transform(df[cols])
df.head()

One-Hot Encoding:

One-hot encoding converts each category into a binary indicator column, preparing the data for an algorithm and often giving a better prediction.

Churn_ohe = OneHotEncoder(drop='first', sparse=False, dtype=np.int32)  # in scikit-learn 1.2+, use sparse_output=False
Churn_dummies = Churn_ohe.fit_transform(df[['Churn']])
df.drop(columns=['Churn'], inplace=True)
# Name the encoded column 'Churn' so it can be referenced below.
df = pd.concat([df, pd.DataFrame(Churn_dummies, columns=['Churn'])], axis=1)
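Before modelling, it is worth checking that no text columns remain apart from the customer identifier; a quick sanity check:

print(df.select_dtypes(include='object').columns.tolist())  # expected: ['customerID']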

Splitting the data in training and testing sets

Train/test splitting is a way to measure the accuracy of your model: you build (train) the model on the training set, then evaluate it on the testing set. Here, 70% of the data is used for training and 30% for testing.

First, we create a variable X to store the dataset’s independent attributes. In addition, we define a variable y to hold only the target variable.

X = df.drop(columns=['customerID', 'Churn'])  # drop the identifier and the target
y = df['Churn'].values

Then, from the sklearn.model_selection package, we can use the train_test_split function to generate both the training and testing sets.

X_train, X_test, y_train, y_test = train_test_split(X, y,
    test_size=0.30, random_state=1)
print('X_train:', len(X_train))
print('X_test:', len(X_test))
print('y_train:', len(y_train))
print('y_test:', len(y_test))

Output: X_train : 4930, X_test : 2113, y_train: 4930, y_test: 2113
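Since churners are the minority class in this dataset, one optional refinement (not used above) is to stratify the split so both sets keep the same churn ratio:

# Optional: a stratified split preserves the churn/no-churn ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1, stratify=y)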

Compare several machine learning models on a performance metric

I trained and evaluated each of the following classification algorithms:

Logistic Regression:

model = make_pipeline(StandardScaler(),LogisticRegression())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = model.score(X_test,y_test)
print("Logistic Regression accuracy is :",accuracy)
cf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(cf_matrix , annot=True,fmt = "d", cmap='OrRd')
plt.title("LOGISTIC REGRESSION CONFUSION MATRIX");
print("classification_report")
print(classification_report(y_test, y_pred))

Decision Tree:

model = make_pipeline(StandardScaler(), DecisionTreeClassifier())
model.fit(X_train,y_train)
y_pred = model.predict(X_test)
accuracy = model.score(X_test,y_test)
print("Decision Tree accuracy is :",accuracy)
cf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(cf_matrix , annot=True,fmt = "d", cmap='OrRd')
plt.title("DECISION TREE CONFUSION MATRIX");
print(classification_report(y_test, y_pred))

The decision tree gives a very low score.

Random Forest:

model = make_pipeline(StandardScaler(), RandomForestClassifier())
model.fit(X_train,y_train)
y_pred = model.predict(X_test)
accuracy = model.score(X_test,y_test)
print("Random forest accuracy :",accuracy)
cf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(cf_matrix , annot=True,fmt = "d", cmap='OrRd')
plt.title("RANDOM FOREST
CONFUSION MATRIX");
print(classification_report(y_test, prediction_test))

K-Nearest Neighbors:

model = make_pipeline(StandardScaler(),KNeighborsClassifier())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print("K-Nearest Neighbors: ", accuracy)
cf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(cf_matrix , annot=True,fmt = "d", cmap='OrRd')
plt.title("K-NEAREST NEIGHBORS CLASSIFIER
CONFUSION MATRIX");
print("classification_report")
print(classification_report(y_test, y_pred))

AdaBoost Classifier:

model = make_pipeline(StandardScaler(), AdaBoostClassifier())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print("AdaBoost Classifier accuracy :",accuracy)
cf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(cf_matrix , annot=True,fmt = "d", cmap='OrRd')
plt.title("ADABOOST CLASSIFIER CONFUSION
CONFUSION MATRIX");
print("classification_report")
print(classification_report(y_test, y_pred))

AdaBoost Classifier accuracy is quite good.

Gradient Boosting Classifier:

model = GradientBoostingClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Gradient Boosting Classifier", accuracy_score(y_test, y_pred))
cf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(cf_matrix , annot=True,fmt = "d", cmap='OrRd')
plt.title("GRADIENT BOOSTING CLASSIFIER
CONFUSION MATRIX")
print("classification_report")
print(classification_report(y_test, y_pred))

As shown above, treating “no churn” as the positive class, we obtain a sensitivity of 0.90 (1426/(1426+159)) and a specificity of 0.54 (287/(287+241)). The model predicts non-churning customers more accurately, because gradient boosting classifiers tend to favour classes with more observations.
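These rates can be read directly off the confusion matrix; a small sketch, assuming cf_matrix from the block above with sklearn’s default label order (row 0 = no churn, row 1 = churn):

no_churn_rate = cf_matrix[0, 0] / cf_matrix[0].sum()  # 1426 / (1426 + 159), about 0.90
churn_rate = cf_matrix[1, 1] / cf_matrix[1].sum()     # 287 / (287 + 241), about 0.54
print("Correctly identified non-churners:", round(no_churn_rate, 2))
print("Correctly identified churners:", round(churn_rate, 2))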

Extra Tree Classifier:

model = make_pipeline(StandardScaler(), ExtraTreesClassifier())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Extra Trees Classifier Score :", accuracy_score(y_test, y_pred))
cf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(cf_matrix , annot=True,fmt = "d", cmap='OrRd')
plt.title("EXTRA TREE CLASSIFIER CONFUSION MATRIX");
print(classification_report(y_test, y_pred))
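To compare all seven models on one metric side by side, a compact loop like the following can be used (a sketch reusing the estimators imported above; scores will vary slightly between runs):

models = {
    'Logistic Regression': LogisticRegression(),
    'Decision Tree': DecisionTreeClassifier(),
    'Random Forest': RandomForestClassifier(),
    'K-Nearest Neighbors': KNeighborsClassifier(),
    'AdaBoost': AdaBoostClassifier(),
    'Gradient Boosting': GradientBoostingClassifier(),
    'Extra Trees': ExtraTreesClassifier(),
}
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)  # scale features, then fit
    pipe.fit(X_train, y_train)
    print(f"{name}: {pipe.score(X_test, y_test):.4f}")  # test accuracy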

Result: We tried 7 different machine learning algorithms using default parameters. Finally, we tuned the Gradient Boosting Classifier (the best-performing model) for optimization, obtaining an accuracy of nearly 80%. So, at the end of this project, we have a classification model that predicts churn correctly for 77.84% of clients.
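The tuning step itself is not reproduced in this post; below is a minimal sketch of how it could be done with GridSearchCV (the parameter grid here is an illustrative assumption, not the exact grid used):

from sklearn.model_selection import GridSearchCV

# Hypothetical parameter grid; the values actually tuned may differ.
param_grid = {
    'n_estimators': [100, 200],
    'learning_rate': [0.05, 0.1],
    'max_depth': [3, 5],
}
grid = GridSearchCV(GradientBoostingClassifier(), param_grid,
                    cv=5, scoring='accuracy', n_jobs=-1)
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print("Test accuracy:", grid.best_estimator_.score(X_test, y_test))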

  • A confusion matrix is useful for examining exactly which customers the model identifies as churners.
  • Experimenting with more complex machine learning algorithms, such as XGBoost, and fine-tuning their hyperparameters may give better results.

I hope this article helped you understand customer-churn prediction.

Happy reading, happy learning and happy coding!

Source code: Customer-Churn-Prediction, Customer-churn-Analysis-Part1

— Thanks to Digipodium and zaid sir, I truly appreciate everything you have done for me so far and hope to continue learning from you.😊
