Predict Pump Failure Before It Happens Using a Deep Learning Model

--

Power plants, the main source of electricity, rely on a large number of pump systems. These plants include thermal, hydroelectric, and nuclear power plants, among others.

The pump systems need to be maintained in proper condition to ensure a continuous power supply. If any pump in the system fails, power generation can drop, and in some cases the plant may shut down completely.

This can be avoided if we predict failures in advance. Here we are going to predict failure well before it happens, to avoid a huge economic loss.

Sensors are used to record temperature, pressure, vibration, load capacity, volume, flow density, and so on. These are only initial investments to set up the data collection process. The collected data is then fed into an ML model to identify pump failure.

The pump sensor data is provided by Kaggle. The dataset contains a timestamp, data from 52 sensors, and the machine status. The sensor data is recorded every minute, and the corresponding machine status is also provided.

Let us dive into our analysis.

Import Packages

Load Data

The dataset contains 220320 datapoints and 55 features.

Of these 55 features, timestamp and machine_status are of object datatype, all the sensor data is of float datatype, and Unnamed: 0 is of int datatype.
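
The loading step is not shown in the text; a minimal sketch with pandas is given below. The file name sensor.csv is an assumption based on the usual name of this Kaggle dataset.

    # Minimal sketch of loading the Kaggle pump sensor dataset with pandas.
    import pandas as pd

    df = pd.read_csv("sensor.csv")   # file name is an assumption
    print(df.shape)    # expected: (220320, 55)
    print(df.dtypes)   # timestamp, machine_status: object; sensors: float64; Unnamed: 0: int64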


Find Missing Values

We found that sensor_15 data is completely missing, and around 40% of sensor_50 data is missing.

Find Duplicate Values

We found that there are no duplicate records.
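
A minimal sketch of the missing-value and duplicate checks described in the two sections above:

    # Percentage of missing values per column and a duplicate-row count.
    missing_pct = df.isnull().mean() * 100
    print(missing_pct.sort_values(ascending=False).head())   # sensor_15 ~100%, sensor_50 ~40%
    print("duplicate rows:", df.duplicated().sum())           # expected: 0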

Label Data

The label data has 3 machine status values: BROKEN means the machine has failed, RECOVERING means the machine is recovering from a failure, and NORMAL means the machine is working normally.

Data Preprocessing

Add Feature Time Period

The purpose of adding time_period is to check whether pump failures show any significance with respect to the time of day. We divide the 24 hours into Morning, Noon, Evening, and Night and try to find at what time failures occur most.
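
One possible way to derive the time_period feature is sketched below; the exact hour boundaries for Morning, Noon, Evening, and Night are assumptions, since the original code is not shown here.

    # Derive time_period from the timestamp (hour boundaries are assumptions).
    df["timestamp"] = pd.to_datetime(df["timestamp"])

    def to_period(hour):
        if 5 <= hour < 12:
            return "Morning"
        if 12 <= hour < 17:
            return "Noon"
        if 17 <= hour < 21:
            return "Evening"
        return "Night"

    df["time_period"] = df["timestamp"].dt.hour.map(to_period)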

Fill Missing Values
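
The exact imputation used is not shown in the text. One plausible approach, given that sensor_15 is entirely empty, is sketched below; this is an assumption, not the author's confirmed method.

    # Assumed imputation: drop the fully missing sensor_15 (and the index-like
    # Unnamed: 0 column), then forward-fill and back-fill the remaining gaps.
    df = df.drop(columns=["sensor_15", "Unnamed: 0"])
    df = df.ffill().bfill()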

Normalize Values

The purpose of normalization is to scale numeric data from different columns down to an equivalent scale. We are using StandardScaler normalization.

StandardScaler standardizes a feature by subtracting the mean and then scaling to unit variance. Unit variance means dividing all the values by the standard deviation.
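
A minimal sketch of scaling the sensor columns with scikit-learn's StandardScaler:

    # Standardize every sensor column: z = (x - mean) / std.
    from sklearn.preprocessing import StandardScaler

    sensor_cols = [c for c in df.columns if c.startswith("sensor_")]
    scaler = StandardScaler()
    df[sensor_cols] = scaler.fit_transform(df[sensor_cols])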

Encode Label

We replace the categorical values with numeric values: in machine_status, NORMAL is mapped to 0, while RECOVERING and BROKEN are mapped to 1.
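
The mapping described above can be done with a simple dictionary:

    # NORMAL -> 0; RECOVERING and BROKEN -> 1.
    df["machine_status"] = df["machine_status"].map(
        {"NORMAL": 0, "RECOVERING": 1, "BROKEN": 1}
    )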

Correlation

  • A strong correlation exists between sensor_14 to sensor_26, sensor_28 to sensor_33, and sensor_34 to sensor_36
  • These highly correlated sensors can be left out of the machine learning model; otherwise performance will decrease
  • machine_status is highly positively correlated with sensor_01 to sensor_12
  • Here we consider sensors with a correlation of 0.5 and above as influencing machine_status (a sketch of the computation follows below)
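
A sketch of the correlation analysis; seaborn is assumed here only for the heatmap, while the selection itself uses the correlation with machine_status.

    # Correlation matrix of the sensors and the encoded machine_status.
    import matplotlib.pyplot as plt
    import seaborn as sns

    corr = df[sensor_cols + ["machine_status"]].corr()
    plt.figure(figsize=(20, 16))
    sns.heatmap(corr, cmap="coolwarm", center=0)
    plt.show()

    # Keep sensors whose absolute correlation with machine_status is 0.5 or more.
    strong = corr["machine_status"].abs().drop("machine_status")
    print(strong[strong >= 0.5].sort_values(ascending=False))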


Feature Engineering

Feature Importance

  • It uses ensembles of decision trees to compute the relative importance of each feature.
  • These importance values can be used to inform a feature selection process.
  • Based on the feature_importances_ values, we selected the following features (a sketch of the computation follows after this list):
  • sensor_04, sensor_10, sensor_02, sensor_11, sensor_12, sensor_05, sensor_03, sensor_06, sensor_01, sensor_00, sensor_09, sensor_07, sensor_38, sensor_40, sensor_08, sensor_13, sensor_51
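
A minimal sketch of the tree-based importance computation; RandomForestClassifier is an assumption here, as any ensemble exposing feature_importances_ works the same way.

    # Fit an ensemble of trees and rank features by feature_importances_.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    X = df[sensor_cols]
    y = df["machine_status"]
    forest = RandomForestClassifier(n_estimators=100, random_state=42)
    forest.fit(X, y)

    importances = pd.Series(forest.feature_importances_, index=sensor_cols)
    print(importances.sort_values(ascending=False).head(17))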

KS Test

  • The first plot shows a left-tailed distribution for machine_status NORMAL
  • The second plot shows a non-uniform distribution for machine_status BROKEN
  • A statistic value close to 1 indicates that the two distributions are different
  • A p-value < 0.05 means we reject the null hypothesis that the two samples were drawn from the same distribution
  • This is shown here for sensor_01; the other sensors' data needs to be analyzed in the same way (a sketch follows below)
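
A sketch of the two-sample KS test for sensor_01, using scipy:

    # Compare sensor_01 under NORMAL (0) and RECOVERING/BROKEN (1) states.
    from scipy.stats import ks_2samp

    normal = df.loc[df["machine_status"] == 0, "sensor_01"].dropna()
    broken = df.loc[df["machine_status"] == 1, "sensor_01"].dropna()
    statistic, p_value = ks_2samp(normal, broken)
    print(statistic, p_value)  # statistic near 1 and p < 0.05 => different distributions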

Q-Q Plot

  • Comparing two probability distributions by plotting their quantiles against each other
  • If the two distributions which we are comparing are exactly equal then the points on the Q-Q plot will perfectly lie on a straight line y = x
  • The above Q-Q plot is for sensor_06 data for machine_status NORMAL and BROKEN states.
  • The points do not lie on the straight line
  • This shows they are not from the same distribution
  • Similarly, we can plot and check the other sensors' data (a sketch follows below)
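
A sketch of the quantile-quantile comparison for sensor_06 between the two states:

    # Plot matching quantiles of sensor_06 for NORMAL vs RECOVERING/BROKEN states.
    import numpy as np
    import matplotlib.pyplot as plt

    q = np.linspace(0.01, 0.99, 99)
    qn = df.loc[df["machine_status"] == 0, "sensor_06"].quantile(q)
    qb = df.loc[df["machine_status"] == 1, "sensor_06"].quantile(q)

    plt.scatter(qn, qb, s=10)
    lims = [min(qn.min(), qb.min()), max(qn.max(), qb.max())]
    plt.plot(lims, lims, "r--")  # reference line y = x
    plt.xlabel("NORMAL quantiles")
    plt.ylabel("BROKEN/RECOVERING quantiles")
    plt.show()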

KL Divergence

  • Kullback-Leibler divergence calculates a score that measures the divergence of one probability distribution from another.
  • When the score is 0, it suggests that both distributions are identical, otherwise the score is positive.
  • From the output, we can conclude that the two distributions are the same (a sketch of the computation follows below)
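
A sketch of estimating the KL divergence from histograms of the two samples used in the KS test above; the binning scheme is an assumption.

    # Estimate KL(NORMAL || BROKEN) from histograms on shared bins.
    import numpy as np
    from scipy.special import rel_entr

    bins = np.histogram_bin_edges(df["sensor_01"].dropna(), bins=50)
    p, _ = np.histogram(normal, bins=bins)
    q, _ = np.histogram(broken, bins=bins)
    p = (p + 1e-12) / (p + 1e-12).sum()   # smooth to avoid log(0), then normalize
    q = (q + 1e-12) / (q + 1e-12).sum()
    print("KL divergence:", rel_entr(p, q).sum())  # 0 means identical distributions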

Select Features

  • The sensor data listed above has a high correlation with machine_status
  • These features are significant in predicting the target value

Exploratory Data Analysis

Plot sensor_01 vs machine_status

  • When sensor_01 continuously records values below -1, there is a higher possibility of machine failure
  • When machine_status touches 0, it indicates a failure
  • There are no trend or seasonal changes
  • Similarly, we have to do this data analysis for the other sensors as well

Test Stationarity

Dickey-Fuller Test

  • Statistical tests make strong assumptions about data.
  • The Dickey-Fuller test is a type of statistical test called a unit root test.
  • It uses an autoregressive model and optimizes an information criterion across multiple different lag values.
  • The intuition behind a unit root test is that it determines how strongly a time series is defined by a trend.
  • The null hypothesis of the test is that the time series is not stationary.
  • The alternate hypothesis is that the time series is stationary.
  • The test statistic value is -6.06
  • It is less than -3.43, which is the critical value at 1%
  • The more negative this statistic, the more likely we are to reject the null hypothesis
  • The p-value is less than 0.05, which means we reject the null hypothesis
  • So, the time series is stationary
  • Similarly, we have to do this analysis for the other sensors' data too (a sketch of the test follows below)
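
A sketch of the augmented Dickey-Fuller test on sensor_01 with statsmodels:

    # Augmented Dickey-Fuller unit root test.
    from statsmodels.tsa.stattools import adfuller

    result = adfuller(df["sensor_01"].dropna(), autolag="AIC")
    print("ADF statistic:", result[0])     # -6.06 in this analysis
    print("p-value:", result[1])
    print("critical values:", result[4])   # reject H0 if the statistic is below these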

Seasonal Decompose

Seasonal decomposition gives the trend, seasonal, and residual values. Plot these values to find any time-dependent structure. Here sensor_01 data is used for the plot.

  • As per the sensor_01 data, we see no trend
  • The data is a collection of sensor measurements recorded every minute
  • We can confirm trend changes from the seasonal decomposition values
  • From the plot, we see that there is no trend
  • No seasonal changes are happening (a sketch of the decomposition follows below)
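
A sketch of the decomposition; with minute-level data a daily cycle corresponds to 1440 observations, so period=1440 is an assumed seasonal period here.

    # Decompose sensor_01 into trend, seasonal, and residual components.
    import matplotlib.pyplot as plt
    from statsmodels.tsa.seasonal import seasonal_decompose

    series = df.set_index("timestamp")["sensor_01"].dropna()
    decomposition = seasonal_decompose(series, model="additive", period=1440)
    decomposition.plot()
    plt.show()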

Create ML Model

Prepare Dataset

  • We have created 2 datasets.
  • The first dataset is created with the original datapoints, where the label data is shifted 10 steps ahead. Our idea is to predict failure 10 minutes in advance.
  • The second dataset is created with average datapoints computed over the last 10 values.
  • The average datapoints include the mean, median, and standard deviation.
  • Here also the label data is shifted 10 steps ahead to predict 10 minutes in advance.
  • Median and standard deviation datapoints are created in the same way as the mean.
  • The rest of the processing is the same as for the original datapoints (a sketch of both datasets follows below).
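
A sketch of the two dataset variants; the selected feature list is abbreviated and the column names are only illustrative.

    # Dataset 1: original points with the label shifted 10 steps ahead.
    import pandas as pd

    selected = ["sensor_01", "sensor_04", "sensor_10"]   # abbreviated for the sketch

    original = df[selected].copy()
    original["target"] = df["machine_status"].shift(-10)  # status 10 minutes later
    original = original.dropna()

    # Dataset 2: rolling mean/median/std over the last 10 points, same shifted label.
    rolled = pd.concat(
        [df[selected].rolling(10).mean().add_suffix("_mean"),
         df[selected].rolling(10).median().add_suffix("_median"),
         df[selected].rolling(10).std().add_suffix("_std")],
        axis=1,
    )
    rolled["target"] = df["machine_status"].shift(-10)
    rolled = rolled.dropna()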

Logistic Regression For Original Points

  • Hyperparameter Tuning
  • Here tol and max_iter are considered as hyperparameters, with values tol=[0.0001, 0.001, 0.01, 0.1] and max_iter=[20, 50, 100, 200]
  • tol represents the tolerance for the stopping criteria
  • max_iter represents the maximum number of iterations taken for the solvers to converge
  • Random search works well as a parameter search technique when the number of dimensions is small
  • Model With Best Parameters
  • Apply the selected best parameters to Logistic Regression and fit the model (a sketch of the tuning and fit appears after the classification reports below)
  • Predict the probability score using predict_proba() for train and test data
  • Plot the roc curve for train and test data
  • Confusion Matrix For Train Data

Correct predictions: 14343 + 148759 = 163102

Incorrect predictions: 2028 + 65 = 2093

  • Confusion Matrix For Test Data

Correct predictions: 61 + 54695 = 54756

Incorrect predictions: 344 + 15 = 359

  • Classification Report For Train Data
  • Classification Report For Test Data
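
A minimal sketch of the random search and final fit described above, assuming scikit-learn's RandomizedSearchCV; X_train, y_train, and X_test are assumed names for the prepared splits.

    # Random search over tol and max_iter, then refit with the best parameters.
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import RandomizedSearchCV

    param_dist = {"tol": [0.0001, 0.001, 0.01, 0.1],
                  "max_iter": [20, 50, 100, 200]}
    search = RandomizedSearchCV(LogisticRegression(), param_dist,
                                n_iter=10, scoring="f1", cv=3, random_state=42)
    search.fit(X_train, y_train)

    best_lr = LogisticRegression(**search.best_params_)
    best_lr.fit(X_train, y_train)
    y_prob = best_lr.predict_proba(X_test)[:, 1]  # probability scores for the ROC curve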

Similarly, we applied logistic regression to the average points dataset, and the results were noted.

Logistic Regression For Average Points

  • Confusion Matrix For Train Data

Correct predictions: 14291 + 149160 = 163451

Incorrect predictions: 1627 + 117 = 1744

  • Confusion Matrix For Test Data

Correct predictions: 52 + 54797 = 54849

Incorrect predictions: 232 + 24 = 256

  • Classification Report For Train Data
  • Classification Report For Test Data

Similarly, we applied the Decision Tree, Random Forest, and XGBoost algorithms to the two datasets, and the results were noted.

Decision Tree For Original Points

  • Confusion Matrix For Train Data

Correct predictions: 14366 + 150030 = 164396

Incorrect predictions: 757 + 42 = 799

  • Confusion Matrix For Test Data

Correct predictions: 57 + 54967 = 55024

Incorrect predictions: 72 + 19 = 91

  • Classification Report For Train Data
  • Classification Report For Test Data

Decision Tree For Average Points

  • Confusion Matrix For Train Data

Correct predictions: 14317 + 149673 = 163990

Incorrect predictions: 1114 + 91 = 1205

  • Confusion Matrix For Test Data

Correct predictions: 61 + 54945 = 55006

Incorrect predictions: 84 + 15 = 99

  • Classification Report For Train Data
  • Classification Report For Test Data

Random Forest For Original Points

  • Confusion Matrix For Train Data

Correct predictions: 14408 + 150774 = 165182

Incorrect predictions: 13 + 0 = 13

  • Confusion Matrix For Test Data

Correct predictions: 69 + 54957 = 55026

Incorrect predictions: 82 + 7 = 89

  • Classification Report For Train Data
  • Classification Report For Test Data

Random Forest For Average Points

  • Confusion Matrix For Train Data

Correct predictions: 14392 + 150781 = 165173

Incorrect predictions: 6 + 16 = 22

  • Confusion Matrix For Test Data

Correct predictions: 69 + 54951 = 55020

Incorrect predictions: 78 + 7 = 85

  • Classification Report For Train Data
  • Classification Report For Test Data

Xgboost For Original Points

  • Confusion Matrix For Train Data

Correct predictions: 14390 + 150781 = 165171

Incorrect predictions: 6 + 18 = 24

  • Confusion Matrix For Test Data

Correct predictions: 69 + 54986 = 55055

Incorrect predictions: 53 + 7 = 60

  • Classification Report For Train Data
  • Classification Report For Test Data

Xgboost For Average Points

  • Confusion Matrix For Train Data

Correct predictions: 14400 + 150787 = 165187

Incorrect predictions: 0 + 8 = 8

  • Confusion Matrix For Test Data

Correct predictions: 71 + 54945 = 55016

Incorrect predictions: 84 + 5 = 89

  • Classification Report For Train Data
  • Classification Report For Test Data
  • Based on our analysis, XGBoost gives better results than Random Forest, Decision Tree, and Logistic Regression
  • Between the two datasets, the original points give better results than the average datapoints
  • Even though the recall score on the test dataset is high, the precision score is low, which makes the F1-score low
  • The time taken to process one datapoint is lower for the original points dataset than for the average points dataset
  • In order to obtain better results, we move to deep learning

Create DL Model

While collecting the sensor data, we also record the machine status. So, while creating the deep learning model, we include the label data for training. Here we use the last 10 datapoints with 11 features to predict the machine status of the 11th datapoint.
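
A sketch of the windowing described above; feature_array and label_array are assumed names for the prepared numpy arrays of shapes (n, 11) and (n,).

    # Build (10 timesteps x 11 features) windows and pair each with the next status.
    import numpy as np

    def make_windows(features, labels, window=10):
        X, y = [], []
        for i in range(window, len(features)):
            X.append(features[i - window:i])  # previous 10 timesteps
            y.append(labels[i])               # machine status at the current step
        return np.array(X), np.array(y)

    X2, y2 = make_windows(feature_array, label_array)  # X2: (n-10, 10, 11), y2: (n-10,)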

Callback

  • A callback is an object that can perform actions at various stages of training like the start or end of an epoch.
  • Callbacks can be used to write TensorBoard logs after every batch of training to monitor model metrics.
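
A minimal sketch of the TensorBoard callback used for logging; the log directory name is an assumption.

    # Write TensorBoard logs during training so metrics can be plotted later.
    from tensorflow.keras.callbacks import TensorBoard

    tensorboard_cb = TensorBoard(log_dir="logs/pump_model", update_freq="batch")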

LSTM Model For Original Points

  • Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning.
  • A sequential model is created.
  • An LSTM layer is added with units=64, activation='relu', input_shape=(X2_train.shape[1], X2_train.shape[2]).
  • The output layer is a dense layer with activation='sigmoid'.
  • The model is compiled with optimizer='adam', loss='binary_crossentropy', metrics=['Precision', 'Recall'].
  • The model is trained with epochs=100, batch_size=100, validation_data=(X2_test, y2_test).
  • Here we show epoch_loss and epoch_precision, plotted from the TensorBoard logs. A sketch of this model appears after the confusion matrices below.
  • Confusion Matrix For Train Data

Correct predictions: 150767 + 14402 = 165169

Incorrect predictions: 6 + 20 = 26

  • Confusion Matrix For Test Data

Correct predictions: 54992 + 76 = 55068

Incorrect predictions: 0 + 37 = 37
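
A sketch of the LSTM model described in the bullets above, using the Keras Sequential API; only the layers stated in the article are included.

    # LSTM(64, relu) followed by a sigmoid output, trained for 100 epochs.
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    model = Sequential([
        LSTM(64, activation="relu",
             input_shape=(X2_train.shape[1], X2_train.shape[2])),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["Precision", "Recall"])
    model.fit(X2_train, y2_train, epochs=100, batch_size=100,
              validation_data=(X2_test, y2_test), callbacks=[tensorboard_cb])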

JSON Model For Web App
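
The export itself is not shown in the text; a plausible sketch saves the architecture as JSON and the weights separately (file names are assumptions).

    # Save the Keras model architecture as JSON and the weights as HDF5.
    with open("model.json", "w") as f:
        f.write(model.to_json())
    model.save_weights("model_weights.h5")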

LSTM Model For Average Points

  • Confusion Matrix For Train Data

Correct predictions: 150772 + 14403 = 165175

Incorrect predictions: 5 + 15 = 20

  • Confusion Matrix For Test Data

Correct predictions: 55004 + 75 = 55079

Incorrect predictions: 1 + 15 = 16

CNN Model For Original Points

  • Convolutional nets are deep because they rely on multiple layers of feature extraction.
  • A sequential model is created.
  • The model is added with a Conv1D layer with filters=128, kernel_size=2, activation='relu', input_shape=(X2_train.shape[1], X2_train.shape[2]).
  • A MaxPooling1D layer is added with pool_size=2.
  • The output layer is a dense layer with activation='sigmoid'.
  • The model is compiled with optimizer='adam', loss='binary_crossentropy', metrics=['Precision', 'Recall'].
  • The model is trained with epochs=100, batch_size=100, validation_data=(X2_test, y2_test).
  • Here we show epoch_loss and epoch_precision, plotted from the TensorBoard logs. A sketch of this model appears after the confusion matrices below.
  • Confusion Matrix For Train Data

Correct predictions: 150761 + 14404 = 165165

Incorrect predictions: 4 + 26 = 30

  • Confusion Matrix For Test Data

Correct predictions: 55026 + 75 = 55101

Incorrect predictions: 1 + 3 = 4
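
A sketch of the CNN model described in the bullets above; a Flatten layer is added here (an assumption) to connect the pooled output to the dense output layer.

    # Conv1D(128, kernel_size=2, relu) -> MaxPooling1D(2) -> Flatten -> sigmoid output.
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

    cnn = Sequential([
        Conv1D(128, kernel_size=2, activation="relu",
               input_shape=(X2_train.shape[1], X2_train.shape[2])),
        MaxPooling1D(pool_size=2),
        Flatten(),
        Dense(1, activation="sigmoid"),
    ])
    cnn.compile(optimizer="adam", loss="binary_crossentropy",
                metrics=["Precision", "Recall"])
    cnn.fit(X2_train, y2_train, epochs=100, batch_size=100,
            validation_data=(X2_test, y2_test))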

CNN Model For Average Points

  • Confusion Matrix For Train Data

Correct predictions: 150764 + 14405 = 165169

Incorrect predictions: 3 + 23 = 26

  • Confusion Matrix For Test Data

Correct predictions: 55016 + 74 = 55090

Incorrect predictions: 2 + 3 = 5

  • Comparing the LSTM and CNN models, the CNN model performs better
  • The precision, recall, and F1-score values are higher for the CNN models
  • The number of correct predictions is higher for the original points dataset
  • The time taken to process one datapoint is lower for the original points dataset than for the average points dataset
  • So, we choose the CNN model with the original points dataset as the better model for predicting pump failure

Web Application

We have created a deep learning model to predict pump failure, but how do we know the model is performing well? How do we test it? How will a client use it?

For these reasons, we create a web application. A web application is application software that runs on a web server and is accessed by users through a web browser over an active internet connection.
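
The article does not show the server code; a minimal sketch of how such an app could serve the saved model, assuming Flask, is given below.

    # Load the saved model and expose a /predict endpoint for a 10-step window.
    import numpy as np
    from flask import Flask, request, jsonify
    from tensorflow.keras.models import model_from_json

    app = Flask(__name__)

    with open("model.json") as f:
        model = model_from_json(f.read())
    model.load_weights("model_weights.h5")

    @app.route("/predict", methods=["POST"])
    def predict():
        window = np.array(request.get_json()["window"])      # shape (10, n_features)
        prob = float(model.predict(window[np.newaxis, ...])[0][0])
        return jsonify({"failure_probability": prob})

    if __name__ == "__main__":
        app.run()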

Thank you for spending your valuable time reading this blog. I would be happy if you share your views.

This is the first blog I have ever written. I am a Mechanical Engineer by background, did my M.E. in CAD, and became interested in Machine Learning and AI applications. I am happy that I have combined all my studies in writing this blog.


References

I would like to thank team of appliedaicourse.com for their guidance and support throughout the analysis.

