This page describes PREPIPE: PREdictive maintenance PIPEline
PREPIPE is a data-driven predictive maintenance pipeline, evaluated on an automotive case study.
The code is provided to support the paper:
Danilo Giordano, Flavio Giobergia, Eliana Pastor, Antonio La Macchia, Tania Cerquitelli, Elena Baralis, Marco Mellia, Davide Tricarico (2020), “Data-Driven Strategies for Predictive Maintenance: Lesson Learned from an Automotive Use Case”, Computers in Industry. Please refer to it for the main concepts and for citation.
Predictive maintenance is an ever-growing topic of interest, spanning different fields and approaches. In the automotive domain, thanks to on-board sensors and the possibility of transmitting collected data to the cloud, car manufacturers can deploy predictive maintenance solutions to prevent component malfunctions and recall the vehicle to service before the customer experiences the failure.
In this paper we present PREPIPE, a data-driven pipeline for predictive maintenance.
Given the raw time series of signals recorded by the on-board engine control unit of diesel engines, we exploit PREPIPE to predict the clogging status of the oxygen sensor, a key component of the exhaust system used to control combustion efficiency and pollutant emissions.
In the design of PREPIPE, we investigate in depth: (i) how to choose the subset of signals that best captures the sensor status, (ii) how much data needs to be collected to make the most accurate prediction, (iii) how to transform the original time series into features suitable for state-of-the-art classifiers, (iv) how to select the most important features, and (v) how to include historical features to predict the clogging status of the sensor. We thoroughly assess PREPIPE's performance and compare it with state-of-the-art deep learning architectures.
Our results show that PREPIPE correctly identifies critical situations before the sensor reaches critical conditions.
Furthermore, PREPIPE supports domain experts in optimizing the design of data-driven predictive maintenance pipelines with performance comparable to deep learning methodologies while keeping a degree of interpretability.
Prerequisites
- The Jupyter notebooks run on Linux with: Python 3.7, scikit-learn 0.22, pandas 0.25.3, numpy 1.17.4, scipy 1.4.1.
- The grid search notebooks run on Spark version 2.4.0-cdh6.2.1.
Data samples
- data/ contains samples of the data.
- Each cycle (C0, C1, C2) is a CSV file whose first row is a header containing the signal names; each following row stores one sample of all signals, as recorded by Program A.
- cycle_order is a CSV file whose first row is a header with the cycle name and label; each following row stores a cycle name and the label assigned by Program B.
- Cycles in this file must be sorted by acquisition time.
- All the code, except for the Unsupervised signal selection, runs on tabular data. Either provide ad-hoc tabular data, or use the 1c-DatasetCreation notebook to transform cycle data into tabular data.
- An example of tabular data is available in 1-SignalSelection/dataset/All.pkl.
- This pickle file stores a pandas DataFrame whose columns are ExpID (the cycle name), all the features, and label; each row contains one cycle's data (a loading sketch follows this list).
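
A minimal loading sketch, assuming pandas is installed and the layout described above. The file names data/C0.csv and data/cycle_order.csv are assumptions based on this description, while 1-SignalSelection/dataset/All.pkl ships with the repository:

```python
import pandas as pd

# One cycle: the first row is the header with the signal names,
# each following row is one sample of all signals.
cycle = pd.read_csv("data/C0.csv")  # assumed file name

# Cycle-to-label mapping; rows must be sorted by acquisition time.
cycle_order = pd.read_csv("data/cycle_order.csv")  # assumed file name

# Tabular dataset: one row per cycle, with columns
# ExpID (cycle name), the features, and label.
tabular = pd.read_pickle("1-SignalSelection/dataset/All.pkl")
print(tabular.columns.tolist())
```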
Code
- 0-ValidationTestDivision: contains the Jupyter notebook to compute the CAI Index.
- 1-SignalSelection: contains the Jupyter notebooks to compute all the signal selection algorithms presented in the paper. Run the notebooks in alphabetical order.
- 2-Windowing: contains the Jupyter notebook to split the cycles into time windows of different sizes (an illustrative windowing sketch follows this list).
- 3-FeatureSelection: contains the Jupyter notebook to rank the features according to the FS algorithm (a generic ranking sketch follows this list).
- 4-Historicization: contains the Jupyter notebook to create the dataset with historical features (a minimal sketch follows this list).
- 5-ModelTrainingTuning: contains the Jupyter notebooks to run the grid search, performing either k-fold cross-validation or time-series cross-validation on D1 and hold-out validation on D2 (a grid-search sketch follows this list).
- 6-DeepLearning: contains the scripts to create and validate the deep learning models.
- classes/parameters/ConfGenerator: the Jupyter notebook that creates the grid-search space for the hyperparameters of the decision tree, random forest, and SVM classifiers.
- classes/public/makerDatasetSpecialized: implements the different feature extraction strategies (see the windowing sketch after this list).
- Each step creates tabular datasets according to the choices made in that step.
- Since PREPIPE follows a wrapper approach, the datasets created in steps 1 to 4 must be evaluated with the 5-ModelTrainingTuning notebooks to identify the best choice at each step.
- To support this identification, 5-ModelTrainingTuning/gridresult provides the notebooks that analyze the grid-search results of each step.
- As examples, 5-ModelTrainingTuning/gridresult/1-SignalSelection/ reports grid-search results for the 10-fold Cross-Validation (CV) and Time-Series Cross-Validation (TS) cases. Please refer to 5-ModelTrainingTuning/gridresult/README_Output.md for a complete overview of the output files.
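
To make the windowing step (2-Windowing) and the feature-extraction strategies (classes/public/makerDatasetSpecialized) concrete, here is a generic sketch, not the repository's actual code: each cycle is split into consecutive fixed-size windows, and simple per-signal statistics are computed for each window. The window size, the statistics, and the column names are illustrative assumptions.

```python
import pandas as pd

def windowed_features(cycle: pd.DataFrame, window_size: int) -> pd.DataFrame:
    """Split a cycle's time series into consecutive windows of
    `window_size` samples and compute per-signal statistics.
    Illustrative strategy, not the paper's exact one."""
    rows = []
    for start in range(0, len(cycle), window_size):
        window = cycle.iloc[start:start + window_size]
        feats = {}
        for signal in cycle.columns:
            feats[f"{signal}_mean"] = window[signal].mean()
            feats[f"{signal}_std"] = window[signal].std()
            feats[f"{signal}_min"] = window[signal].min()
            feats[f"{signal}_max"] = window[signal].max()
        rows.append(feats)
    return pd.DataFrame(rows)

# Example: 60-sample windows over one cycle (window size is an assumption).
# features = windowed_features(pd.read_csv("data/C0.csv"), window_size=60)
```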
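The FS algorithm used in 3-FeatureSelection is defined in the paper; as a generic stand-in, the sketch below ranks the tabular features with scikit-learn's mutual_info_classif. This is not the paper's algorithm, but it shows the expected input and output of a feature ranking on the tabular dataset:

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

tabular = pd.read_pickle("1-SignalSelection/dataset/All.pkl")
X = tabular.drop(columns=["ExpID", "label"])
y = tabular["label"]

# Rank features by mutual information with the label
# (generic stand-in for the paper's FS algorithm).
scores = pd.Series(mutual_info_classif(X, y), index=X.columns)
print(scores.sort_values(ascending=False).head(20))
```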
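A minimal sketch of the historicization idea (4-Historicization), assuming the tabular DataFrame rows are already sorted by acquisition time: the features of the previous cycles are appended to each cycle as lagged columns. The lag strategy and the column naming are assumptions; the exact construction is in the notebook.

```python
import pandas as pd

def historicize(tabular: pd.DataFrame, n_past: int = 2) -> pd.DataFrame:
    """Append the features of the `n_past` previous cycles to each
    row (rows assumed sorted by acquisition time). Illustrative sketch."""
    feature_cols = [c for c in tabular.columns if c not in ("ExpID", "label")]
    out = tabular.copy()
    for lag in range(1, n_past + 1):
        lagged = tabular[feature_cols].shift(lag)
        lagged.columns = [f"{c}_lag{lag}" for c in feature_cols]
        out = pd.concat([out, lagged], axis=1)
    # Drop the first cycles, which have no complete history.
    return out.dropna()
```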
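The grid searches in 5-ModelTrainingTuning run on Spark; the sketch below reproduces the idea with plain scikit-learn on the tabular dataset, comparing 10-fold CV and time-series CV for a random forest. The hyperparameter grid and the scoring metric are illustrative assumptions; the actual search space is produced by classes/parameters/ConfGenerator.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold, TimeSeriesSplit

tabular = pd.read_pickle("1-SignalSelection/dataset/All.pkl")
X = tabular.drop(columns=["ExpID", "label"])
y = tabular["label"]

# Illustrative grid; the real one comes from ConfGenerator.
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [5, 10, None]}

# 10-fold CV (shuffled) vs. time-series CV (rows sorted by time).
for cv in (KFold(n_splits=10, shuffle=True, random_state=42),
           TimeSeriesSplit(n_splits=10)):
    search = GridSearchCV(RandomForestClassifier(random_state=42),
                          param_grid, cv=cv, scoring="f1_macro")
    search.fit(X, y)
    print(type(cv).__name__, search.best_params_, search.best_score_)
```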
Code and data samples are available at https://github.com/SmartData-Polito/PREPIPE/