Abstract
The operation and maintenance of modern aircraft multi-sensor data fusion systems generate vast amounts of numerical and symbolic data. Learning useful and non-trivial insights from this data may lead to considerable savings and to the detection and reduction of faults, thereby increasing the overall level of aircraft safety. Several machine learning techniques exist to learn from large amounts of data. However, using these techniques to infer readable and accurate interval regression tree models from the data obtained during the operation and maintenance of aircraft is extremely challenging. Difficulties that need to be addressed include data warehouse collection and preprocessing, data labeling, and machine learning model readability, setup, evaluation and maintenance. This paper presents the Interval Gradient Prediction Tree algorithm (INGPRET), which addresses these issues. As shown by our empirical evaluation on a real aircraft multi-sensor data set, the INGPRET algorithm provides better readability and similar performance in comparison to other regression tree machine learning algorithms.
Introduction
The operation and maintenance of modern sensor-equipped systems such as aircraft generate vast amounts of numerical and symbolic data streams. These data streams are generated by thousands of sensors installed in various components of the aircraft and are then sent in real time to relational database storage in ground stations. Before the data stream is transmitted to the ground, a number of on-board computer systems monitor and analyze it in order to make sure that the various systems of the aircraft are operating properly. However, once the data stream is stored in central databases, further data analysis is rarely performed. This paper presents an algorithm that makes use of this data stream in order to develop interval Machine Learning (ML) regression tree models that predict the need for replacement of various aircraft components before they become non-operational. The end goal is to implement this ML model in a flight monitoring system that will receive the real-time multi-sensor data input from an aircraft fleet, analyze it, and output alerts in the form of appropriate replacement rules when there is a need for component maintenance.
The monitoring system will use the automatically generated multi-sensor data stream from the aircraft, and will induce the interval regression tree model described in this paper, in order to detect component problems and recommend their replacement. Such a system could help improve the airline’s operation by reducing the number of delays, reducing maintenance costs, helping achieve better maintenance planning, and increasing the level of safety. The approach proposed in this paper applies machine learning techniques to large amounts of complex historical data in order to develop the predictive models required by the monitoring system. The approach addresses four fundamental difficulties with existing data mining approaches: automatic selection of relevant data, automatic labeling of instances, an evaluation method that accounts for dependencies between the instances, and a scoring function measuring the extent to which the results fit the domain requirements. By addressing these four issues, we believe that the proposed approach will help extend the range of potential applications for ML techniques. Examples of other applications that can benefit from the approach developed in this paper are: prediction of problems in complex systems (e.g., trucks, ships, trains, and cars), prediction of problems with complex industrial equipment for which a lot of data is continuously acquired, and prediction of critical events in medical applications (e.g., emergency room care). The fact that the proposed approach relies on a minimal amount of domain-specific information will also facilitate its adaptation to other applications.
The paper is organized as follows: The related work is described in the next section. In Section 3 the INGPRET algorithm is introduced and its computational complexity and performance metrics are analyzed. In Section 4 some experimental results are reported for two real-world temporal data sets. Section 5 discusses the main features of the proposed algorithm and identifies some future research directions.
Related work
Most regression tree algorithms apply binary recursive partitioning: the partitioning is binary because each node is split into exactly two child nodes, and recursive because the process is repeated at every node. It is also possible to split the data into three or more subsets or child nodes. Regression trees provide simple and easily interpreted regression models with reasonable accuracy. However, according to Breiman et al. [1], these methods are known for their split instability. The interested reader may find a more detailed survey of regression tree methods in [2].
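To make binary recursive partitioning concrete, the following minimal Python sketch grows a regression tree on a single numerical feature. It is an illustration only and not the INGPRET procedure: the function names (`sse`, `best_binary_split`, `grow_tree`), the squared-error split criterion, and the `min_size` parameter are assumptions introduced here.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_binary_split(xs, ys):
    """Return (threshold, total_sse) for the best split 'x <= threshold'.

    The threshold is None when no split lowers the squared error.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal feature values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


def grow_tree(xs, ys, min_size=2):
    """Recursively split the node into two children until it is too small."""
    threshold, _ = best_binary_split(xs, ys)
    if threshold is None or len(ys) < 2 * min_size:
        return {"leaf": fmean(ys)}  # point prediction: mean of the node
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "left": grow_tree([x for x, _ in left], [y for _, y in left], min_size),
        "right": grow_tree([x for x, _ in right], [y for _, y in right], min_size),
    }


tree = grow_tree([1, 2, 3, 10, 11, 12], [1.0, 1.1, 0.9, 5.0, 5.2, 4.8])
print(tree["threshold"])
```

Each call produces exactly two children (binary), and the same procedure is applied to each child (recursive). Small changes in the training data can move the chosen threshold, which is one way to picture the split instability mentioned above.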
XGBoost stands for eXtreme Gradient Boosting and is a scalable implementation of gradient boosting regression trees [3]. Since its release in 2016, XGBoost has been a very popular machine learning method, and it has a highly impressive winning record when it comes to machine learning competitions. XGBoost has already been used in several aviation risk assessment applications [4, 5] and represents a boosting ensemble of regression and decision trees. It is worth noting that XGBoost performance is not affected by multicollinearity (highly correlated explanatory variables) [5], which is often highly present in multi-sensor data.
Interval prediction is an important part of the forecasting process and is intended to enhance the accuracy of point estimation. An interval forecast usually consists of upper and lower limits between which the future value is expected to lie with a prescribed probability. The limits are sometimes called forecast limits [6] or prediction bounds [7], while the interval is sometimes called a confidence interval [8] or a forecast region [9]. We prefer the more widely-used term “prediction interval,” as used by [10, 11], both because it is more descriptive and because the term “confidence interval” is usually applied to interval estimates for fixed but unknown parameters. In contrast, a prediction interval is an interval estimate for an unknown future value. Because a future value can be regarded as a random variable at the time the forecast is made, a prediction interval involves a different sort of probability statement from that implied by a confidence interval.
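The probability statement attached to a prediction interval can be checked empirically by counting how often realized values fall between the predicted limits. The short Python sketch below illustrates this; the function name `empirical_coverage` and the representation of an interval as a `(lower, upper)` pair are assumptions made for the example.

```python
def empirical_coverage(intervals, actual):
    """Fraction of actual values that lie inside their prediction interval."""
    if len(intervals) != len(actual) or not actual:
        raise ValueError("need equally many intervals and actual values")
    hits = sum(1 for (lower, upper), y in zip(intervals, actual)
               if lower <= y <= upper)
    return hits / len(actual)


intervals = [(0.0, 2.0), (1.0, 3.0), (2.0, 4.0), (5.0, 6.0)]
actual = [1.0, 3.5, 2.0, 5.5]
print(empirical_coverage(intervals, actual))
```

For intervals built at a prescribed probability such as 95%, the empirical coverage on held-out data would be expected to be close to that probability.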
The intuition behind the approach
In many aircraft multi-sensor data streams, the data is available as time-continuous statistical moments such as mean or variance that are calculated over pre-defined measurement intervals, rather than as raw values sampled at discrete points in time. However, reporting single predicted values for the mean response values of new sensor measurement intervals can be misleading. The reason is that due to a large unexplained variance of the target variable, for many intervals the actual mean values may be very different from any specific point estimation. In this paper, we therefore shift our attention from predicting a single mean value to predicting intervals, which are expected to contain the actual mean values with a given probability.
The above considerations create a need for a stable algorithm that can process incoming mean-variance aggregated multivariate temporal data and make stable interval predictions of a numerical target variable with a given degree of statistical confidence.
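As an illustration of what mean-variance aggregated data looks like, the following Python sketch turns a raw sensor series into one (mean, variance) pair per measurement interval. The function name `aggregate`, the fixed `interval_length`, and the use of the population variance are assumptions made for this example; the source data arrive already aggregated.

```python
from statistics import fmean, pvariance


def aggregate(samples, interval_length):
    """Return one (mean, variance) pair per consecutive measurement interval."""
    if interval_length < 1:
        raise ValueError("interval_length must be positive")
    moments = []
    for start in range(0, len(samples), interval_length):
        chunk = samples[start:start + interval_length]
        # Population variance is an assumption; a sample variance would
        # divide by (n - 1) instead of n.
        moments.append((fmean(chunk), pvariance(chunk)))
    return moments


raw = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
for mean, variance in aggregate(raw, 3):
    print(round(mean, 3), round(variance, 3))
```

Each resulting pair can then serve as two candidate predictive features for the same sensor.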
The proposed method is based on the assumption that input and output variables in an aggregated data stream are characterized by linear or nonlinear dependencies (or both), which can be represented using the proposed INGPRET model. The proposed regression tree algorithm differs from the state-of-the-art regression tree algorithms described in [2], such as XGBoost, CART, Random Forest, RETIS, M5, SMOTI, MAUVE, and GUIDE, in the following characteristics:
The use of synchronous mean and variance estimators of numerical features;
The use of gradient-boosting learners based on mean-variance estimators;
Node splitting based on the Mahalanobis distance between the two statistical estimators;
A novel representation of prediction intervals at the tree leaves.
In our opinion, the suggested approach enables one to utilize predictive feature information obtained from mean and variance of temporally aggregated instances. This approach also enables achieving a considerable reduction in the depth of the induced prediction tree by using interval prediction tree leaves.
Both statistics can be used as candidate predictive features by a prediction tree induction algorithm. Therefore, if the two represented statistics indeed exhibit independent and identical behavior, then the aggregated input variable can be represented within a robust two-tail prediction interval at a user-defined confidence level, such as 95%.
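A two-tailed interval of this kind can be sketched in Python as follows. The sketch assumes that the aggregated variable is approximately normally distributed, which is an assumption of the example rather than a statement about INGPRET, and the function name `prediction_interval` is hypothetical.

```python
from statistics import NormalDist


def prediction_interval(mean, variance, confidence=0.95):
    """Two-tailed interval mean +/- z * sd under a normality assumption."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    if variance < 0.0:
        raise ValueError("variance must be non-negative")
    # Quantile of the standard normal that leaves (1 - confidence) / 2
    # probability in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * variance ** 0.5
    return mean - half_width, mean + half_width


lower, upper = prediction_interval(10.0, 4.0, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```

With a mean of 10 and a variance of 4, the 95% interval is roughly 10 plus or minus 3.92. A higher confidence level widens the interval.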
In our INGPRET algorithm, the average and variance of each aggregated input variable are mapped to the univariate Mahalanobis statistic using an auxiliary control variable
and
Let us suppose that each aggregated instance
Fig. 1. INGPRET tree growing pseudocode.
To the best of our knowledge, the regression tree studies discussed above restrict their attention exclusively to point forecasting. A point forecast is a single number, which is an estimate of the unknown true future value. Although it is the most likely estimate of the future value implied by the induced model, it provides no information as to the degree of uncertainty associated with the forecast. For this reason, one may justifiably argue that the comparison of alternative point forecasts is of limited use, because such a comparison completely neglects the variability associated with forecasting. For an improved and more meaningful comparison of the performance of forecasting models, the degree of uncertainty associated with forecasting should be explicitly taken into account. One of our main contributions in this research is the construction of an advanced prediction interval model at the INGPRET regression tree leaves. A prediction interval indicates a range of possible future outcomes with a prescribed level of confidence. As Chatfield [10] points out, interval forecasts are of greater value to decision-makers than point forecasts. Interval forecasts, therefore, should be used more widely in practical applications, as they allow for a thorough evaluation of future uncertainty and for contingency planning.
Traditionally, prediction intervals have been constructed based on the assumption that forecast errors follow a normal distribution. However, the validity of this normal approximation is doubtful, because the assumption of normality of the forecast error distribution often may not be justified in practice. The INGPRET algorithm computes the bounds of a prediction interval
If training instances (
Else, if
where
The INGPRET tree growing phase presented in Fig. 1 continues until at least one stopping criterion is met. In the INGPRET case, we distinguish between three stopping rules. The first gradient optimization weak learner (1) is applied by the algorithm when the selected terminal node instances are normally distributed. In this case, we will represent the terminal nodes by prediction intervals with the aid of previously calculated statistical moments. The stopping criterion evaluates the minimum probability for observations to lie within the prediction interval. Construction of the tree then stops when a user-defined threshold criterion is reached. The threshold criterion is the confidence level, which was previously defined by the user as
The performance metrics
The overall performance of the INGPRET algorithm is compared with that of state-of-the-art point predictors using two global metrics, namely the Root Mean Squared Error (RMSE) and the Root Mean Absolute Error (RMAE). Intuitively, they represent the squared and absolute differences between an interval average value and the true value of the quantity being estimated, and are computed by the following expressions:
where
In order to compare the accuracy of trees from different domains, the Explained Variability (EV) measure is defined by Eq. (6). This measure evaluates the goodness-of-fit of a given model, or in other words, it answers the question: “How well does the corresponding prediction model approximate the real data, when compared to the mean rule model?”
Here
Another option for comparing regression tree models is the Cost Complexity Measure (CCM) [13], which uses Root Mean Square Error
Here
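The accuracy metrics of this section can be sketched in Python as follows. The sketch takes the interval average value to be the interval midpoint and uses the textbook RMSE. Two points are assumptions of the example and may differ from the exact expressions used here: RMAE is read as the square root of the mean absolute error, and EV is read as one minus the ratio of the model's squared error to that of the mean rule model. All function names are hypothetical.

```python
from math import sqrt
from statistics import fmean


def interval_midpoints(intervals):
    """Average value of each (lower, upper) prediction interval."""
    return [(lower + upper) / 2.0 for lower, upper in intervals]


def rmse(intervals, actual):
    """Root of the mean squared difference between midpoints and actuals."""
    preds = interval_midpoints(intervals)
    return sqrt(fmean([(p - y) ** 2 for p, y in zip(preds, actual)]))


def rmae(intervals, actual):
    """Assumed reading of RMAE: root of the mean absolute difference."""
    preds = interval_midpoints(intervals)
    return sqrt(fmean([abs(p - y) for p, y in zip(preds, actual)]))


def explained_variability(intervals, actual):
    """Assumed reading of EV: 1 - SSE(model) / SSE(mean rule model)."""
    preds = interval_midpoints(intervals)
    baseline = fmean(actual)  # the mean rule model predicts the overall mean
    sse_model = sum((p - y) ** 2 for p, y in zip(preds, actual))
    sse_mean = sum((baseline - y) ** 2 for y in actual)
    if sse_mean == 0.0:
        raise ValueError("actual values are constant; EV is undefined")
    return 1.0 - sse_model / sse_mean


intervals = [(0.0, 2.0), (1.0, 3.0), (2.0, 4.0)]
actual = [1.0, 2.0, 5.0]
print(round(rmse(intervals, actual), 4))
print(round(rmae(intervals, actual), 4))
print(round(explained_variability(intervals, actual), 4))
```

Under this reading, an EV close to 1 means the model explains most of the variability left unexplained by the mean rule model, while an EV near 0 means it does no better than predicting the overall mean.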
Multi-sensor data fusion (MSDF) is the process of combining or integrating measured or preprocessed data or information originating from different active or passive sensors or similar sources, in order to produce a more specific, comprehensive, and unified data set about an event of interest that has been observed [14, 15]. A successful MSDF model should achieve improved accuracy and more specific inferences than could be obtained using a single sensor alone.
The data set for the experiment was obtained from a proprietary MSDF data warehouse of aircraft maintenance data and is illustrated in Fig. 2. This data warehouse was designed for fast multi-sensor data retrieval and aggregation from an Oracle OLTP system source. The star-based fact table was equipped with flight time, sensors data, information on failures, systems data, prediction parameters, and flight and maintenance dimensions, while the highest flight time resolution was represented in minutes of aircraft flight.
Table 1. The aircraft maintenance MSDF data sets
Table 2. System A data sets learners comparison (10 times 10-fold cross-validation)
Fig. 2. Aircraft maintenance data (proprietary).
Finally, four distinct data sets were derived from the aircraft maintenance MSDF data warehouse (Table 1). The first two data sets consist of 1,316 and 1,068 instances, respectively, and have a numerical target variable of the number of failures in a specific aircraft system denoted by System A. The main difference between the two first data sets is in the prediction lag of the previously defined target variable. Thus, in the first two data sets the prediction lag equals 20 and 50 hours, respectively. The third and fourth data sets consist of 1,310 and 1,050 instances, respectively, and use the same prediction lags with the predicted target variable of failures in a system denoted as System B.
All data sets were represented by 203 numerical attributes. The first three attributes denote the flight ID and the start and end flight timestamps, while the remaining 200 numerical attributes represent aggregated mean/variance estimators of flight time intervals.
Table 2 shows the averaged System A results from the 10 times 10-fold cross-validation for nine state-of-the-art models and the INGPRET algorithm. In both data sets, the least reliable models in terms of RMAE and RMSE are Support Vector Machines with Radial Basis Function Kernel (SVM RBF), Neural Network Multi-Layer Perceptron (NN-MLP), XGBoost Trees, and Bagging M5 Tree (B-M5P). The errors for the 20-hour prediction horizon are lower than those for the 50-hour horizon. This result indicates that all the regression tree models used for the System A data sets are more stable and accurate in short-horizon prediction. Another interesting result is that the relatively large Random Forest model does not reduce the overall RMAE and RMSE errors. This result is consistent with previous studies, which showed that the Random Forest (RF) algorithm tends to overfit in situations that involve large amounts of correlated time-series data.
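The 10 times 10-fold protocol can be sketched in Python as below. The sketch uses plain shuffled folds, which is an assumption: it does not account for dependencies between instances, and the evaluation method actually used may group related instances differently. The function name `repeated_kfold` and its parameters are hypothetical.

```python
import random


def repeated_kfold(n_instances, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for every split."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_instances))
        rng.shuffle(indices)  # a fresh random partition for every repeat
        for fold in range(n_folds):
            test = indices[fold::n_folds]  # every n_folds-th shuffled index
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield repeat, fold, train, test


splits = list(repeated_kfold(1316))
print(len(splits))
```

Each model is trained and scored once per split, so every reported average is taken over 100 train and test runs.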
Table 3. System B data sets learners comparison (10 times 10-fold cross-validation)
Fig. 3. System A data sets learners (%) of confidence level (CL) comparison. Black bars demonstrate paired Student-t tests for RMSE accuracy and grey bars CCM tree size learners. Dotted line sets 90% confidence level accuracy.
Figure 3 shows the paired Student-t test results between the analyzed regression tree models and the worst models (SVM RBF and B-M5P) at the 90% confidence level. In the case of the 50-hour prediction horizon (3.a), the best models in terms of the RMSE measure and CCM are XGBoost, RepTree, and the INGPRET tree. XGBoost is slightly more accurate than INGPRET, but the INGPRET model is more compact in terms of tree size (243 vs. 283).
Similarly, in the case of the 20-hour prediction horizon (3.b), the best models are XGBoost, M5P, RepTree and the INGPRET tree. The same pattern appears again: the INGPRET model tree is more compact and slightly less accurate than the corresponding XGBoost model.
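The paired Student-t comparison between two learners can be sketched in Python as follows. The sketch computes the standard paired t statistic from per-split errors obtained on identical cross-validation splits; the function name `paired_t_statistic` and the sample error values are hypothetical and are not taken from the reported experiments.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees_of_freedom) for paired per-split error values."""
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    t = fmean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical RMSE values of two learners on the same four splits.
t, dof = paired_t_statistic([1.2, 1.4, 1.1, 1.3], [1.0, 1.1, 1.0, 1.0])
print(round(t, 2), dof)
```

The difference is declared significant at the 90% confidence level when the absolute value of t exceeds the two-sided critical value of the Student-t distribution for the given degrees of freedom, which is approximately 1.66 for the 99 degrees of freedom of a 10 times 10-fold run.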
As with System A, Table 3 shows the averaged System B results from the 10 times 10-fold cross-validation for nine state-of-the-art models and our INGPRET algorithm. As in the experiments with the System A data sets, SVM RBF and NN-MLP were the worst models on both System B data sets. Here as well, the RMAE and RMSE values for the 20-hour-horizon data set are lower than those for the corresponding 50-hour horizon, and, as in the case of System A, the larger B-M5P regression tree model does not reduce the overall RMAE and RMSE errors.
Fig. 4. System B data sets learners (%) of confidence level (CL) comparison. Black bars demonstrate paired Student-t tests for RMSE accuracy and grey bars CCM tree size learners. Dotted line sets 90% confidence level accuracy.
Figure 4 shows the paired Student-t test results between the analyzed System B regression tree models and the worst models (NN-MLP and SVM RBF) at the 90% confidence level. In the case of the 50-hour prediction horizon (4.a), the best models in terms of the RMSE measure and CCM are XGBoost, INGPRET, Random Forest (RF) and RepTree. XGBoost is both more accurate (0.98 vs. 1.10) and more compact (103 vs. 110) than the INGPRET tree. In the case of the 20-hour prediction horizon (4.b), the XGBoost model significantly outperforms the other models, whereas the M5 Rules model significantly outperforms the other models in terms of CCM and tree size. It is important to note that this result has one significant performance drawback, namely the model’s relatively long average training time of 10.56 minutes versus 1.84 minutes for the INGPRET method.
Finally, it can be concluded that in both systems the INGPRET accuracies were significantly higher than those of the SVM, neural network, and traditional additive regression tree models, and similar to those of the modern gradient-boosting regression tree algorithm XGBoost. This result supports our hypothesis that in large data sets the prediction tree strategy based on mean-variance aggregation is preferable to the global additive regression model approach.
In aviation, each failure is important in terms of both safety and cost. Optimized maintenance scheduling can only be realized with close-to-realistic forecasts. Predicting maintenance times ahead of failures will both avoid safety incidents and reduce costs. In this paper, we have presented the INGPRET Interval Gradient Prediction Tree algorithm, which can predict values of numerical attributes in aggregated temporal data streams. The proposed algorithm differs from existing state-of-the-art regression algorithms in that it splits each continuous input feature according to the best mean-variance contributor and removes outliers from the training data. As a result, the algorithm builds more compact interval prediction trees. The experiments conducted on two real-world data sets indicate that the proposed INGPRET algorithm produces accurate and compact interval models compared to state-of-the-art machine learning algorithms such as Support Vector Machines with Radial Basis Function Kernel (SVM RBF), Artificial Neural Network Multi-Layer Perceptron (ANN-MLP), and the Bagging M5P Tree.
