Abstract
This paper focuses on the problem of sequential prediction for imbalanced data streams. A novel hybrid algorithm called Weighted OS-ELM and Dynamic Generative Adversarial Nets (GAN-WOSELM) is presented for handling this issue. In the data reconstruction stage, GAN is utilised for generating lifelike minority class samples to equilibrate the data distribution. Then a principal component score threshold helps judge unusual data. In the model update stage, the new constructive ELM is applied to forecast time-varying data chunk. After concerning fitting accuracy and data change, the analytical relationship between the new weight and the shifting imbalance ratio is determined. Therefore, the GAN-WOSELM can update weight quantificationally, and it avoids interactive parameter optimization. According to the suitable weight for the arriving chunk, the proposed method is able to perceive the changeable data distribution and do the adaptation by itself, thus building a reliable model with a low fitting deviation. Numerical experiments are conducted on four different kinds of UCI datasets. The results demonstrate that the proposed algorithm not only has better generalisation performance but also provides higher numerical stability.
Introduction
Time series with imbalanced labels is universally presented in real world applications, such as network intrusion detection [1], traffic flow analysis [2], handwritten character recognition [3] and spam filtering [4]. Different from batch learning processes, sequential prediction involves learning from chunk-by-chunk data without reprocessing observed examples [5]. The unbalanced sequence leads to a varying imbalanced ratio (IR) and unknown sample size. Particularly in shifty environments, forecasting model cannot always be rebuilt when the new data chunk arrives. If the update strategy cannot accommodate fast-changing test data, the original predictive model may become inaccurate.
As we all known, there are two main ways to solve the imbalanced learning problem [6]. One technique is applied at the data level, the other at the algorithm level. For the first type, the unequal cost of different class sample errors is taken into consideration. Usually, the cost of erroneous classification from minority samples is greater than the majority one [7]. Multiple resampling technologies are put forward to generate similar examples so as to equilibrate the class frequency distribution [8, 9]. If the new datasets achieve approximate equilibrium, the normal sequential learning algorithm will have a role in prediction. For the second method, multiple individual methods are integrated to alleviate the whole bias [10, 11]. A strong learner emerges. In view of the sequential prediction, the ensemble learning methods are created to deal with class imbalance dynamic data streams [12, 13]. A representative classification method for sequential prediction of imbalanced dataset is the weighted online sequential extreme learning machine (WOS-ELM) [14]. It is a general learning method that alleviates the class imbalance problem in both one-by-one and chunk-by-chunk. These two types of methods strive to solve the imbalanced problem in the variable environment. New dataset or changeable weight will help to improve the generalization performance.
In this paper, we present a novel hybrid method with Weighted OS-ELM and Dynamic Generative Adversarial Nets (GAN-WOSELM). Similar to SOMTE-OSELM [15], GAN-WOSELM is suitable for sequential prediction of imbalanced time series. In contrast to the SOMTE-OSELM, the proposed method can optimise the OS-ELM structure by adjusting for the changes in imbalance ratio. The model passes helpful information from the current chunk to the arriving one, so the proposed method avoids the rebuilding in the online learning process. In addition to the above contribution, the proposed method will produce new data integrally according to game theory. The new sample is beneficial to balance category of varying data streams. As a supplementary note, a positive label represents the minority class and a negative label describes the majority class in the rest of this paper.
The remainder of this paper is organised as follows. In Section 2, the related work preliminaries involving hybrid classification method and online prediction are described. In Section 3, a new weighted OS-ELM with a Dynamic GAN is introduced. Experimental results and a discussion are presented in Section 4. Finally, Section 5 concludes the paper and suggests the future work.
Related work
In this section, we first review hybrid classification method, and then discuss the online learning frame for data stream. These preliminaries give theoretical support.
Hybrid classification methods
Hybrid classification methods better inherit the merits of the two types of methods [16, 17]. The meta-cognitive algorithms, whose parameters acquire global optimisation using a biological intelligence approach, have widely used to select suitable learning strategies for both stationary and non-stationary environments [18, 19, 20, 21]. That means these learning models need to tune parameters repeatedly. In order to figure out the issue, a computationally efficient ensemble algorithm is achieved for sequential imbalance learning [22]. And a regularised proximal learning framework has been used to solve the composite optimisation problem [23]. These methods tend to better cope with the concept drift problem. However, many hybrid methods rely on multiple parameters. It is troublesome to obtain the appropriate weight before the next chunk appears. When the data distribution of the time series changes continually, the model can hardly execute well or even become useless. Particularly data with high-dimensional features, correlation analysis is crucial and the dimension reduction method cannot be neglected [24]. Although all the above hybrid methods can obtain an adaptive model, the neighbouring data cannot guarantee the validity of the generated data.
Online prediction
The online learning frame updates the original model according to the predicted deviations [25]. This frame has to address high-dimensionality sequence that involves only a fixed number of samples [26]. Then to break through the storage limitation, the Projectron algorithm guarantee the forecasting boundary [27]. The intrinsic characteristics between feature and label are a major concern, and the representative technique for fast nonlinear mapping is the extreme learning machine (ELM) [28]. As a fast and competitive machine learning technique, the extreme learning machine (ELM) calculates output weight analytically [29]. Due to the advantages of complete theoretical system and easy hardware implementation, it has become feasible for real-time prediction. After the new data chunk arrives, forecasting model usually updates the structure. For example, the convex incremental ELM (CI-ELM) avoids overfitting causing by enlarging network scale [30]. The error minimized ELM (EM-ELM) cut off redundancy connecting branch towards the predictive target [31]. D-ELM better adapts the network size and the target function is extracted effectively [32]. It not only has reliable accuracy, but also befits varying task types. While these methods depend on the original imbalance ratio, when the distribution characteristics of a dataset change frequently, model change is delayed. So an automatic well-matched model is essential for guaranteeing the generalisation error of global minima.
The proposed method
In this section, we introduce the proposed GAN-WOSELM in detail. GAN-WOSELM establishes a relationship between the learning model and the shifting imbalance ratio. It helps the novel ELM fit arriving data. The proposed hybrid algorithm includes two phases: data reconstruction and model update. They all greatly affect the generalisation performance of sequential class imbalance learning problems.
Data reconstruction phase
First of all, data generation is necessary to obtain the balanced sample. Generative adversarial Nets (GANs) are multiple neural networks which constitute a new framework for estimating generative models [33]. A continuous sequence is created by similar semantic features and dynamic GAN. Figure 1 illustrates the framework of the data generation.
Framework of data generation.
Considering the current sample set
where
We arrange eigenvalues from large to small as
where
where
The generator
where
After the cyclic adversarial process, we obtain the new training sample set, and the new imbalance ratio is represented as
Merging the new minority sample set and majority sample set, the number of data increases from
Expanded WOS-ELM, the proposed method adjusts the weights according to the changeable imbalance ratio and drift sample size. The nonlinear mapping function is chosen as a sigmoid function. It maps the variable to
The change of imbalance ratio is composed of two main parts. One part comes from the operation of generating minority class samples. The other part is from the varying data distribution between the adjacent chunks. Larger chunks are associated with a smaller change in the imbalance ratio, and vice versa. According to the ELM structure, the weights between the input layer and the hidden layer are randomly selected, and the hidden learning parameters are calculated according to the Moore-Penrose generalised inverse [35]. The basic mapping relation can be expressed as
where
where
At the original learning phase, added weight factor, the output weight can be calculated as
At the original learning phase,
where
With increasing size of the dataset, the element number of the weight set goes up. Then the weight at the
where
Besides, the Moore-Penrose generalised inverse and matrix transformation helps simplify the target equation. Hence, the output weights can be calculated as:
where
After the simplified processes,
Taking the relationship between
When the new samples arrive, the method alters the temporary parameters according to the variation of the imbalance ratio. The model can be updated quantificationally instead of by trial and error. Moreover, in the proposed GAN-WOSELM, the new data is created chunk by chunk, thus reducing the probability of overfitting. So, GAN-WOSELM is better suited for real-time online applications. Taking into account the characteristics of the learning procedure, the whole GAN-WOSELM algorithm is presented in Algorithm 1.
Remark A: Regarding the two types of neural network: GAN and ELM. We express the node number of one layer as
In contrast, the computational complexity for [15] concentrates mainly on the deviation calculation and online learning process. The computational complexity is
This section describes the experiment. Firstly, we list the datasets and evaluation criteria. Then, the results of the different methodologies are compared. GAN-WOSELM is compared with other classical online classification methods. The theoretical demonstration is shown in the last section.
Datasets description
The GAN-WOSELM method proposed in this paper has mostly been applied to the online prediction of imbalanced data streams. In order to evaluate the effectiveness of the proposed method, there are four UCI datasets with various dimensionalities to be selected [36]. Table 1 presents the specifications of each dataset.
Specifications of the datasets
Specifications of the datasets
Aiming at the various prediction problems, time series data must be normalised as follows:
where
The evaluation criteria of algorithm performance consist of multiple metrics. For dealing with two-class classification problems, the minority class is labelled as the positive sample and majority class are labelled as negative. The essential classifier result is to form a confusion matrix as shown in Table 2.
Confusion matrix
Confusion matrix
For imbalanced sequences, the overall error is no longer an appropriate performance measure. G-mean and F-measure evaluate generalisation performance better [37]. They are shown as the following two equations.
where
Then, the ROC (receiver operating characteristics) graph gives the overall performance assessment criteria [38]. The ROC graph is obtained by plotting the FPR on the x-axis and the TPR on the y-axis. The FPR is called the false positive rate, which is calculated according to the confusion matrix. The TPR is the true positive rate, whose definition is equal to Recall. They are shown in detail in the following equations.
Also, the area under the ROC curve (AUC), is a popular measure for imbalanced data prediction [39].
Spatial distribution (a), ROC curve (b), G-mean (c) and F-measure (d) of the four methods for Statlog.
Spatial distribution (a), ROC curve (b), G-mean (c) and F-measure (d) of the four methods for Pima.
Spatial distribution (a), ROC curve (b), G-mean (c) and F-measure (d) of the four methods for Cardiotocography.
Spatial distribution (a), ROC curve (b), G-mean (c) and F-measure (d) of the four methods for Pen-Based.
In order to see how the GAN design influences the performances of the proposed system, we firstly try the different GAN architectures. The hidden node number of generator is 100. For imbalanced data stream forecasts, the OS-ELM, WOS-ELM and SMOTE-OSELM methods are chosen for comparison. The chunk size is chosen as 50 for the four datasets. In the first experiment, we show the prediction result of Dataset 1. Figure 2 shows the evaluation effect of an execution round for the Statlog.
In Fig. 2a, the spatial distribution is shown according to the first two principal components. The blue circle describes the correctly classified samples and the red spots indicate the misclassified ones. Then, we show a black line describing the proposed method and coloured lines with different shapes, which are the contrastive methods. The averaged ROC curves of OS-ELM, WOS-ELM, SMOTE-OSELM and GAN-WOSELM for online prediction are shown in Fig. 2b. The AUC of our approach is 0.9087, which is larger than the others. Figure 2c and d compare the accumulated G-means and F-measures. Obviously, our approach has better generalisation performance compared with the other constructive ELM. The F-measure is almost above 0.95, which is almost closed. It is proven that the proposed method has better capacity of adjustment. This advantage generates a more suitable model with the changing data.
In the second experiment, we show the prediction result of Dataset 2. Figure 3 shows the evaluation value of an execution round for the Pima.
From the spatial distribution of Dataset 2 shown in Fig. 3a, we can see that there are many more correct predictions than incorrect ones. In Fig. 3b–d, if the data dimensionality decreases, the SMOTE-OSELM method loses efficacy. The proposed method has better generalisation performance than the OS-ELM and WOS-ELM methods. If the imbalance ratio is extremely low, the traditional weighted ELM usually leads to excessive learning. It is difficult to control the optimal parameters. However, the GAN-WOSELM method changes the model along with the changed imbalance ratio; therefore, the fluctuation of generalisation performance is not volatile.
In the third experiment, we show the prediction results of Dataset 3. Figure 4 shows the forecast effect of an execution round for the cardiotocography dataset.
The spatial distribution of Dataset 3 is shown in Fig. 4a. Almost all results have the correct predictions, especially for the higher principal component scores. As seen in Fig. 4b, all the ROC curves are beyond the diagonal line and the proposed method has the best one. In Fig. 4c and d, the G-means and F-measures of the four methods are high at first. However, as time goes on, the contrastive hybrid method tends to decline obviously. It is proven that the proposed method has better adjustment capacity. This advantage indicates that the model is more suitable for use with changing data. Although the number of features is high, the generated data have similar meaning.
In the last experiment, we show the prediction results for Dataset 4. Figure 5 shows the evaluation effect of an execution round for the Pen-Based dataset.
The spatial distribution of Dataset 4 (Fig. 5a) shows that the proposed algorithm has excellent prediction ability. In Fig. 5b, the proposed method has a better fit than contrastive ELM. Then, in Fig. 5c and d, it is obvious that the proposed method has a higher G-mean and F-measure than the others. In the proposed method, the generalisation error trend is stable. This phenomenon indicates that due to the model having the ability for fast predictive control, the proposed method is more suitable for regular data.
The results for the four datasets processed with the three contrasting methods indicate that the proposed method gives satisfactory results. The prediction accuracy is significantly improved compared to the OS-ELM, WOS-ELM and SMOTE-OSELM methods. In the proposed method, the generalisation error trend is stable and the F-measure is close to one. It is proven that the proposed method deals with the online prediction of imbalanced data streams well. Of course, the game process of the proposed method requires more time than in SMOTE. With the development of hardware resources, computational time in practical applications is becoming shorter.
Conclusion
In this paper, a Weighted OS-ELM method with a Dynamic GAN is proposed. Different from the traditional weighted sequential extreme learning machines, the proposed method determines the weight according to changes in the imbalance ratio. By means of the dynamic GAN, the data generation process does not lose information during resampling. The recursive weight is used for judging the whole network structure, which makes the prediction method can get accustomed to the sequential environment. Then, the prediction process can be automated and the model becomes suitable for non-expert users. In contrast with other methods, our analysis shows that the proposed method has the best generalisation performance. In future, our research will look at how to reduce computation complexity during online prediction of unstructured data.
Footnotes
Acknowledgments
The work was supported by the National Key Research Project of China under Grant No. 2016YFB1001304, and the Fundamental Research Funds for Central Universities under Grant No. 2017TD-19. The authors gratefully acknowledge financial support from the Research Centre for Intelligent Signal Identification and Equipment, Jilin Province.
