An online adaptive classifier ensemble for mining non-stationary data streams

Abstract

Many real-world situations continuously generate concept-drifting data streams at high speed. These situations demand adaptive algorithms able to learn online in accordance with the most recent target function (concept). This paper presents the Online Adaptive Classifier Ensemble, a new ensemble algorithm able to learn from concept-drifting data streams. The proposed algorithm uses a change detection mechanism in each base classifier in order to handle possible changes in the underlying target function. During the learning process, each base classifier in the ensemble can alternate between three stages: stable, warning, and drift. In the stable stage, the underlying target function is assumed to remain constant, and the corresponding base classifier is updated with each incoming training instance. In the warning stage, a change in the target function may be starting to occur, so an alternative base classifier is created and trained alongside the other base classifiers. If the drift stage is reached, the alternative classifier replaces the corresponding base classifier in the ensemble. The new algorithm is compared with various state-of-the-art ensemble algorithms for online learning. Empirical studies show that this proposal is an effective alternative for learning from non-stationary data streams.

Keywords

Classifier ensemble; concept drift; data stream; massive data; online learning

1. Introduction

Nowadays, highly heterogeneous sources generate massive amounts of data continuously, at high speed, and without control over the arrival order. The Internet, cell phones, cars, and security sensors are examples of such sources [22]. Because of the temporal dimension of the data and the dynamic nature of many real-world situations, the target function to be learned can change over time. This situation, known as concept drift, complicates the task of estimating the target function, since a previously learned model can become outdated, or even contradictory with respect to the most recent data. Learning from data streams is closely related to concept drift, because drift is an inherent feature that usually appears when data arrive over time. Examples of real-world situations where such changes emerge include changes in clothing fashion, news preferences, and energy consumption [44, 22]. Spam filtering is another common example: spammers try to evade filters by changing the pattern of spam emails, which requires spam filters to be updated continuously [25].

Classifier ensembles are an effective method for learning from non-stationary data streams. Ensembles combine learning models with the goal of improving on the predictive accuracy obtained by single classifiers. To deal with concept drift, ensemble algorithms have adopted two main strategies [21]. In the first strategy, the ensemble adapts its base classifiers at regular time intervals, regardless of whether a concept drift has occurred [40, 41, 12]. These algorithms store training instances in a fixed-size buffer (time window) at each learning step to train the base classifiers [21]. The main problem with this approach is choosing an appropriate buffer size: smaller buffers lead to faster adaptation after abrupt changes, whilst larger buffers are more suitable for stable concepts. The second strategy, which is usually more effective, constantly monitors the consistency of the ensemble with respect to new data [11, 43, 33, 35]. Significant variations in predictive performance are interpreted as a concept drift, and the ensemble algorithm responds by dynamically eliminating, reactivating, or adding base classifiers.

When a concept drift is detected, various state-of-the-art ensemble algorithms simply replace the base classifier with the worst current predictive performance by a new classifier [6, 5, 4]. These ensemble algorithms therefore handle concept drift only at the ensemble level, not at the level of the individual base classifiers. However, the classifier that currently performs worst may not be the one whose predictive performance is affected by the current concept drift, so this strategy can lead to an ineffective adaptation to changes in the data distribution.

Handling concept drift explicitly in each base classifier can be more effective when learning from non-stationary data streams. In general, this aspect of the adaptation mechanism of ensemble algorithms has received little attention in online learning: previous approaches have focused instead on classifier diversity [33], recurrent concepts [35], randomization of base-classifier training [5], or meta-learning [19].

This paper presents the Online Adaptive Classifier Ensemble (OACE), a new ensemble-based algorithm that is able to handle concept-drifting data in each base classifier. To do so, each base classifier alternates between three drift stages: in-control, warning, and out-of-control. These three stages are compatible with several methods for detecting concept drift in data streams [16, 3, 21, 2]. A base classifier is in the in-control stage when the current concept is stable, in the warning stage when a concept drift is likely to be starting, and in the out-of-control stage when a concept drift is detected. In the in-control stage, only the corresponding base classifier is trained. In the warning stage, an alternative classifier is also induced as a potential replacement for the corresponding base classifier. If the warning stage is followed by the out-of-control stage, the alternative classifier replaces the corresponding base classifier, and the new base classifier returns to the in-control stage.

The experimental results show that the proposed adaptation mechanism is more effective than the one adopted by previous ensemble algorithms. The experiments included various adaptive ensemble algorithms [6, 5, 4], online change detectors [16, 3, 21, 2], and ensemble algorithms without a change detection mechanism [8, 28, 41].

This paper is structured as follows. The next section defines data streams and concept drift, and presents common types of change in the distribution of the data arriving in the stream. Section 3 reviews previous adaptive ensemble algorithms and their fundamental drawbacks. The proposed ensemble algorithm, OACE, is described in Section 4. In Section 5, OACE is compared with various state-of-the-art ensemble algorithms for data stream mining. Finally, Section 6 presents the main conclusions derived from this research.

2. Definitions

In online learning [42], a classification task is generally defined over a (possibly infinite) sequence of instances $S=e_{1},e_{2},\ldots,e_{i},\ldots$ arriving over time. Each training instance $e_{i}=(\overrightarrow{x}_{i},y_{i})$ is formed by a vector $\overrightarrow{x}_{i}$ and a discrete value $y_{i}$. All vectors $\overrightarrow{x}_{i}\in\overrightarrow{X}$ have the same dimension; each dimension is called an attribute, and each component $x_{ij}$ of $\overrightarrow{x}_{i}$ is an attribute value (numeric or symbolic). The discrete value $y_{i}$ is called the label and is taken from a finite set $Y$ of possible class values.

It is commonly assumed that the data stream $S$ is generated by a joint probability distribution $P(\overrightarrow{X},Y)$. The classification learning task is to obtain from $S$ a model $\hat{P}$ that approximates $P$ so as to maximize predictive accuracy [14]. A concept refers to the probability distribution of the problem at a given time stamp [32]; therefore, a change in $P$ after a time stamp entails a concept change, or concept drift. Gama et al. [25] distinguish two main types of concept drift:

• Real concept drift refers to changes in the posterior probability distribution of the classes, $P\left(Y\mid X\right)$. These changes can occur without a change in the probability distribution of the instance space, $P\left(X\right)$.
• Virtual concept drift happens when the probability distribution of the instance space, $P\left(X\right)$, changes without affecting $P\left(Y\mid X\right)$.
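To make the distinction concrete, the following sketch uses a hypothetical one-dimensional example (not one of the datasets used in the experiments): $x$ is drawn from a Gaussian and labelled with a threshold rule. Moving the threshold changes $P(Y\mid X)$ (real drift), while shifting the mean of $x$ changes only $P(X)$ (virtual drift). The function names and parameter values are illustrative assumptions.

```python
import random

def sample(n, x_mean, threshold, rng):
    """Draw n labelled pairs: x ~ N(x_mean, 1) and y = 1 if x > threshold else 0."""
    data = []
    for _ in range(n):
        x = rng.gauss(x_mean, 1.0)
        data.append((x, int(x > threshold)))
    return data

def old_rule_accuracy(data, threshold=0.0):
    """Accuracy of the labelling rule learned before the change (x > 0)."""
    return sum(int(x > threshold) == y for x, y in data) / len(data)

rng = random.Random(0)
before = sample(1000, x_mean=0.0, threshold=0.0, rng=rng)
# Real drift: the labelling rule P(Y|X) changes, P(X) stays the same.
real = sample(1000, x_mean=0.0, threshold=1.0, rng=rng)
# Virtual drift: the input distribution P(X) shifts, P(Y|X) is unchanged.
virtual = sample(1000, x_mean=2.0, threshold=0.0, rng=rng)

print(old_rule_accuracy(before), old_rule_accuracy(real), old_rule_accuracy(virtual))
```

In this toy setting the old rule stays perfect under virtual drift but misclassifies roughly a third of the instances under real drift (those with $0<x\leq 1$).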

Various previous studies [39, 25, 4] identified two types of change with respect to the transition period between consecutive concepts: abrupt and gradual. An abrupt change occurs when the transition is instantaneous. A gradual change occurs when the transition period spans a certain number of training instances. Handling different types of concept drift, and the trade-off between robustness to noise and sensitivity to change, are important factors in the performance of an adaptive learning system.
3. Related work

Some ensemble algorithms divide the input data stream into blocks of training instances, and the base classifiers are then created and trained from these data blocks. These ensemble algorithms update the base classifiers only when a new data block is completed and, consequently, their adaptation mechanism is slow [41, 8, 35]. Two examples of this type of algorithm are Accuracy Weighted Ensemble (AWE) [41] and Accuracy Updated Ensemble (AUE) [8].

Other ensemble algorithms update their base classifiers online, which also allows them to handle concept drift more efficiently. Online bagging [7] and boosting [15] are two examples, and both have been extended to deal with non-stationary data streams, for example in LeveragingBag [5], OzaBagADWIN, and OzaBoostADWIN [6]. Basically, when a concept drift is detected, these algorithms use the ADWIN change detector [3] to replace the base classifier with the lowest predictive performance in the ensemble by a new base classifier. Unlike OzaBagADWIN, LeveragingBag leverages the performance of bagging with two randomization improvements: increased resampling and the use of output detection codes. Another ensemble algorithm based on bagging is DDD (Diversity for Dealing with Drifts) [33]. DDD alternates between two states: (1) before a drift detection and (2) after a drift detection. Depending on the current state, DDD varies the ensemble diversity in order to adapt to changes quickly.

Gama and Kosina [20] presented an online learning algorithm that memorizes decision models whenever a concept drift is detected. The system uses a meta-learning technique to detect recurring concepts, which allows it to take proactive action by reactivating previously induced models. The main benefit of this approach is that the proposed meta-model is capable of selecting similar historical concepts.

Dynamic Weighted Majority (DWM) [28] handles concept drift by monitoring the predictive performance of the ensemble. To this end, DWM periodically performs a test, during which it decreases the weight of each base classifier that misclassifies the current training instance. If the ensemble as a whole misclassifies the instance, a new base classifier is created and added to the ensemble with a weight of 1. DWM removes base classifiers from the ensemble when their weights fall below a fixed threshold $\theta$. One drawback of DWM is that it penalizes base classifiers when they misclassify instances, but does not increase their weights when they predict correctly.
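The DWM update described above can be sketched as follows. The sketch assumes the multiplicative penalty factor $\beta$ and the rescaling step of the original DWM description [28], a test period of one instance, and the illustrative values $\beta=0.5$ and $\theta=0.01$; `MajorityExpert` is a hypothetical toy base learner, not the one used in the experiments.

```python
from collections import Counter

class MajorityExpert:
    """Hypothetical toy expert: predicts the most frequent label it has been trained on."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


def dwm_step(experts, weights, x, y, beta=0.5, theta=0.01):
    """One DWM step with a test period of one instance; experts must be non-empty.

    beta (penalty factor) and theta (removal threshold) are illustrative values.
    Returns the weighted-majority prediction made before training.
    """
    votes = Counter()
    for i, expert in enumerate(experts):
        pred = expert.predict(x)
        if pred != y:
            weights[i] *= beta            # penalize an expert that errs
        votes[pred] += weights[i]
    global_pred = votes.most_common(1)[0][0]

    top = max(weights)                    # rescale so the best expert has weight 1
    weights[:] = [w / top for w in weights]
    keep = [i for i, w in enumerate(weights) if w >= theta]
    experts[:] = [experts[i] for i in keep]
    weights[:] = [weights[i] for i in keep]

    if global_pred != y:                  # ensemble erred: add a new expert with weight 1
        experts.append(MajorityExpert())
        weights.append(1.0)
    for expert in experts:                # every expert learns the instance
        expert.learn(x, y)
    return global_pred
```

Because the weights are rescaled so that the best expert has weight 1, at least one expert always survives the $\theta$ threshold and the ensemble can never become empty.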

Bifet et al. [2] proposed the induction of an ensemble of decision trees from different attribute subsets. The ensemble combines the output from the base classifiers using a simple perceptron. The combination of decision trees and ensemble techniques has also been adopted in the IADEM-3 algorithm [17]. IADEM-3 uses ensemble techniques to combine class votes from the alternative and main subtrees.

Although these ensemble algorithms are able to handle concept-drifting data, their adaptation mechanism operates only at the ensemble level. For example, LeveragingBag, OzaBagAdwin, and OzaBoostAdwin monitor the predictive performance of each base classifier by using the ADWIN change detector. However, when a change detector signals a concept drift, only the worst base classifier is replaced. As mentioned above, this can lead to an ineffective adaptation to changes, as the worst classifier may not be the one affected by the current change.

The next section presents a new ensemble algorithm able to handle concept drift in each base classifier. The proposed adaptation mechanism aims to adapt more efficiently than previous approaches when a concept drift is detected.

4. The new ensemble algorithm

The proposed algorithm, OACE, uses online bagging (Algorithm 1) or online boosting (Algorithm 2) to train the base classifiers of an ensemble. For an ensemble to be effective, its base classifiers must be diverse [29]. For example, if the outputs of the base classifiers are independent and all base classifiers have the same predictive accuracy, higher than 0.5 in a two-class problem, majority voting is guaranteed to improve the predictive performance of the ensemble [30]. Bagging and boosting promote this diversity by inducing each classifier from a different variation of the training set. Online versions of bagging and boosting have also been proposed for online learning [36].
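The benefit of voting under these assumptions can be checked numerically. The sketch below assumes a two-class problem, an odd number $M$ of independent base classifiers, and a common accuracy $p$; the probability that the majority is correct is then the binomial tail $\sum_{k>M/2}\binom{M}{k}p^{k}(1-p)^{M-k}$.

```python
from math import comb

def majority_vote_accuracy(m, p):
    """P(majority of m independent classifiers is correct), two classes, m odd."""
    if m % 2 == 0:
        raise ValueError("m must be odd to avoid ties")
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m // 2 + 1, m + 1))

for m in (1, 5, 11, 21):
    print(m, round(majority_vote_accuracy(m, 0.7), 4))
```

With $p=0.7$ the majority of five classifiers is correct with probability of about 0.837, and the value keeps growing with $M$. With $p<0.5$ the same computation shows that voting makes the ensemble worse, which is why the accuracy condition matters.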

Algorithm 1. Induction of base classifiers based on bagging.
procedure Bagging
    Initialize base models $\mathfrak{C}_{m}$ for all $m\in\{1,2,\ldots,M\}$
    for each training instance do
        for $m=1,2,\ldots,M$ do
            Set $K=\mathcal{P}(1)$
            for $j=1,2,\ldots,K$ do
                Update $\mathfrak{C}_{m}$ with the current instance

The batch bagging algorithm [7] builds a set of $M$ base models, inducing each model from a bootstrap sample of size $N$ extracted from the original dataset. These samples are created by selecting instances randomly with replacement. The training set used to induce each base classifier therefore contains each original training instance $K$ times, where $K$ follows a binomial distribution $\mathcal{B}(N,1/N)$. For large values of $N$, this binomial distribution tends to a Poisson distribution $\mathcal{P}(\lambda)$ with parameter (mean) $\lambda=1$, where $\Pr(K=k)=\lambda^{k}e^{-\lambda}/k!$, which reduces to $e^{-1}/k!$ for $\lambda=1$. Oza and Russell [36] used this result to propose online bagging, a method that, instead of sampling with replacement, weights each instance according to a Poisson distribution $\mathcal{P}(1)$.
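The binomial-to-Poisson limit that motivates online bagging can be verified directly. In the sketch below, $N$ is the size of the original dataset (as in the text) and each bootstrap draw selects a given instance with probability $1/N$; the helper names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n):
    """Probability that one instance appears k times in a bootstrap sample of size n."""
    p = 1.0 / n
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam=1.0):
    """Pr(K = k) = lam^k e^{-lam} / k!"""
    return lam**k * exp(-lam) / factorial(k)

for n in (10, 100, 10_000):
    print(n, [round(binomial_pmf(k, n), 4) for k in range(4)])
print("P(1)", [round(poisson_pmf(k), 4) for k in range(4)])
```

Already for a few thousand instances the two distributions agree to about four decimal places, so drawing $K\sim\mathcal{P}(1)$ per instance is a close online substitute for bootstrap sampling.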

The use of the Poisson distribution $\mathcal{P}(1)$ is well founded for online bagging, and many algorithms have used it to learn from data streams [4]. This online bagging version is used to train the base classifiers of OACE (Algorithm 3).
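A minimal sketch of this online bagging update, assuming Python's standard `random` module as the source of randomness and Knuth's multiplication method for drawing $K\sim\mathcal{P}(1)$. `CountingLearner` is a hypothetical placeholder for an incremental base classifier such as Naive Bayes; it only counts how many updates it receives.

```python
import math
import random

def poisson(lam, rng):
    """Draw K ~ Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

class CountingLearner:
    """Hypothetical placeholder for an incremental base classifier."""

    def __init__(self):
        self.updates = 0

    def learn(self, x, y):
        self.updates += 1

def online_bagging_step(ensemble, x, y, rng, lam=1.0):
    """Online bagging: each base model is updated K ~ Poisson(lam) times with (x, y)."""
    for model in ensemble:
        for _ in range(poisson(lam, rng)):
            model.learn(x, y)

rng = random.Random(7)
ensemble = [CountingLearner() for _ in range(10)]
for _ in range(1000):
    online_bagging_step(ensemble, None, 0, rng)
print([m.updates for m in ensemble])   # each count is close to 1000
```

On average each base model skips about 37% ($e^{-1}$) of the instances and sees others more than once, which is what makes the ensemble members differ.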

Oza and Russell [36] also proposed the online boosting algorithm. In this algorithm, the base classifiers are induced sequentially, and the induction of each new base classifier depends on the predictive performance of the previously induced classifiers. The key idea is to give more weight to misclassified instances and less weight to correctly classified ones. To this end, the parameter $\lambda$ of the Poisson distribution $\mathcal{P}(\lambda)$ is dynamically updated over time. Online boosting is also used within OACE to train the base classifiers (Algorithm 3).
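The rule for the Poisson parameter in the boosting listing can be written as a small function. It mirrors that listing, where the value is computed for each base classifier from its current error rate $\varepsilon_m$. In Oza and Russell's formulation, these factors instead multiply a per-instance weight that is passed along the sequence of base classifiers, so treat this as a sketch of the listing rather than of [36].

```python
def boosting_lambda(eps, correct):
    """Poisson parameter lambda_m for updating base classifier C_m.

    eps: current error rate of C_m; correct: whether C_m classified the
    current instance correctly.
    """
    if eps >= 0.5 or eps == 0.0:
        return 1.0                         # degenerate cases: same weight as bagging
    if correct:
        return 1.0 / (2.0 * (1.0 - eps))   # shrink the weight of easy instances
    return 1.0 / (2.0 * eps)               # boost the weight of misclassified instances

for eps in (0.1, 0.25, 0.4):
    print(eps, boosting_lambda(eps, True), boosting_lambda(eps, False))
```

For $\varepsilon_m=0.25$, a correctly classified instance gets $\lambda_m=2/3$ and a misclassified one gets $\lambda_m=2$, so the misclassified instance is expected to be presented three times as often.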

Algorithm 2. Induction of base classifiers based on boosting.
procedure Boosting
    Initialize base classifiers $\mathfrak{C}_{m}$ for all $m\in\{1,2,\ldots,M\}$
    Initialize $\lambda_{m}=1$ for all $m\in\{1,2,\ldots,M\}$
    for each training instance do
        for $m=1,2,\ldots,M$ do
            Let $\varepsilon_{m}$ be the error rate of $\mathfrak{C}_{m}$
            if $\varepsilon_{m}\geqslant\frac{1}{2}$ or $\varepsilon_{m}=0$ then
                $\lambda_{m}=1$
            else if $\mathfrak{C}_{m}$ correctly classifies the instance then
                $\lambda_{m}=\frac{1}{2(1-\varepsilon_{m})}$
            else
                $\lambda_{m}=\frac{1}{2\varepsilon_{m}}$
            Set $K=\mathcal{P}(\lambda_{m})$
            for $j=1,2,\ldots,K$ do
                Update $\mathfrak{C}_{m}$ with the current instance

Algorithm 3. The Online Adaptive Classifier Ensemble (OACE) algorithm.
Input:
    instance: training instance
    $\lambda$: instance weight in accordance with bagging or boosting
    $M$: ensemble size
Initialize the base classifiers $\mathfrak{C}_{m}$ and the change detectors $\mathfrak{D}_{m}$ for all $m\in\{1,2,\ldots,M\}$
(After initialization, the classifiers are empty and the change detectors are in an in-control stage.)
for each training instance do
    for $m\leftarrow 1$ to $M$ do
        Update $\mathfrak{D}_{m}$ by using a test-then-train approach (Figure 1)
        Update $\mathfrak{C}_{m}$ with the current instance by using Algorithm 1 (OACE+Bag) or Algorithm 2 (OACE+Boost)
        if $\mathfrak{D}_{m}$ is at the warning level then
            Update an alternative base classifier $\mathfrak{A}_{m}$ with the current instance by using Algorithm 1 (OACE+Bag) or Algorithm 2 (OACE+Boost)
        else if $\mathfrak{D}_{m}$ is out-of-control then
            Replace the base classifier $\mathfrak{C}_{m}$ with the alternative classifier $\mathfrak{A}_{m}$
        else if $\mathfrak{D}_{m}$ estimates a stable concept then
            Discard the alternative classifier $\mathfrak{A}_{m}$

OACE handles concept drift with a simple and efficient mechanism (see Algorithm 3). Each base classifier $\mathfrak{C}_{m}$ $(1\leqslant m\leqslant M)$ uses a change detector $\mathfrak{D}_{m}$ to monitor its error rate. When $\mathfrak{D}_{m}$ triggers a warning signal, an alternative classifier $\mathfrak{A}_{m}$ is created and trained. If the warning signal is followed by an out-of-control signal, the main classifier $\mathfrak{C}_{m}$ is replaced by the alternative one ($\mathfrak{A}_{m}$); if the detector instead returns to a stable concept, $\mathfrak{A}_{m}$ is discarded.
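The per-classifier logic of the OACE listing can be sketched as a small state machine. In this sketch the detector stage is passed in directly (in OACE it would come from $\mathfrak{D}_m$ after the test step), each update is a single call instead of the $K$ repeated updates of bagging or boosting, and `Recorder` is a hypothetical toy classifier that only stores the labels it has seen. Starting from an empty classifier when an out-of-control signal arrives without a preceding warning is an assumption; the listing does not cover that case.

```python
from enum import Enum

class Stage(Enum):
    IN_CONTROL = "in-control"
    WARNING = "warning"
    OUT_OF_CONTROL = "out-of-control"

class Recorder:
    """Hypothetical toy classifier that only records the labels it was trained on."""

    def __init__(self):
        self.seen = []

    def learn(self, x, y):
        self.seen.append(y)

class OACEMember:
    """One ensemble slot: main classifier C_m and an optional alternative A_m."""

    def __init__(self, factory):
        self.factory = factory          # creates an empty base classifier
        self.main = factory()
        self.alternative = None

    def step(self, stage, x, y):
        """Apply one training instance given the stage reported by D_m."""
        self.main.learn(x, y)           # C_m is always updated
        if stage is Stage.WARNING:
            if self.alternative is None:
                self.alternative = self.factory()
            self.alternative.learn(x, y)
        elif stage is Stage.OUT_OF_CONTROL:
            # Assumption: with no alternative available, start from an empty classifier.
            self.main = self.alternative if self.alternative is not None else self.factory()
            self.alternative = None
        else:                           # stable concept: discard any alternative
            self.alternative = None

member = OACEMember(Recorder)
for stage, label in [(Stage.IN_CONTROL, 0), (Stage.WARNING, 1),
                     (Stage.WARNING, 1), (Stage.OUT_OF_CONTROL, 1)]:
    member.step(stage, None, label)
print(member.main.seen)  # -> [1, 1]
```

After the drift, the new main classifier has been trained only on the instances seen since the warning, which is the point of starting the alternative early.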

Figure 1.

Change detection mechanism used in each base classifier.

OACE handles concept drift by using two main modules [19]: the change detector module and the learning module (Fig. 1). In OACE, each base classifier estimates its classification error rate with a predictive sequential (test-then-train) approach [24, 21]. When a training instance arrives, the classification model first makes a prediction based on the instance's attribute values, and whether this prediction is correct is used to update the change detector. Next, the instance is made available to the learning module that induces the base classifier. Gama et al. [24] gave mathematical guarantees for the predictive sequential method as a general framework for error estimation in incremental learning domains.

Because the error rate of each base classifier is monitored as every training instance arrives, this monitoring must be performed with bounded computational resources. Various parametric control charts have been proposed in the statistical community to detect changes online [34]; however, they assume that the input data follow a known probability distribution.

Other change detection methods have been proposed, such as Adaptive Windowing (ADWIN) [3], the Drift Detection Method (DDM) [21], the Early Drift Detection Method (EDDM) [2], and EWMA for Concept Drift Detection (ECDD) [38]. All these detectors fit the scheme shown in Fig. 1, so any of them can be used within the adaptation mechanism of OACE. Another example is the Hoeffding-based Drift Detection Method (HDDM ${}_{\textit{A-test}}$) [16, 18]. HDDM ${}_{\textit{A-test}}$ processes each incoming value in constant time and provides mathematical guarantees for the false positive and false negative rates. It also adjusts its window size automatically to deal with non-stationary distributions: it expands the window of incoming values while the distribution is estimated to be stable and shrinks it when the data change.
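As an illustration of a detector that fits this scheme, the following is a simplified sketch of DDM [21]. It tracks the running error rate $p$ of a base classifier and its standard deviation $s=\sqrt{p(1-p)/n}$, remembers the point where $p+s$ was lowest, and signals a warning when $p+s>p_{\min}+2s_{\min}$ and a drift when $p+s>p_{\min}+3s_{\min}$, after a minimum of 30 instances. These multipliers are the values usually quoted for DDM; the experiments reported here used the MOA configuration in Table 2, and this sketch is not MOA's implementation.

```python
import math

class DDM:
    """Simplified sketch of the Drift Detection Method (warning at 2 s, drift at 3 s)."""

    def __init__(self, min_instances=30, warning_k=2.0, drift_k=3.0):
        self.min_instances = min_instances
        self.warning_k = warning_k
        self.drift_k = drift_k
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0                 # running error rate
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """error: 1 if the base classifier misclassified the instance (test step), else 0."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return "in-control"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s   # best (lowest) error level so far
        if self.p + s > self.p_min + self.drift_k * self.s_min:
            self.reset()                         # start monitoring the new concept
            return "out-of-control"
        if self.p + s > self.p_min + self.warning_k * self.s_min:
            return "warning"
        return "in-control"

detector = DDM()
errors = [1 if i % 10 == 9 else 0 for i in range(1000)] + [1] * 100  # 10% error, then a change
stages = [detector.update(e) for e in errors]
print(stages.index("warning"), stages.index("out-of-control"))
```

The three return values correspond directly to the in-control, warning, and out-of-control stages used by OACE.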

5. Empirical study

This section experimentally compares OACE with various state-of-the-art ensemble algorithms for mining concept-drifting data streams.¹

¹ The source code of OACE and additional experiment results are available online at https://github.com/averdeciac/Algorithms.

The experiments measured the generalization power of the contending algorithms in terms of predictive accuracy [26, 21, 4]. Measures of computational cost were not considered, as all the contending algorithms learn in constant time and space per processed instance.

All the experiments were implemented and performed using the Massive Online Analysis (MOA) software [4]. MOA includes a collection of algorithms for processing data streams, various methods to generate artificial data streams with the possibility of including concept drifts, and several tools to evaluate concept drift detection algorithms.

The algorithms under consideration were evaluated with a test-then-train approach combined with a sliding window as a forgetting mechanism [4]. Test-then-train first uses each arriving training instance to compute the predictive performance of the learning model (test step) and then presents the instance to the learning algorithm (train step) [27]. The basic measure is a cumulative sum of a loss function over all processed instances. Without a forgetting mechanism, the error computed in this way may not reflect the current performance of the learning algorithm, because it is averaged over the whole history. To overcome this drawback, metrics were computed over a sliding window containing only the most recent instances [23]. During the learning process, predictive accuracy was calculated with respect to a sliding window of size 100 [4]: every 100 processed instances, it was computed as the number of correctly classified instances in the window divided by the window size.
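The evaluation protocol can be sketched as follows: each instance is first used to test the model and then to train it, and accuracy is reported every 100 instances over a sliding window of the last 100 predictions. `predict` and `learn` are hypothetical callables standing in for any incremental classifier.

```python
from collections import deque

def prequential_accuracy(stream, predict, learn, window=100, report_every=100):
    """Test-then-train evaluation with a sliding-window forgetting mechanism."""
    recent = deque(maxlen=window)   # 1 = correct, 0 = wrong, for the last `window` predictions
    curve = []
    for i, (x, y) in enumerate(stream, start=1):
        recent.append(int(predict(x) == y))   # test step
        learn(x, y)                            # train step
        if i % report_every == 0:
            curve.append(sum(recent) / len(recent))
    return curve

# Usage with a toy model that predicts the last label it saw (hypothetical).
state = {"last": None}

def predict(x):
    return state["last"]

def learn(x, y):
    state["last"] = y

stream = [(None, 0)] * 200 + [(None, 1)] * 200   # abrupt change at instance 201
print(prequential_accuracy(stream, predict, learn))  # -> [0.99, 1.0, 0.99, 1.0]
```

The single error right after the change shows up only in the window that contains it; a cumulative average would instead dilute it across the whole history.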

5.1 Datasets

The experiments considered both artificial datasets (LED, SEA, RBF, WAV, AGR, STA, HYP) and real-world datasets (see Table 1). The artificial datasets made it possible to evaluate the methods under stable concepts and under abrupt and gradual changes. The experiments also included datasets with noise, irrelevant attributes, and missing attribute values (see Table 1). The artificial datasets were generated with the MOA software [4].

In the artificial datasets, the target concept changed 10 times. Changes occurred every 25,000 instances. In gradual changes, the transition period between consecutive concepts was set to 5,000 training instances. During the transition period, the probability that a new training instance belongs to the new concept was increased gradually and continuously. In addition, five new datasets were generated from the artificial datasets described in Table 1. The new datasets were obtained using the following definition:

Definition. Given two data streams $a$ and $b$, we define $c=a\oplus_{t_{0}}^{W}b$ as the data stream built by joining $a$ and $b$, where $t_{0}$ is the point of change and $W$ is the length of change [4], such that

$\Pr\left(c(t)=b(t)\right)=1/\left(1+e^{-4\left(t-t_{0}\right)/W}\right)$

and

$\Pr\left(c(t)=a(t)\right)=1-\Pr\left(c(t)=b(t)\right)$.

In accordance with this definition, the new datasets were $AGR\oplus_{t_{0}}^{W}HYP$ (DS1), $AGR\oplus_{t_{0}}^{W}LED$ (DS2), $SEA\oplus_{t_{0}}^{W}RBF$ (DS3), $STA\oplus_{t_{0}}^{W}LED$ (DS4), and $WAV\oplus_{t_{0}}^{W}LED$ (DS5). For abrupt changes, $t_{0}=$ 25,000 and $W=$ 0; for gradual changes, $t_{0}=$ 25,000 and $W=$ 5,000.
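The join operator in the definition above can be sketched as follows, assuming Python's `random` module for the per-instance draw. For $W=0$ the sigmoid is undefined (division by zero), so the sketch uses its limit, a step at $t_0$, which matches the abrupt-change setting. The function names are illustrative and this is not MOA's stream generator.

```python
import math
import random

def prob_b(t, t0, W):
    """Pr(c(t) = b(t)) = 1 / (1 + exp(-4 (t - t0) / W)); step function when W = 0."""
    if W == 0:
        return 0.0 if t < t0 else (0.5 if t == t0 else 1.0)
    z = 4.0 * (t - t0) / W
    # numerically stable logistic function (avoids overflow in exp)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def join_streams(a, b, t0, W, rng):
    """Yield c = a (+)_{t0}^{W} b: at time t take b(t) with probability prob_b(t, t0, W)."""
    for t, (ea, eb) in enumerate(zip(a, b)):
        yield eb if rng.random() < prob_b(t, t0, W) else ea

print(prob_b(22_500, 25_000, 5_000), prob_b(25_000, 25_000, 5_000), prob_b(27_500, 25_000, 5_000))
```

With $W=$ 5,000, instances of the new concept make up about 12% of the stream half a window before $t_0$, half of it at $t_0$, and about 88% half a window after.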

Table 1
Main characteristics of the datasets used in the experiments

Dataset	Acronym	Instances	Nominal	Numeric	Missing values	Classes
LED display (10% of noise)	LED	1,000,000	24	0	no	10
SEA (10% of noise)	SEA	1,000,000	0	3	no	2
Radial basis functions	RBF	1,000,000	0	10	no	2
Waveform	WAV	1,000,000	0	40	no	3
Agrawal	AGR	1,000,000	3	6	no	2
Stagger	STA	1,000,000	3	0	no	2
Hyperplane (5% of noise)	HYP	1,000,000	0	10	no	2
Usenet 1	USE1	1,500	100	0	no	2
Usenet 2	USE2	1,500	100	0	no	2
Cars	CAR	1,728	6	0	no	4
Segment	SEG	2,310	0	19	no	7
Mushroom	MUS	8,124	22	0	yes	2
Spam	SPA1	4,601	1	57	no	2
Spam corpus 2	SPA2	9,323	500	0	no	2
Nursery	NUR	12,960	8	0	no	5
EEG eye state	EYE	14,980	0	14	no	2
Weather	WEA	18,159	0	8	no	2
Bank marketing	BAN	41,188	9	7	no	2
Electricity	ELE	45,312	1	7	yes	2
Connect-4	CON	67,557	21	0	no	3
KDD cup 10%	KDD	494,021	7	34	no	2
Forest cover	COV	581,012	44	10	no	7
Poker hand	POK	1,000,000	10	0	no	10

Table 2

Configuration of the change detector used in the experiments (size of the statistical test)

Change detector	Warning level	Drift level
HDDM ${}_{\textit{{A-test}}}$	$\alpha=$ 0.005	$\alpha=$ 0.001
ADWIN	$\alpha=$ 0.005	$\alpha=$ 0.001
DDM	$\alpha=$ 0.05	$\alpha=$ 0.01
EDDM	$\alpha=$ 0.05	$\alpha=$ 0.01

Table 3

Predictive performance of the algorithms based on bagging over abrupt and gradual changes

Algorithm	OACE $+$ Bag $+$ DDM		LB		OBAA		OBA
stats	$\bar{x}$	$s$	$\bar{x}$	$s$	$\bar{x}$	$s$	$\bar{x}$	$s$
Abrupt changes
LED	73.43	03.19	70.16	8.86	71.34	06.46	46.87	20.45
AGR	83.85	12.34	82.98	13.29	80.57	17.28	63.38	15.64
HYP	92.90	02.90	85.45	13.87	85.41	15.59	65.77	23.49
RBF	72.00	01.50	72.00	01.48	71.98	01.50	71.95	01.51
SEA	87.88	01.67	87.69	01.56	87.04	02.03	84.42	03.65
STA	99.96	00.20	87.10	19.18	87.28	18.98	65.78	20.05
WAV	80.50	01.30	80.49	01.28	80.48	01.30	80.48	01.30
DS1	91.87	03.41	61.36	15.45	65.49	13.29	60.07	15.99
DS2	77.15	06.16	77.26	06.30	70.65	09.84	67.73	11.50
DS3	76.21	07.00	70.19	11.27	70.27	11.26	67.83	12.41
DS4	80.39	11.32	80.39	11.39	76.63	13.63	71.58	16.43
DS5	94.86	08.72	86.23	04.65	86.01	05.64	85.89	06.03
Gradual changes
LED	71.00	06.87	71.10	06.69	71.20	06.47	46.75	17.64
AGR	81.53	12.77	81.74	12.68	81.73	12.81	63.51	14.79
HYP	90.22	06.16	90.18	06.30	90.14	06.37	64.69	21.19
RBF	72.04	01.51	71.99	01.51	71.99	01.51	71.97	01.52
SEA	87.52	01.88	86.86	02.05	86.72	02.06	84.38	03.37
STA	96.11	08.40	96.05	08.80	95.92	09.08	66.11	19.20
WAV	80.48	01.31	80.47	01.29	80.48	01.31	80.48	01.31
DS1	89.24	05.71	90.45	05.61	90.19	06.05	63.35	13.46
DS2	74.12	11.44	73.85	10.88	73.95	12.13	62.77	15.88
DS3	75.55	06.14	75.23	06.11	75.23	06.10	70.16	08.43
DS4	77.48	11.84	73.57	17.53	76.83	11.73	63.17	21.67
DS5	92.15	12.84	86.23	04.65	86.01	05.64	85.89	06.03

Figure 2.

Comparison of OACE variants with the Friedman test and the Bergmann-Hommel procedure for the post hoc analysis. Groups of classifiers that are not significantly different (at $p=$ 0.05) are connected.

5.2 Algorithm setup

The proposed algorithm used the online bagging (OACE $+$ Bag) and boosting (OACE $+$ Boost) algorithms for training the base classifiers [36]. First, these two variants of OACE were evaluated in combination with various online change detectors: HDDM ${}_{\textit{A-test}}$ , ADWIN [3], DDM [21] and EDDM [2].

Table 4
Predictive performance of the algorithms based on boosting over abrupt and gradual changes

Algorithm	OACE $+$ Boost $+$ HDDM		OBOA		OBO
stats	$\bar{x}$	$s$	$\bar{x}$	$s$	$\bar{x}$	$s$
Abrupt changes
LED	73.42	03.01	58.95	18.61	34.25	25.23
AGR	87.08	11.01	85.62	12.09	71.74	18.14
HYP	93.69	01.88	89.90	04.08	74.76	31.14
RBF	72.78	01.57	71.39	02.15	72.81	01.52
SEA	87.77	01.67	83.62	03.67	84.73	03.50
STA	99.97	00.18	93.84	19.00	72.65	20.34
WAV	81.51	01.27	81.51	01.27	81.51	01.27
DS1	92.01	02.48	86.35	04.59	87.45	05.65
DS2	78.63	08.44	68.89	13.36	69.30	14.24
DS3	76.92	06.46	75.18	05.89	74.51	09.51
DS4	78.07	13.09	79.45	12.09	74.79	14.77
DS5	95.37	07.87	95.33	07.93	93.30	9.45
Gradual changes
LED	70.27	07.76	54.26	15.42	28.63	23.05
AGR	84.72	12.73	81.08	14.24	72.45	16.91
HYP	91.33	06.16	87.01	07.74	77.81	23.89
RBF	72.99	01.58	71.61	02.70	73.03	01.54
SEA	87.42	01.95	81.44	3.93	84.79	03.22
STA	94.60	11.28	91.73	14.63	71.61	16.26
WAV	81.36	01.28	81.53	01.29	81.53	01.29
DS1	89.71	05.52	84.68	07.47	87.10	07.04
DS2	75.52	10.92	61.44	19.64	63.99	17.00
DS3	76.00	05.82	72.96	05.58	74.53	6.99
DS4	77.72	12.21	74.45	17.34	64.21	18.59
DS5	92.55	12.24	92.99	11.18	85.28	23.45

Table 5

Predictive performance of OACE $+$ Boost $+$ HDDM and the ensembles without change detector over abrupt and gradual changes

Algorithm	OACE $+$ Boost $+$ HDDM		AUE		AWE		DWM
stats	$\bar{x}$	$s$	$\bar{x}$	$s$	$\bar{x}$	$s$	$\bar{x}$	$s$
Abrupt changes
LED	73.42	03.01	70.96	10.31	72.27	06.87	42.63	06.65
AGR	87.08	11.01	82.69	13.88	83.50	12.65	77.43	15.22
HYP	93.69	01.88	91.89	07.09	92.02	06.34	91.57	04.35
RBF	72.78	01.57	72.05	01.90	72.55	01.98	66.51	02.38
SEA	87.77	01.67	87.43	03.37	87.14	03.47	83.60	02.39
STA	99.97	00.18	97.83	08.40	98.88	05.21	99.81	01.08
WAV	81.51	01.27	80.42	02.83	81.32	02.88	77.30	02.01
DS1	92.01	02.48	90.62	06.73	91.54	03.81	82.80	04.46
DS2	78.63	08.44	76.65	7.31	77.05	6.98	52.19	16.79
DS3	76.92	06.46	75.46	08.16	75.95	07.58	71.12	07.47
DS4	78.07	13.09	79.88	11.90	80.08	11.56	57.69	25.08
DS5	95.37	07.87	93.84	11.17	94.18	11.00	94.15	10.00
Gradual changes
LED	70.27	07.76	70.14	09.00	69.69	09.51	40.13	08.79
AGR	84.72	12.73	81.50	13.17	81.89	12.77	75.60	15.38
HYP	91.33	06.16	89.68	07.74	89.90	06.71	87.92	07.56
RBF	72.99	01.58	72.07	01.90	72.39	01.94	66.54	02.30
SEA	87.42	01.95	87.39	03.33	86.99	03.42	03.49	02.43
STA	94.60	11.28	95.56	09.82	96.07	08.74	96.05	08.83
WAV	81.36	01.28	80.42	02.78	81.31	02.82	77.49	02.08
DS1	89.71	05.52	89.81	06.14	89.21	07.09	80.67	07.62
DS2	75.52	10.92	72.02	15.03	73.88	10.83	48.40	17.24
DS3	76.00	05.82	74.72	07.07	74.68	06.98	69.57	07.05
DS4	77.72	12.21	75.52	14.76	74.56	15.61	52.59	24.42
DS5	92.55	12.24	91.11	15.22	91.60	14.03	90.79	14.60

Figure 2 shows the ranking positions of the resulting algorithms with respect to predictive accuracy. The contending algorithms were run on all the datasets and types of change described in Section 5.1. The comparison was performed with the Friedman test and the Bergmann-Hommel procedure for the post hoc analysis; groups of classifiers that are not significantly different (at $p=$ 0.05) are connected. In general, the variants of OACE using online boosting ranked better than those using online bagging. OACE $+$ Bag $+$ DDM obtained the best ranking position among the algorithms based on online bagging, whilst OACE $+$ Boost $+$ HDDM obtained the best overall ranking position. However, no significant differences were found between the variants involving HDDM and DDM. For the sake of clarity, the remaining results are reported only for OACE $+$ Bag $+$ DDM and OACE $+$ Boost $+$ HDDM.
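The Friedman test ranks the algorithms on each dataset (rank 1 for the highest accuracy, with tied values sharing the average rank) and compares the average ranks $R_j$ of the $k$ algorithms over $N$ datasets through $\chi^2_F=\frac{12N}{k(k+1)}\left[\sum_j R_j^2-\frac{k(k+1)^2}{4}\right]$, which is compared against a $\chi^2$ distribution with $k-1$ degrees of freedom. A minimal sketch of this statistic follows; the Bergmann-Hommel post hoc procedure is not shown, and the example accuracies are made up for illustration.

```python
def rank_row(values):
    """Ranks for one dataset: 1 = highest accuracy; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1   # average of positions i..j (1-based)
        i = j + 1
    return ranks

def friedman(scores):
    """scores[d][a]: accuracy of algorithm a on dataset d. Returns (average ranks, chi2_F)."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for a, r in enumerate(rank_row(row)):
            rank_sums[a] += r
    avg = [r / n for r in rank_sums]
    chi2 = 12.0 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4.0)
    return avg, chi2

# Made-up accuracies for three algorithms on four datasets (illustration only).
scores = [[90.0, 85.0, 80.0], [70.0, 72.0, 60.0], [88.0, 87.0, 87.0], [75.0, 74.0, 73.0]]
print(friedman(scores))
```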

OACE $+$ Bag $+$ DDM and OACE $+$ Boost $+$ HDDM were compared with three groups of ensemble algorithms:

• Ensemble algorithms based on online bagging: OACE+Bag+DDM, LeveragingBag (LB) [5], OzaBagAdwin (OBAA) [6], and OzaBag (OBA) [36].

Table 6

Predictive performance of the algorithms based on bagging over real-world datasets. OACE $+$ Bag in combination with DDM detector

Algorithm	OACE $+$ Bag $+$ DDM		LB		OBAA		OBA
stats	$\bar{x}$	$s$	$\bar{x}$	$s$	$\bar{x}$	$s$	$\bar{x}$	$s$
BAN	90.23	11.02	89.94	11.17	89.79	11.39	89.15	12.58
CAR	83.06	10.31	82.56	10.42	80.67	10.78	80.56	11.71
CON	75.50	13.00	75.07	13.72	74.87	13.99	69.17	17.33
COV	88.19	07.51	83.20	11.92	83.07	11.96	60.55	21.76
ELE	85.35	06.73	78.84	11.97	78.91	12.13	74.26	14.63
EYE	99.47	01.48	90.61	17.88	90.91	18.26	47.31	46.41
KDD	99.90	00.59	99.62	03.14	99.66	02.88	97.88	11.01
MUS	98.35	02.81	99.28	01.45	98.65	02.17	98.41	02.29
NUR	92.62	06.71	91.19	08.13	90.30	09.27	84.11	14.18
POK	76.24	10.12	73.08	12.58	73.48	12.35	59.55	21.95
SPA1	91.83	06.60	91.67	10.08	90.53	10.84	90.44	10.97
SPA2	98.72	03.94	97.64	05.75	98.00	05.19	83.55	14.46
SEG	77.88	07.18	77.83	06.88	77.58	07.19	77.58	07.19
USE1	75.20	11.76	65.67	20.78	64.07	20.50	62.87	23.92
USE2	72.87	12.81	73.27	10.40	72.33	11.38	71.93	11.43
WEA	72.64	10.31	71.20	11.73	72.26	11.06	69.95	11.90

• Ensemble algorithms based on online boosting: OACE+Boost+HDDM, OzaBoostAdwin (OBOA) [6], and OzaBoost (OBO) [36].

• Ensemble algorithms that do not deal with concept drift explicitly: Accuracy Weighted Ensemble (AWE) [41], Accuracy Updated Ensemble (AUE) [8], and Dynamic Weighted Majority (DWM) [28].

Finally, in order to assess its adaptation mechanism, OACE was compared with the other algorithms that use the ADWIN change detector: OBAA, OBOA, and LB.

Table 7

Predictive performance of the algorithms based on boosting over real-world datasets. OACE $+$ Boost in combination with HDDM detector

Algorithm	OACE $+$ Boost $+$ HDDM		OBOA		OBO
stats	$\bar{x}$	$s$	$\bar{x}$	$s$	$\bar{x}$	$s$
BAN	89.23	11.97	88.37	12.62	88.81	13.11
CAR	90.39	05.99	89.17	08.55	89.94	06.36
CON	77.86	11.55	74.14	12.99	72.26	15.46
COV	90.68	06.46	85.78	11.15	62.08	21.82
ELE	90.10	04.80	87.15	07.62	76.95	14.62
EYE	99.27	01.88	69.10	41.84	80.71	27.45
KDD	99.90	00.59	87.55	32.71	99.45	04.35
MUS	99.71	00.90	98.90	04.47	99.68	00.91
NUR	94.45	04.03	93.78	04.30	93.87	04.42
POK	79.50	08.16	75.21	9.85	59.59	21.91
SPA1	96.93	03.46	95.44	05.49	92.97	11.53
SPA2	98.96	03.80	98.68	04.45	95.72	05.56
SEG	82.96	07.69	82.92	07.52	82.92	07.52
USE1	77.80	11.09	69.93	15.57	69.87	17.34
USE2	75.40	10.91	75.80	10.37	75.40	09.09
WEA	75.28	08.38	73.96	08.80	73.58	09.47

Table 8

Predictive performance of OACE $+$ Boost $+$ HDDM and the ensemble algorithms that do not deal with concept drift explicitly (AUE, AWE and DWM) over real-world datasets

Algorithm	OACE $+$ Boost $+$ HDDM		AUE		AWE		DWM
stats	$\bar{x}$	$s$	$\bar{x}$	$s$	$\bar{x}$	$s$	$\bar{x}$	$s$
BAN	89.23	11.97	88.30	14.85	85.35	19.10	87.92	12.62
CAR	90.39	05.99	77.50	13.45	77.00	14.04	81.44	09.89
CON	77.86	11.55	69.09	18.15	64.33	19.62	72.86	13.78
COV	90.68	06.46	80.09	14.72	80.15	15.17	85.19	10.28
ELE	90.10	04.80	74.96	15.40	71.94	16.79	83.03	08.43
EYE	99.27	01.88	57.15	42.02	57.78	41.35	94.91	11.44
KDD	99.90	00.59	21.48	40.38	21.25	40.26	99.70	02.11
MUS	99.71	00.90	97.66	03.30	96.98	04.48	98.74	02.35
NUR	94.45	04.03	80.52	19.54	79.24	17.59	91.76	07.20
POK	79.50	08.16	62.39	21.27	58.20	22.20	75.85	09.56
SPA1	96.93	03.46	74.96	32.50	74.54	32.32	92.27	08.14
SPA2	98.96	03.80	58.79	46.22	59.72	47.32	98.00	04.46
SEG	82.96	07.69	64.29	26.14	64.63	26.28	68.46	12.26
USE1	77.80	11.09	61.2	22.82	60.60	22.29	60.53	22.44
USE2	75.40	10.91	64.33	06.12	61.67	08.59	72.40	11.10
WEA	75.28	08.38	69.53	12.57	70.54	11.78	72.25	09.25

All the ensemble algorithms and change detectors were run with the default configurations provided by MOA [4]. Table 2 shows the parameter values of the statistical tests used by the change detectors. The ensemble size was set to 10 for all the ensemble methods considered, and all of them used Naive Bayes as the base classifier. Naive Bayes was chosen because it is one of the most successful algorithms for learning from data streams [10, 9, 31, 37, 13]: it has a low computational cost, is simple, has clear semantics, and works well with continuous attributes and missing attribute values. Additional experiments performed in this study revealed that, for the Naive Bayes classifier, increasing the ensemble size does not significantly increase predictive accuracy.
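Part of the appeal of Naive Bayes for streams is that it only keeps counts: each update and each prediction cost time proportional to the number of attributes and classes, and a missing value can simply be skipped. The sketch below assumes categorical attributes, Laplace smoothing, and missing values encoded as `None`; `IncrementalNaiveBayes` is a hypothetical name, and numeric attributes (which would need a per-class density estimator, for example a Gaussian) are omitted.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Hypothetical sketch: categorical Naive Bayes updated one example at a time.

    Missing attribute values (None) are skipped both when learning and when
    predicting, so they neither add evidence nor require imputation.
    """

    def __init__(self):
        self.total = 0.0
        self.class_counts = defaultdict(float)  # class -> weight
        self.present = defaultdict(float)       # (attribute, class) -> weight with a value
        self.value_counts = defaultdict(float)  # (attribute, value, class) -> weight
        self.domains = defaultdict(set)         # attribute -> observed values

    def learn(self, x, y, weight=1.0):
        # Only counters are touched: O(number of attributes) per example.
        self.total += weight
        self.class_counts[y] += weight
        for i, v in enumerate(x):
            if v is None:
                continue
            self.present[(i, y)] += weight
            self.value_counts[(i, v, y)] += weight
            self.domains[i].add(v)

    def predict(self, x):
        best, best_score = None, -math.inf
        n_classes = len(self.class_counts)
        for c, n_c in self.class_counts.items():
            # Laplace-smoothed log prior plus log likelihoods of the observed values.
            score = math.log((n_c + 1.0) / (self.total + n_classes))
            for i, v in enumerate(x):
                if v is None:
                    continue
                k = len(self.domains.get(i, ())) + 1  # +1 reserves mass for unseen values
                count = self.value_counts.get((i, v, c), 0.0)
                score += math.log((count + 1.0) / (self.present.get((i, c), 0.0) + k))
            if score > best_score:
                best, best_score = c, score
        return best
```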

Figure 3.

Predictive accuracy of the contending algorithms over various data stream generators.

Figure 4.

Comparison of all classifiers against each other with the Friedman test and the Bergmann-Hommel procedure for the post hoc analysis. Groups of classifiers that are not significantly different (at $p=$ 0.05) are connected. All learning algorithms used Naive Bayes as the base classifier.

5.3 Experiments with artificial datasets

As mentioned, OACE $+$ Bag $+$ DDM and OACE $+$ Boost $+$ HDDM were compared with three groups of ensemble algorithms (see Section 5.2). The results of the algorithms based on bagging and on boosting are shown in separate tables, so that the adaptation mechanism used by OACE can be evaluated within each family. OACE $+$ Boost $+$ HDDM (which often achieved the highest levels of predictive accuracy) was then compared with the algorithms of the third group (AWE, AUE and DWM).

Tables 3–5 summarize the predictive performance of the algorithms under abrupt and gradual changes in terms of the average ( $\bar{x}$ ) and standard deviation ( $s$ ) of the predictive accuracy. The highest levels of predictive accuracy are in bold. The results of the algorithms based on online bagging and online boosting are shown in Tables 3 and 4, respectively. These tables show that OACE $+$ Bag $+$ DDM and OACE $+$ Boost $+$ HDDM often outperformed the rest of the algorithms in predictive accuracy.
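Each table entry is a mean $\bar{x}$ and standard deviation $s$ of predictive accuracy. One common way to obtain such statistics for streams, assumed here for illustration rather than taken as the paper's exact protocol, is prequential (test-then-train) evaluation [23, 24] with accuracy measured over consecutive windows. The window size of 1000, the function names, and the `predict`/`learn` interface are illustrative assumptions.

```python
import statistics


def prequential_window_accuracy(model, stream, window=1000):
    """Test-then-train over a stream of (x, y) pairs.

    Every example is first used to test the model and then to train it.
    Returns the accuracy (%) of each complete window of `window` examples;
    a trailing partial window is ignored.
    """
    accuracies, hits, count = [], 0, 0
    for x, y in stream:
        hits += int(model.predict(x) == y)  # test first ...
        model.learn(x, y)                   # ... then train
        count += 1
        if count == window:
            accuracies.append(100.0 * hits / window)
            hits, count = 0, 0
    return accuracies


def summarize(accuracies):
    """Mean and sample standard deviation (needs at least two windows)."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```

Any object with `predict(x)` and `learn(x, y)` methods can be evaluated this way; the per-window values are what a plot such as Figure 3 would trace over time.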

Table 5 compares the performance of OACE $+$ Boost $+$ HDDM against the ensemble algorithms that do not deal with concept drift explicitly. The results suggest that the explicit adaptation mechanism used by OACE $+$ Boost $+$ HDDM is more effective. OACE also obtained high levels of predictive accuracy on the noisy artificial datasets (LED, HYP and SEA).

Figure 3 illustrates the predictive accuracy of the algorithms on the LED, HYP, and STA datasets. According to Fig. 3, OACE keeps its learning stable while the concept does not change, and it often adapts to changes more effectively than the contending algorithms.

Figure 4 shows the ranking positions of the algorithms according to the results in Tables 3–5. To verify significant differences, we used the Friedman test and the Bergmann-Hommel procedure for the post hoc analysis. Groups of classifiers that are not significantly different (at $p=$ 0.05) are connected. In general, OACE ranked better than the contending algorithms.

5.4 Experiments with real-world datasets

Figure 5.

Predictive accuracy of the contending algorithms over various real-world datasets.

Similar to Section 5.3, this section compares OACE $+$ Bag $+$ DDM and OACE $+$ Boost $+$ HDDM with the three groups of ensemble algorithms. Tables 6–8 summarize the predictive performance of the algorithms over the real-world datasets described in Table 1. These tables show that, on the real-world datasets, OACE often obtained the highest levels of predictive accuracy.

Figure 5 suggests that, in the real-world datasets, changes occur gradually and continuously over time. The figure also shows that OACE outperformed the competing algorithms in predictive accuracy in these situations as well.

We also verified significant differences between the contending algorithms with the Friedman test and the Bergmann-Hommel procedure for the post hoc analysis. In Figure 6, groups of classifiers that are not significantly different (at $p=$ 0.05) are connected. Figure 6 shows that OACE $+$ Bag $+$ DDM and OACE $+$ Boost $+$ HDDM ranked significantly better than the other algorithms.
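The Friedman test ranks the algorithms on each dataset (rank 1 for the highest accuracy, tied values sharing their average rank) and checks whether the average ranks $R_j$ differ more than expected by chance, using $\chi^2_F = \frac{12N}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$ for $N$ datasets and $k$ algorithms. The sketch below applies it, for illustration only, to the bagging-based means of Table 6 (not to the full set of algorithms behind Figure 6), with the textbook statistic (no tie correction) and the Iman-Davenport variant. It also computes the pairwise statistics $z = (R_i - R_j)/\sqrt{k(k+1)/(6N)}$ and their unadjusted two-sided $p$-values, which are the inputs that post hoc procedures adjust; the Bergmann-Hommel adjustment itself, which searches over logically consistent sets of hypotheses, is not implemented. All function names are hypothetical.

```python
import math

# Mean accuracies from Table 6; columns: OACE+Bag+DDM, LB, OBAA, OBA.
TABLE6 = {
    "BAN": (90.23, 89.94, 89.79, 89.15),
    "CAR": (83.06, 82.56, 80.67, 80.56),
    "CON": (75.50, 75.07, 74.87, 69.17),
    "COV": (88.19, 83.20, 83.07, 60.55),
    "ELE": (85.35, 78.84, 78.91, 74.26),
    "EYE": (99.47, 90.61, 90.91, 47.31),
    "KDD": (99.90, 99.62, 99.66, 97.88),
    "MUS": (98.35, 99.28, 98.65, 98.41),
    "NUR": (92.62, 91.19, 90.30, 84.11),
    "POK": (76.24, 73.08, 73.48, 59.55),
    "SPA1": (91.83, 91.67, 90.53, 90.44),
    "SPA2": (98.72, 97.64, 98.00, 83.55),
    "SEG": (77.88, 77.83, 77.58, 77.58),
    "USE1": (75.20, 65.67, 64.07, 62.87),
    "USE2": (72.87, 73.27, 72.33, 71.93),
    "WEA": (72.64, 71.20, 72.26, 69.95),
}


def ranks_with_ties(values):
    """Rank 1 = highest accuracy; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda j: -values[j])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def friedman(table):
    """Average ranks, Friedman chi-square (no tie correction), Iman-Davenport F."""
    rows = [ranks_with_ties(v) for v in table.values()]
    n, k = len(rows), len(rows[0])
    avg = [sum(r[j] for r in rows) / n for j in range(k)]
    chi2 = 12.0 * n / (k * (k + 1)) * (sum(a * a for a in avg) - k * (k + 1) ** 2 / 4.0)
    f_id = (n - 1) * chi2 / (n * (k - 1) - chi2)
    return avg, chi2, f_id


def pairwise_z(avg, n):
    """z statistic and unadjusted two-sided p-value for every pair of algorithms."""
    k = len(avg)
    se = math.sqrt(k * (k + 1) / (6.0 * n))
    result = {}
    for a in range(k):
        for b in range(a + 1, k):
            z = abs(avg[a] - avg[b]) / se
            result[(a, b)] = (z, math.erfc(z / math.sqrt(2.0)))
    return result


avg_ranks, chi2, f_id = friedman(TABLE6)
pairs = pairwise_z(avg_ranks, len(TABLE6))
```

With these 16 datasets and 4 algorithms, the average ranks come out as 1.25 (OACE $+$ Bag $+$ DDM), 2.25 (LB), 2.59375 (OBAA) and 3.90625 (OBA), with $\chi^2_F \approx 34.67$.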

Figure 6.

Table 9

Algorithms’ predictive performance in combination with ADWIN over abrupt and gradual changes

Algorithm	OACE $+$ Boost $+$ A		OACE $+$ Bag $+$ A		LB		OBAA		OBOA
stats	$\bar{x}$	$s$	$\bar{x}$	$s$	$\bar{x}$	$s$	$\bar{x}$	$s$	$\bar{x}$	$s$
Abrupt changes
LED	70.14	03.37	73.19	02.29	70.16	8.86	71.34	6.46	58.95	18.61
AGR	86.20	11.73	83.95	12.22	82.98	13.29	80.57	17.28	85.62	12.09
HYP	92.96	02.63	92.89	02.73	85.45	13.87	85.41	15.59	89.90	04.08
RBF	72.68	01.66	71.96	01.51	72.00	01.48	71.98	01.50	71.39	02.15
SEA	87.46	01.52	87.84	01.47	87.69	1.56	87.04	02.03	83.62	03.67
STA	99.96	00.20	99.90	00.50	87.10	19.18	87.28	18.98	93.84	19.00
WAV	81.51	01.28	80.48	01.30	80.49	1.28	80.48	1.3	81.51	01.27
DS1	88.57	02.64	90.03	02.41	61.36	15.45	65.49	13.29	86.35	04.59
DS2	76.04	09.43	76.93	05.83	77.26	06.30	70.65	9.84	68.89	13.36
DS3	76.39	06.36	76.24	06.93	70.19	11.27	70.27	11.26	75.18	05.89
DS4	75.95	14.04	80.06	11.15	80.39	11.39	76.63	13.63	79.45	12.09
DS5	95.42	07.80	94.99	08.53	86.23	4.65	86.01	5.64	95.33	07.93
Gradual changes
LED	67.34	06.77	69.63	07.60	71.10	06.69	71.20	06.47	54.26	15.42
AGR	83.35	13.01	81.38	12.90	81.74	12.68	81.73	12.81	81.08	14.24
HYP	90.50	06.64	89.95	06.24	90.18	06.30	90.14	06.37	87.01	07.74
RBF	72.89	01.65	72.01	01.52	71.99	01.51	71.99	01.51	71.61	02.70
SEA	87.03	01.85	87.24	01.90	86.86	02.05	86.72	02.06	81.44	03.93
STA	94.85	10.95	96.20	08.29	96.05	08.80	95.92	09.08	91.73	14.63
WAV	81.47	01.29	80.48	01.31	80.47	1.29	80.48	01.31	81.53	01.29
DS1	85.88	05.71	87.78	06.00	90.45	05.61	90.19	06.05	84.68	07.47
DS2	71.75	14.09	72.13	13.84	73.85	10.88	73.95	12.13	61.44	19.64
DS3	75.51	05.76	75.33	06.22	75.23	06.11	75.23	06.10	72.96	05.58
DS4	74.93	13.45	76.17	12.58	73.57	17.53	76.83	11.73	74.45	17.34
DS5	92.16	12.56	91.46	13.46	91.77	13.48	91.91	13.03	92.99	11.18

Table 10

Algorithms’ predictive performance in combination with ADWIN over real-world datasets

Algorithm	OACE $+$ Boost $+$ A		OACE $+$ Bag $+$ A		LB		OBAA		OBOA
stats	$\bar{x}$	$s$	$\bar{x}$	$s$	$\bar{x}$	$s$	$\bar{x}$	$s$	$\bar{x}$	$s$
BAN	88.50	12.62	89.81	11.36	89.94	11.17	89.79	11.39	88.37	12.62
CAR	90.61	05.57	81.11	09.92	82.56	10.42	80.67	10.78	89.17	08.55
CON	77.70	11.54	75.05	13.43	75.07	13.72	74.87	13.99	74.14	12.99
COV	90.12	06.98	84.23	10.42	83.20	11.92	83.07	11.96	85.78	11.15
ELE	91.01	04.31	81.44	09.74	78.84	11.97	78.91	12.13	87.15	07.62
EYE	99.03	02.45	97.16	06.40	90.61	17.88	90.91	18.26	69.10	41.84
KDD	99.90	00.64	99.82	01.34	99.62	03.14	99.66	02.88	87.55	32.71
MUS	99.68	00.91	98.78	02.09	99.28	01.45	98.65	02.17	98.90	04.47
NUR	94.77	04.55	91.48	6.99	91.19	08.13	90.30	09.27	93.78	04.30
POK	79.35	08.27	75.48	10.59	73.08	12.58	73.48	12.35	75.21	09.85
SPA1	96.61	04.35	91.80	07.11	91.67	10.08	90.53	10.84	95.44	05.49
SPA2	98.43	05.78	98.51	04.34	97.64	05.75	98.00	05.19	98.68	04.45
SEG	83.42	07.92	77.63	07.12	77.83	6.88	77.58	07.19	82.92	07.52
USE1	75.93	11.82	67.93	17.86	65.67	20.78	64.07	20.50	69.93	15.57
USE2	76.40	10.31	74.73	10.32	73.27	10.40	72.33	11.38	75.80	10.37
WEA	74.87	08.22	73.01	10.31	71.20	11.73	72.26	11.06	73.96	08.80

5.5 Comparison using the ADWIN change detector

In this section, OACE in combination with the ADWIN change detector is compared against three state-of-the-art ensemble algorithms: OzaBagAdwin (OBAA) [6], OzaBoostAdwin (OBOA) [6] and LeveragingBag (LB) [5]. To assess OACE's adaptation mechanism, all the contending algorithms used ADWIN as the change detector. Tables 9 and 10 summarize the predictive performance of the algorithms over the datasets described in Table 1. These tables show that OACE $+$ Boost in combination with ADWIN (OACE $+$ Boost $+$ A) often obtained the highest levels of predictive accuracy.
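ADWIN [3] keeps a variable-length window of recent values, such as the 0/1 errors of a base classifier, and drops its oldest part whenever two sub-windows have significantly different means. The following sketch implements only the basic, quadratic-time form of this test: for every split of the window $W$ into an older part $W_0$ and a newer part $W_1$, with $m = 1/(1/n_0 + 1/n_1)$ and $\delta' = \delta/n$, a change is signalled when $|\hat{\mu}_0 - \hat{\mu}_1| \ge \epsilon_{cut} = \sqrt{\ln(4/\delta')/(2m)}$. It omits the exponential-histogram compression that makes the detector efficient in practice; the class name and `delta=0.002` are illustrative assumptions, not MOA's API or the configuration used in the experiments.

```python
import math
from collections import deque


class SimpleAdwin:
    """Quadratic-time sketch of the basic ADWIN window-cut test."""

    def __init__(self, delta=0.002):
        self.delta = delta
        self.window = deque()

    def update(self, value):
        """Add a value; return True if the window was shrunk (change detected)."""
        self.window.append(value)
        changed = False
        while self._cut_found():
            self.window.popleft()  # drop the oldest element and test again
            changed = True
        return changed

    def _cut_found(self):
        n = len(self.window)
        if n < 2:
            return False
        values = list(self.window)
        total = sum(values)
        delta_prime = self.delta / n
        head = 0.0
        for n0 in range(1, n):
            head += values[n0 - 1]          # sum of the older sub-window W0
            n1 = n - n0
            mu0, mu1 = head / n0, (total - head) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(mu0 - mu1) >= eps_cut:
                return True
        return False

    @property
    def mean(self):
        return sum(self.window) / len(self.window) if self.window else 0.0
```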

We also verified significant differences between the contending algorithms with the Friedman test and the Bergmann-Hommel procedure for the post hoc analysis. Figure 7 shows the ranking positions of the ensemble algorithms when ADWIN is used as the change detector; in this case, the algorithms were run over all the datasets in Tables 9 and 10. This figure also suggests that OACE is a good alternative for learning from concept-drifting data streams.

Figure 7.

Ranking positions of the algorithms with respect to predictive accuracy when combined with the ADWIN change detector, over all datasets. Groups of classifiers that are not significantly different (at $p=$ 0.05) are connected. $p$-value $= 4.20 \times 10^{-7}$.

6. Conclusions

This paper presented a new algorithm able to learn from non-stationary data streams. The proposed algorithm, named Online Adaptive Classifier Ensemble (OACE), processes each input example in constant time and space and learns in a single scan over the training data. To deal with concept drift, OACE uses a change detector to monitor the performance of each base classifier. In the adaptation process, each base classifier alternates between three drift stages: in-control, warning, and out-of-control.

A base classifier is in-control when its change detector estimates that the current concept is stable. The warning stage is reached when a concept drift is likely to be approaching; at this stage, an alternative classifier is trained together with the corresponding base classifier. When a concept drift is detected, the base classifier reaches the out-of-control stage and is replaced by the alternative one.
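These stage transitions can be illustrated with DDM [21], whose in-control, warning and out-of-control levels match the three stages. The sketch assumes DDM's usual rule: with error rate $p_i$ after $i$ examples and $s_i = \sqrt{p_i(1-p_i)/i}$, a warning is raised when $p_i + s_i > p_{min} + 2 s_{min}$ and a drift when $p_i + s_i > p_{min} + 3 s_{min}$, once at least 30 examples have been seen. `BaseSlot`, `MajorityLearner` and the rule of discarding the alternative classifier when the detector returns to in-control are hypothetical choices for illustration; the sketch does not reproduce OACE's voting or weighting.

```python
import math
from collections import Counter


class DDM:
    """Drift Detection Method [21] applied to a stream of 0/1 errors."""

    IN_CONTROL, WARNING, OUT_OF_CONTROL = "in-control", "warning", "out-of-control"

    def __init__(self, min_instances=30):
        self.min_instances = min_instances
        self.reset()

    def reset(self):
        self.n, self.errors = 0, 0
        self.p_min = self.s_min = math.inf

    def update(self, error):
        self.n += 1
        self.errors += error
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_instances:
            return self.IN_CONTROL
        if p + s <= self.p_min + self.s_min:  # new lowest error level
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + 3.0 * self.s_min:
            return self.OUT_OF_CONTROL
        if p + s > self.p_min + 2.0 * self.s_min:
            return self.WARNING
        return self.IN_CONTROL


class MajorityLearner:
    """Hypothetical stand-in base learner: predicts the most frequent label."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, x, y):
        self.counts[y] += 1

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


class BaseSlot:
    """One ensemble position: current classifier, optional alternative, detector."""

    def __init__(self, make_learner):
        self.make_learner = make_learner
        self.current = make_learner()
        self.alternative = None
        self.detector = DDM()

    def learn(self, x, y):
        stage = self.detector.update(int(self.current.predict(x) != y))
        if stage == DDM.WARNING and self.alternative is None:
            self.alternative = self.make_learner()  # possible change: start a new model
        elif stage == DDM.IN_CONTROL:
            self.alternative = None  # assumption: a false alarm discards the alternative
        elif stage == DDM.OUT_OF_CONTROL:
            # Drift confirmed: the model trained since the warning replaces the current one.
            if self.alternative is not None:
                self.current = self.alternative
            else:
                self.current = self.make_learner()
            self.alternative = None
            self.detector.reset()
        self.current.learn(x, y)
        if self.alternative is not None:
            self.alternative.learn(x, y)
        return stage
```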

OACE was empirically compared with various state-of-the-art ensemble algorithms for mining data streams, using Naive Bayes as the base classifier. The experiments included both artificial and real-world benchmark datasets. The contending algorithms were tested under common types of change (abrupt and gradual), different levels of noise, irrelevant attributes, and missing attribute values. The experimental results showed that OACE is a good option for learning from concept-drifting data streams.

We plan to continue this research by using other learning algorithms as base classifiers in OACE, such as Hoeffding trees and the perceptron. We also plan to study situations in which previous concepts can reappear over time.

Footnotes

Acknowledgments

The authors wish to thank the editor and the anonymous reviewers for their useful comments and suggestions. This work was partially supported by FAPESP, grant number 2015/03355-0; and by CeMEAI-FAPESP, grant number 13/07375-0.

References

1. Baena-Garcia, Campo-Avila J.D., Fidalgo, Bifet, Gavalda and Morales-Bueno, Early drift detection method, 2006.

2. Bifet, Frank, Holmes and Pfahringer, Accurate Ensembles for Data Streams: Combining Restricted Hoeffding Trees using Stacking, In ACML, 2010, pp. 225–240.

3. Bifet and Gavalda, Learning from time-changing data with adaptive windowing, In SIAM International Conference on Data Mining, 2007.

4. Bifet, Holmes, Kirkby and Pfahringer, MOA: Massive online analysis, The Journal of Machine Learning Research 11 (2010), 1601–1604.

5. Bifet, Holmes and Pfahringer, Leveraging bagging for evolving data streams, In Machine Learning and Knowledge Discovery in Databases, 2010, pp. 135–150, Springer.

6. Bifet, Holmes, Pfahringer, Kirkby and Gavalda, New ensemble methods for evolving data streams, In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009, pp. 139–148, ACM.

7. Breiman, Bagging predictors, Machine Learning 24(2) (1996), 123–140.

8. Brzezinski and Stefanowski, Reacting to Different Types of Concept Drift: The Accuracy Updated Ensemble Algorithm, IEEE Transactions on Neural Networks and Learning Systems 25(1) (Jan. 2014), 81–94.

9. Cestnik, Estimating probabilities: a crucial task in machine learning, In ECAI, volume 90, 1990, pp. 147–149.

10. Clark and Niblett, The CN2 induction algorithm, Machine Learning 3(4) (1989), 261–283.

11. Deckert, Batch weighted ensemble for mining data streams with concept drift, In Foundations of Intelligent Systems, 2011, pp. 290–299, Springer.

12. Del Campo Ávila, Nuevos enfoques en aprendizaje incremental, 2007.

13. Domingos and Pazzani, On the optimality of the simple Bayesian classifier under zero-one loss, Machine Learning 29(2–3) (1997), 103–130.

14. Ferrer Troyano F.J., Aguilar-Ruiz J.S. and Santos J.C.R., Incremental Rule Learning and Border Examples Selection from Numerical Data Streams, J. UCS 11(8) (2005), 1426–1439.

15. Freund and Schapire R.E., A Short Introduction to Boosting, In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999, pp. 1401–1406, Morgan Kaufmann.

16. Frias-Blanco, Campo-Avila J.D., Ramos-Jimenez, Morales-Bueno, Ortiz-Diaz and Caballero-Mota, Online and Non-Parametric Drift Detection Methods Based on Hoeffding Bounds, IEEE Transactions on Knowledge and Data Engineering 27(3) (Mar. 2015), 810–823.

17. Frías-Blanco, del Campo-Ávila, Ramos-Jiménez, Carvalho A.C., Ortiz-Díaz and Morales-Bueno, Online adaptive decision trees based on concentration inequalities, Knowledge-Based Systems 104 (2016), 179–194.

18. Frías Blanco, del Campo Ávila, Ramos Jiménez, Morales Bueno, Ortiz Díaz and Caballero Mota, Aprendiendo con detección de cambio online, Computación y Sistemas 18(1) (2014), 169–183.

19. Frías-Blanco, Verdecia-Cabrera, Ortiz-Díaz and Carvalho, Fast adaptive stacking of ensembles, In Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016, pp. 929–934, ACM.

20. Gama and Kosina, Tracking recurring concepts with meta-learners, In Progress in Artificial Intelligence, 2009, pp. 423–434, Springer.

21. Gama, Medas, Castillo and Rodrigues, Learning with drift detection, In Advances in Artificial Intelligence, 2004, pp. 286–295, Springer.

22. Gama and Rodrigues, Learning from Data Streams: Processing Techniques in Sensor Networks, Springer-Verlag, 1st edition, 2007.

23. Gama, Rodrigues P.P. and Sebastião, Evaluating algorithms that learn from data streams, In Proceedings of the 2009 ACM Symposium on Applied Computing, 2009, pp. 1496–1500, ACM.

24. Gama, Sebastião and Rodrigues, On evaluating stream learning algorithms, Machine Learning 90(3) (2013), 317–346.

25. Gama, Zliobaite, Bifet, Pechenizkiy and Bouchachia, A Survey on Concept Drift Adaptation, ACM Comput. Surv. 46(4) (Mar. 2014), 44:1–44:37.

26. Harries, SPLICE-2 Comparative Evaluation: Electricity Pricing, Technical report, University of New South Wales, 1999.

27. Hulten, Spencer and Domingos, Mining time-changing data streams, In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001, pp. 97–106, ACM.

28. Kolter J.Z. and Maloof M.A., Dynamic weighted majority: An ensemble method for drifting concepts, The Journal of Machine Learning Research 8 (2007), 2755–2790.

29. Kuncheva L.I., Combining pattern classifiers: methods and algorithms, John Wiley & Sons, 2004.

30. Lam and Suen, Application of majority voting to pattern recognition: an analysis of its behavior and performance, IEEE Transactions on Systems, Man, and Cybernetics – Part A: Systems and Humans 27(5) (1997), 553–568.

31. Langley, Iba and Thompson, An analysis of Bayesian classifiers, In AAAI, volume 90, 1992, pp. 223–228.

32. Minku L.L., White A.P. and Yao, The impact of diversity on online ensemble learning in the presence of concept drift, IEEE Transactions on Knowledge and Data Engineering 22(5) (2010), 730–742.

33. Minku L.L. and Yao, DDD: A new ensemble approach for dealing with concept drift, IEEE Transactions on Knowledge and Data Engineering 24(4) (2012), 619–633.

34. Montgomery D.C., Introduction to statistical quality control, John Wiley & Sons, 2007.

35. Ortiz Diaz, del Campo-Avila, Ramos-Jimenez, Frias Blanco, Caballero Mota, Mustelier Hechavarria A. and Morales-Bueno R., Fast Adapting Ensemble: A New Algorithm for Mining Data Streams with Concept Drift, The Scientific World Journal, 2014.

36. Oza N.C. and Russell, Online Bagging and Boosting, in: Jaakkola and Richardson, editors, Eighth International Workshop on Artificial Intelligence and Statistics, Key West, Florida, USA, Jan. 2001, pp. 105–112, Morgan Kaufmann.

37. Pazzani M.J., Searching for dependencies in Bayesian classifiers, In Learning from Data, 1996, pp. 239–248, Springer.

38. Ross G.J., Adams N.M., Tasoulis D.K. and Hand D.J., Exponentially weighted moving average charts for detecting concept drift, Pattern Recognition Letters 33(2) (2012), 191–198.

39. Stanley K.O., Learning concept drift with a committee of decision trees, Technical report UT-AI-TR-03-302, Department of Computer Sciences, University of Texas at Austin, USA, 2003.

40. Street W.N. and Kim, A Streaming Ensemble Algorithm (SEA) for Large-scale Classification, In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01), 2001, pp. 377–382, ACM, New York, NY, USA.

41. Wang, Fan, Yu P.S. and Han, Mining concept-drifting data streams using ensemble classifiers, In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003, pp. 226–235, ACM.

42. Widmer and Kubat, Learning in the presence of concept drift and hidden contexts, Machine Learning 23(1) (1996), 69–101.

43. Yue, Guojun and Chunnian, Mining concept drifts from data streams based on multi-classifiers, In 21st International Conference on Advanced Information Networking and Applications Workshops (AINAW'07), volume 2, 2007, pp. 257–263, IEEE.

44. Zliobaite, Learning under concept drift: an overview, Technical report, Vilnius University, 2009.


¹ The source code of OACE and additional experiment results are available online at https://github.com/averdeciac/Algorithms.

Table 1
Main characteristics of the datasets used in the experiments

Table 4
Predictive performance of the algorithms based on boosting over abrupt and gradual changes