A novel Bagged Naïve Bayes-Decision Tree approach for multi-class classification problems

Abstract

Breakthrough classification performance has been achieved by using ensemble techniques in machine learning and data mining. Bagging is one such ensemble technique that has outperformed single models in predictive performance. This paper proposes an ensemble technique that applies the basic bootstrap aggregating (bagging) technique to a hybrid of two base learners, namely Naïve Bayes (NB) and Decision Tree (DT). Before the DT is induced, the NB algorithm is employed to eliminate mislabeled or contradictory instances from the training set. Bagging is then applied with the hybrid NBDT as the base learner. The resulting Bagged Naïve Bayes-Decision Tree (BNBDT) algorithm is used to improve classification accuracy on various multi-class problems. The algorithm iteratively trains the base learner on random samples of the training set and then performs majority voting over their predictions. The proposed algorithm is compared with both ensemble and single classification techniques, namely Random Forest, Bagged NB, Bagged DT, NB and DT. Experimental results over 52 UCI data sets with bag size 100 demonstrate that the proposed algorithm significantly outperforms the existing algorithms.

Keywords

Bagging; Naïve Bayes; Decision tree; Classification; Multi-class problems; Machine learning; Hybrid learner

1 Introduction

In the area of machine learning and data mining, supervised classification mechanisms are considered important tools for decision making. Classification can be defined as a technique for grouping or categorizing data instances into known classes. It has been applied in numerous areas such as bioinformatics, fraud detection and banking [1, 2]. As a kind of inductive learning algorithm, decision tree algorithms have been successful at building classifiers that aim to maximize classification accuracy. Among probabilistic induction techniques, Naïve Bayes has been widely used because its performance is competitive with the best inductive learning methods and because of its inherent resilience to noise [3]. Ensemble classification approaches combine multiple hypotheses induced by different base learners to obtain better classification performance [4].

In the last few decades, one of the major issues that has received great attention in classification is how to obtain better classifications. This problem intensifies as the number of classes increases. In the multi-class scenario, class labels are assumed to be independent of one another, and most proposed techniques attempt to improve classifier performance under this assumption [5]. Many real-world applications in disease diagnosis, credit risk scoring, information retrieval and other predictive domains involve multi-class classification problems [6]. Such problems have recently drawn growing interest because the presence of multiple classes makes classification more difficult. These problems arise when each data instance belongs to exactly one of more than two classes. In particular, many ensemble and hybrid approaches have been proposed to deal with such multi-class issues. However, most efforts so far have focused on single-learner techniques for binary or multi-class datasets. Moreover, multi-class classification problems pose various challenges that need to be investigated further [7, 8].

Over the past years, numerous works have aimed to improve the performance of learners [9]. The techniques proposed in existing studies belong to one of three categories: (1) designing better learning approaches, (2) applying some type of transformation to the training dataset, and (3) modifying the predictions produced by the learner. Ensemble learning, which includes bagging and boosting of multiple classifier systems, falls into category (1). Most existing classification systems employ individual learners that are usually designed for binary-class problems, and such systems are less effective on multi-class tasks. Although many solutions have been proposed for multi-class classification problems, most attention in the literature has been given to hybrid or ensemble strategies [10]. In multi-classifier systems, several aspects of the fusion of individual classifiers, such as each classifier's accuracy, diversity and independence, need to be studied to construct robust ensembles of hybrid classifiers. Therefore, it is desirable to develop an effective and efficient technique for handling multi-class classification problems [11, 12].

In this paper, we propose a novel bagged Naïve Bayes-decision tree methodology for solving multi-class classification problems. First, the algorithm uses an NB classifier to identify troublesome (hard-to-classify) instances in the training dataset and removes these samples before the DT classifier is constructed. Noisy instances in a dataset can have many negative consequences, including reduced classification accuracy. In addition, redundant training samples may increase the complexity of the classifier. The NB classifier is used here to detect noisy or contradictory instances on the basis of its independence assumption and the class-conditional probabilities of the features. Removing noisy samples improves the quality of the data, which results in more accurate and precise models. Real-world data contain various types of errors, both implicit and explicit. Previous research has shown that, among various classifiers, NB is the most noise-tolerant algorithm, which makes it suitable for reducing noise levels and improving data quality [13, 14]. A DT classifier, by contrast, may overfit noisy training data, which reduces its classification accuracy. Furthermore, for datasets with a huge number of attributes, computing the class-conditional probabilities increases the computational cost of a naïve Bayes classifier. The performance of the proposed bagged NB-DT classifier is evaluated against three ensemble approaches, namely bagged NB, bagged DT and RF, and two individual classifiers, namely DT and NB, using classification accuracy as the performance measure. An extensive empirical analysis was conducted using 10 runs of 10-fold cross-validation (CV) on 52 benchmark datasets from the University of California, Irvine (UCI) Machine Learning Repository [15].
Additionally, we compare the proposed and existing algorithms using a two-tailed Student's t-test at a 95% confidence level: the difference between the proposed algorithm and an existing algorithm is considered statistically significant only if the p-value of the t-test is less than 0.05. The experimental outcomes reveal that the proposed method achieves encouraging classification results on real-life multi-class problems. More specifically, the method proves to be more robust in the presence of noisy instances in the dataset, and therefore produces more effective and accurate classification results.
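The significance test described above can be sketched as follows. This is a minimal, self-contained illustration, not the authors' evaluation code. It assumes a paired test on matched per-fold accuracies (the text does not state whether the test was paired or unpaired), computes the two-tailed p-value by numerically integrating the Student's t density with Simpson's rule rather than calling a statistics library, and uses made-up accuracy values. The helper names (t_pdf, two_tailed_p, paired_t_test) are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value 1 - 2*P(0 <= T <= |t|), via Simpson's rule (steps must be even)."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def paired_t_test(acc_a, acc_b):
    """Paired two-tailed t-test on matched per-fold accuracies (assumption: paired design)."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are equal; the t statistic is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, two_tailed_p(t, n - 1)


# Made-up per-fold accuracies (%) for two classifiers, for illustration only.
bnbdt = [88.1, 87.5, 89.0, 86.9, 88.4, 87.8, 88.9, 87.2, 88.0, 88.6]
rf = [86.0, 86.2, 87.1, 85.5, 86.8, 86.1, 87.0, 85.9, 86.4, 86.7]
t, p = paired_t_test(bnbdt, rf)
print(f"t = {t:.2f}, p = {p:.2e}, significant at 0.05: {p < 0.05}")
```

Under this paired assumption, 10 runs of 10-fold CV would give 100 matched accuracies per algorithm and a test with 99 degrees of freedom.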

The remainder of the paper is organized as follows. Section 2 reviews related work and provides background on bagging and on the Naïve Bayes and decision tree classification algorithms. Section 3 describes the preliminary concepts of the classification techniques used. Section 4 introduces the proposed bagged NB-DT framework, followed by the experimental analysis and discussion in Section 5. Finally, Section 6 concludes the paper with suggestions for future work.

2 Literature review and background

This section reviews recent research on decision tree, naïve Bayes and bagging-based learning models for solving challenging real-life classification problems. Such problems are usually categorized into binary and multi-class tasks. Decision tree classification is one of the most widely used data mining techniques; it builds prediction models from a large number of covariates and a target variable. It can efficiently handle complex and large datasets and is robust to outliers. The complexity of the constructed decision tree can be controlled through stopping criteria and pruning, which can improve classification accuracy. De Caigny et al. [16] presented a hybrid classification algorithm that combines decision trees and logistic regression for customer churn prediction. Their logit leaf model (LLM) exploits the strengths of both techniques: DT deals effectively with interaction effects among variables, while LR handles linear relations among attributes efficiently. This technique improves both the predictive performance of the classifiers and the interpretability of the results. Kotsiantis [17] proposed a decision tree technique that applies naïve Bayes classifiers locally to increase the classification accuracy of the model, and compared it with other state-of-the-art methods on 30 benchmark datasets. Carvalho and Freitas [18] proposed a hybrid decision tree/genetic algorithm approach addressing the data mining problem of small disjuncts. In this approach, two genetic algorithms (GAs) were developed to discover rules covering small disjuncts, whereas the DT generates rules to classify instances belonging to large disjuncts.

Panhalkar and Doye [19] presented a literature survey of hybrid DT methods that integrate the basic DT with approaches such as AVL trees, clustering, genetic algorithms, Naïve Bayes and fuzzy techniques. Wang et al. [20] developed a novel self-adaptive NBTree methodology that hybridizes DT and NB. The resulting classifier can deal with continuous variables and alleviates the overgeneralization and overspecialization problems commonly observed in DTs. Polat and Gunes [21] presented a new hybrid classification algorithm for multi-class problems based on the combination of the C4.5 DT learner and the one-against-all technique; classification accuracy, sensitivity and specificity were obtained with 10-fold cross-validation on UCI datasets. Lee et al. [22] proposed a novel bagging C4.5 approach with wrapper-based feature selection for improving the diagnostic accuracy of clinical decision support systems. Their sampling method (S-C4.5-SMOTE) not only deals with the data distortion problem but also boosts the overall classification performance of the system by maintaining a balanced dataset. Singh and Verma [23] proposed a multi-classifier model for predicting faulty modules of software systems. The model, which combines NB, SVM and RF, provides better results than each individual classifier as well as AIR1 and AIR2, and proves efficient and robust in fault prediction across numerous software projects.

Over the past few years, ensemble or multiple classifier systems (MCS) have shown superior accuracy and competence compared with single classifier models. Ala'raj and Abbod [24] presented a novel combination technique based on the consensus of heterogeneous classifiers for integrating MCS built from diverse classification algorithms. The five base classifiers used were neural networks, SVMs, RFs, DT and NB. The proposed consensus approach was compared with two benchmark algorithms, logistic regression (LR) and multivariate adaptive regression splines (MARS), on five real-world credit scoring datasets. The model was also validated against seven traditional combination methods using four performance measures, namely accuracy, AUC, H-measure and Brier score. Sun et al. [25] proposed a DT ensemble technique for enterprise and bank credit evaluation that combines SMOTE (Synthetic Minority Over-sampling Technique) for class-imbalanced data with a bagging ensemble using DSR (Differentiated Sampling Rates), known as DTE-SBD (Decision Tree Ensemble based on SMOTE, Bagging and DSR). The model not only addresses class imbalance but also increases the diversity among base classifiers during DT ensemble modeling. The experiments were repeated 100 times on financial data, and DTE-SBD significantly outperformed five other models, namely pure DT, bagging DT, SMOTE DT, over under-sampling DT and oversampling DT.

Another commonly used data mining tool is the NB classifier, used for prediction and classification because of its high computational efficiency, simplicity and good classification accuracy. This classifier is particularly well suited to large datasets. It relies on a strong assumption of conditional independence among attributes, which in most cases decreases prediction performance. Wu et al. [26] proposed a novel self-adaptive attribute weighting method based on the concept of Artificial Immune Systems (AIS) for naïve Bayes classification. The proposed method, AISWNB, calculates conditional probabilities more accurately and relaxes the assumption of conditional independence. Chandra and Gupta [27] presented a robust NB classification algorithm (R-NB) that overcomes two main drawbacks of NB, namely overfitting and underflow, for the prediction and classification of high-dimensional gene data. The underflow problem is solved by adding the logarithms of the probabilities instead of multiplying them, and the overfitting problem by using the estimate approach. The approach uses a robust function for determining the NB classifier probabilities.
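The log-sum remedy for underflow mentioned above can be illustrated with a small hypothetical example. This is not R-NB itself, only the logarithm trick it relies on: multiplying many small likelihoods underflows to zero in double precision, while summing their logarithms keeps a usable score. Because the logarithm is monotonic, comparing summed log-scores across classes selects the same class that comparing the exact products would.

```python
import math


def joint_product(probs):
    """Multiply per-attribute likelihoods P(x_k|C_i) directly (may underflow)."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def joint_log(probs):
    """Sum of log-likelihoods: the logarithm of the same product, without underflow."""
    return sum(math.log(p) for p in probs)


# Hypothetical example: 100 attributes, each with likelihood 1e-4 under some class.
likelihoods = [1e-4] * 100
print(joint_product(likelihoods))  # 0.0, because 1e-400 is below the smallest double
print(joint_log(likelihoods))      # about -921.03
```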

Among ensemble classification approaches, random forest has proved its merit in solving several multi-class classification problems in machine learning. It can be regarded as a meta-estimator that fits a large number of DT classifiers on various subsamples of the dataset and uses averaging to control overfitting, thus improving classification accuracy. Wei et al. [28] developed a novel cascade random forest (CRF) algorithm to resolve the data imbalance that occurs in the prediction of protein-protein interaction sites (PPIs). Lin et al. [29] proposed a random forest-based ELM ensemble approach for multi-regime time series prediction, in which the regularized ELM combined with the ensemble technique improves both prediction accuracy and generalization performance. Chaudhary et al. [9] presented an improved random forest classifier (RFC) for multi-class disease classification. The approach integrates RF with an attribute evaluation method and a filter method, and showed the best performance on five benchmark datasets. Silva-Palacios et al. [30] proposed two approaches that address multi-class classification difficulties by hierarchically decomposing the original problem and reducing the number of classes in each local subproblem.

3 Supervised classification methods

Classification is one of the most frequently used data mining and machine learning mechanisms for prediction and decision making. The following subsections discuss the three state-of-the-art supervised learning techniques used in the comparative study, i.e. Decision Tree, Naïve Bayes and Random Forest.

3.1 Decision tree induction

A decision tree algorithm generates trees as well as rules for data classification. A common DT algorithm is C4.5, an improvement over the ID3 algorithm that handles continuous and discrete variables as well as missing values. The final output of a DT is either a fully constructed tree from root to leaves or a rule set that assigns a class label to a new data sample. For a given training dataset $D$, the probability $p_{i}$ that an arbitrary tuple belongs to class $C_{i}$ is estimated by $p_{i} = |C_{i,D}|/|D|$, where $C_{i,D}$ denotes the set of tuples of class $C_{i}$ in $D$. The expected information (entropy) needed to classify a tuple in $D$ is given by $Info(D) = -\sum_{i=1}^{m} p_{i} \log_{2}(p_{i})$ (1)

The information still needed after using attribute $A$ to split $D$ into $v$ partitions $D_{1}, \dots, D_{v}$ is estimated by ${Info}_{A}(D) = \sum_{j=1}^{v} \frac{|D_{j}|}{|D|} \times Info(D_{j})$ (2)

The information gained by branching on attribute $A$ is defined in Equation (3) as the difference between the original information requirement and the new requirement: $Gain(A) = Info(D) - {Info}_{A}(D)$ (3)
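Equations (1) to (3) can be checked with a short sketch. It assumes discrete (categorical) attribute values and plain information gain as written above; C4.5's handling of continuous attributes and missing values is omitted. The function names are illustrative, not taken from any particular library.

```python
import math
from collections import Counter, defaultdict


def info(labels):
    """Entropy Info(D) = -sum_i p_i * log2(p_i) of a list of class labels (Eq. 1)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_a(values, labels):
    """Info_A(D): size-weighted entropy of the partitions induced by attribute A (Eq. 2)."""
    partitions = defaultdict(list)
    for v, y in zip(values, labels):
        partitions[v].append(y)
    n = len(labels)
    return sum(len(part) / n * info(part) for part in partitions.values())


def gain(values, labels):
    """Information gain Gain(A) = Info(D) - Info_A(D) (Eq. 3)."""
    return info(labels) - info_a(values, labels)


labels = ["yes", "yes", "no", "no"]
print(gain(["a", "a", "b", "b"], labels))  # 1.0: A separates the classes perfectly
print(gain(["a", "b", "a", "b"], labels))  # 0.0: A carries no information about the class
```

The DT learner picks, at each node, the attribute with the largest gain value computed this way.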

3.2 Naïve Bayes classification

Naïve Bayes (NB), based on Bayes' theorem, is a probabilistic classifier [31] that assumes attribute independence. It predicts class membership probabilities as described in the following equation: $P(C_{i}/X) = \frac{P(X/C_{i})\, P(C_{i})}{P(X)}$ (4)

where P(C_i/X) denotes the posterior probability, P(X/C_i) the likelihood, P(C_i) the class prior probability and P(X) the predictor prior probability. NB classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes; this is the assumption of class-conditional independence. Further, let D = {X₁, X₂, …, X_{|D|}} be the training dataset, i.e. a set of training tuples and their class labels. Each tuple X = (x₁, x₂, …, x_n) holds n measurements taken on the n attributes A₁, A₂, …, A_n, respectively, and belongs to one of the m classes C₁, C₂, …, C_m. For a given test sample X, the classifier predicts that X belongs to the class with the highest posterior probability conditioned on X. Therefore, X is assigned to class C_i if and only if $P(C_{i}/X) > P(C_{j}/X)$ (5)

In Equation (5), j ranges over 1 ≤ j ≤ m, j ≠ i. From Equation (4), since P(X) is constant for all classes, only P(X/C_i) P(C_i) needs to be maximized. If the class prior probabilities are unknown, the classes are assumed to be equally likely, P(C₁) = P(C₂) = … = P(C_m), and therefore only P(X/C_i) is maximized; otherwise P(X/C_i) P(C_i) is maximized. The class prior probabilities are computed as P(C_i) = |C_i,D|/|D|, where |C_i,D| is the number of training instances of class C_i in dataset D. For high-dimensional datasets, it becomes extremely expensive to compute P(X/C_i) directly. NB therefore assumes class-conditional independence: given the class label of a tuple, the attribute values are conditionally independent of one another. This reduces the computation of P(X/C_i) to $P(X/C_{i}) = \prod_{k=1}^{n} P(x_{k}/C_{i})$ (6), that is, $P(X/C_{i}) = P(x_{1}/C_{i}) \times P(x_{2}/C_{i}) \times \dots \times P(x_{n}/C_{i})$ (7)

The probabilities P(x₁/C_i), P(x₂/C_i), …, P(x_n/C_i) can be estimated from the training tuples, where x_k denotes the value of attribute A_k for tuple X. For each attribute, we determine whether it is categorical or continuous-valued. If A_k is categorical, P(x_k/C_i) is the number of tuples of class C_i in D having the value x_k for A_k, divided by |C_i,D|. If A_k is continuous-valued, it is assumed to follow a Gaussian distribution, so that $P(x_{k}/C_{i}) = g(x_{k}, \mu_{C_{i}}, \sigma_{C_{i}})$ (8), where $\mu_{C_{i}}$ and $\sigma_{C_{i}}$ are the mean and standard deviation of the values of $A_{k}$ for the training tuples of class $C_{i}$, and $g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$.
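Equations (4) to (8) can be combined into a small classifier. The sketch below is one possible reading of this subsection, with assumptions stated here and in the comments: float-valued attributes are treated as continuous (per-class Gaussian, Eq. 8) and all other values as categorical (relative frequency); probabilities are combined as sums of logarithms to avoid underflow; and no smoothing is applied, so an unseen categorical value rules a class out. The class name SimpleNaiveBayes is hypothetical.

```python
import math
from collections import Counter, defaultdict


def gaussian(x, mu, sigma):
    """Gaussian density g(x, mu, sigma) used for continuous attributes (Eq. 8)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


class SimpleNaiveBayes:
    """Naive Bayes following Eqs. (4)-(8). Assumption: float values are continuous
    (per-class Gaussian); every other value is categorical (relative frequency)."""

    def fit(self, X, y):
        self.n = len(y)
        self.class_counts = Counter(y)   # |C_i,D| for each class
        self.rows = defaultdict(list)    # class label -> its training tuples
        for row, label in zip(X, y):
            self.rows[label].append(row)
        return self

    def _likelihood(self, k, value, label):
        """Estimate P(x_k / C_i) from the training tuples of class `label`."""
        column = [row[k] for row in self.rows[label]]
        if isinstance(value, float):
            mu = sum(column) / len(column)
            var = sum((v - mu) ** 2 for v in column) / len(column)
            sigma = math.sqrt(var) or 1e-9  # guard against zero variance
            return gaussian(value, mu, sigma)
        return column.count(value) / len(column)

    def log_score(self, x, label):
        """log P(C_i) + sum_k log P(x_k / C_i); P(X) is omitted (same for all classes)."""
        score = math.log(self.class_counts[label] / self.n)
        for k, value in enumerate(x):
            p = self._likelihood(k, value, label)
            if p == 0.0:
                return -math.inf  # no smoothing: an impossible value rules the class out
            score += math.log(p)
        return score

    def predict(self, x):
        """Return the class with the highest posterior (Eq. 5)."""
        return max(self.class_counts, key=lambda c: self.log_score(x, c))


# Toy data: a categorical outlook and a continuous temperature.
X = [("sunny", 30.0), ("sunny", 32.0), ("rainy", 18.0), ("rainy", 20.0)]
y = ["hot", "hot", "cold", "cold"]
model = SimpleNaiveBayes().fit(X, y)
print(model.predict(("sunny", 31.0)))  # hot
```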

3.3 Random Forest classification

Random Forest, proposed by Breiman [32], is an ensemble classification approach that integrates multiple tree predictors such that each tree depends on the values of an independently sampled random vector with the same distribution for all trees in the forest. For the k-th tree, the RF algorithm generates a random vector θ_k that is independent of the past random vectors θ₁, …, θ_{k-1} but has the same distribution, and the tree is grown using the training data and θ_k. This produces a collection of tree-structured classifiers {h_k(x) = h(x, θ_k) | k = 1, …, K}, where x is the input vector and θ₁, …, θ_K are independent and identically distributed. For each input x, every tree casts a single vote for the most popular class. Unlike bagging, where all attributes are considered, RF uses only a random subset of features for node splitting. It produces efficient results in both classification and regression problems. Moreover, the proposed Bagged NB-DT algorithm proves to be more accurate than RF on many of the multi-class datasets with bag size 100.

4 Proposed Bagged Naïve Bayes-Decision Tree approach

This section describes the proposed Bagged Naïve Bayes-Decision Tree technique, which applies an ensemble approach to the integration of a C4.5 decision tree with an NB classifier. For a given training dataset D = {x₁, x₂, …, x_n}, each sample is written as x_i = {x_i1, x_i2, …, x_ih}, and the attributes are denoted {A₁, A₂, …, A_h}. The set of class labels in the training data is C = {C₁, C₂, …, C_m}. First, the NB classifier is applied to dataset D to predict the class of each x_i. To do so, the prior probability P(C_i) of each class C_i ∈ C and the class-conditional probability P(A_ij|C_i) of each attribute value A_ij are calculated from D. Each training sample is then classified using these probabilities and assigned to the class C_i with the highest posterior probability P(C_i|x_i). All wrongly classified training instances are then removed from D. These misclassified samples are the troublesome instances that adversely affect classification accuracy, which is why they are eliminated. Such noisy samples may contain improper feature values or class labels that contradict the rest of the data. If they were retained, the DT learner might overfit or its accuracy might decrease.

Fig. 1 Workflow of the proposed Bagged Naïve Bayes-Decision Tree algorithm.

After removal of the noisy instances from D, the decision tree learner is built from the newly constructed noise-free dataset D′. During DT construction, the splitting attribute at each node is the one that produces the highest information gain. The accuracy of the proposed model can be further increased by optimizing DT parameters such as tree size and split parameters. The flow chart of the proposed model is shown in Fig. 1.

As an ensemble-based technique, bagging integrates the predictions of multiple classifiers into a single aggregate classifier, which is often more accurate than any of the individual classifiers forming the ensemble. Previous research has demonstrated that a good ensemble is formed only when the individual learners are accurate and make their errors in different regions of the input space. A key factor in the success of bagging is the (in)stability of the base learner: a learner is unstable if small modifications to the training dataset lead to large changes in its predictions. Bagging builds ensembles by resampling, also known as bootstrapping, which provides a different training dataset for each individual learner. The technique is therefore more effective for learning algorithms that are unstable in nature, such as neural networks and decision trees [33]. Bagging improves the classification accuracy of unstable machine learning algorithms but generally not that of stable ones. It reduces variance and helps avoid overfitting of the classifier.
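The bootstrap resampling that bagging relies on can be illustrated with a short sketch (the helper name bootstrap_sample is hypothetical). Each bag is drawn uniformly with replacement and has the same size m as the original data, so a given sample is left out of a bag with probability (1 - 1/m)^m, which approaches 1/e ≈ 0.368 as m grows; a bag therefore contains roughly 63.2% of the distinct original samples.

```python
import math
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly at random with replacement (one bag)."""
    return [rng.choice(data) for _ in range(len(data))]


rng = random.Random(42)                 # fixed seed for reproducibility
data = list(range(10_000))              # m = 10,000 original sample indices
bag = bootstrap_sample(data, rng)
unique_fraction = len(set(bag)) / len(data)
expected = 1 - (1 - 1 / len(data)) ** len(data)  # approaches 1 - 1/e as m grows
print(f"distinct: {unique_fraction:.3f}, expected: {expected:.3f}, limit: {1 - 1 / math.e:.3f}")
```

Because each learner sees a different bag, unstable learners such as decision trees produce diverse models whose votes can then be combined.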

In this paper, NB and DT are selected as the weak learners for ensemble learning. First, NB and DT are hybridized into a single learner, to which bagging is applied with 100 bags. The 100 prediction outputs are then combined by majority voting, which acts as a classifier consensus system: the class label obtaining the highest number of votes is taken as the final predicted class for the test sample. The basic steps of the proposed approach are as follows:

Suppose T = {(x₁, y₁), (x₂, y₂), …, (x_m, y_m)} is a training dataset, where m is the total number of samples and each instance x_i is paired with its class label y_i.

Uniform sampling with replacement is used to obtain a new training dataset D of the same size from T. Because the probability that a given sample is never drawn in m draws is (1 - 1/m)^m ≈ 1/e ≈ 0.368, D contains approximately 63.2% of the unique samples of T, and its remaining entries are duplicates.

Consequently, each bootstrap sample contains some repeated instances, while the remaining samples of T (about 36.8%) are left out.

Finally, the hybrid NB-DT algorithm is applied to each bootstrap bag to obtain its predictions, and majority voting over these predictions yields an integrated output. The Bagged Naïve Bayes-Decision Tree approach is summarized in Algorithm 1.
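A compact sketch of the steps above (NB filtering of D, DT induction on the filtered set D′, bagging with majority voting) is given below as an illustration under simplifying assumptions, not as a reproduction of Algorithm 1. Attributes are assumed categorical, the NB uses unsmoothed relative frequencies, and the tree is a plain ID3-style tree using information gain (Eq. 3) rather than a full C4.5 with continuous splits and pruning. All function names (nb_filter, build_tree, fit_nbdt, fit_bnbdt, predict_bnbdt, ...) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

# Step 1: categorical Naive Bayes, used only to filter out misclassified tuples.
def nb_train(X, y):
    priors = Counter(y)
    counts = defaultdict(Counter)  # (class, attribute index) -> Counter of values
    for row, label in zip(X, y):
        for k, value in enumerate(row):
            counts[(label, k)][value] += 1
    return priors, counts, len(y)

def nb_predict(model, row):
    priors, counts, n = model
    def log_score(c):  # log P(C_i) + sum_k log P(x_k | C_i)
        s = math.log(priors[c] / n)
        for k, value in enumerate(row):
            p = counts[(c, k)][value] / priors[c]
            if p == 0:
                return -math.inf
            s += math.log(p)
        return s
    return max(priors, key=log_score)

def nb_filter(X, y):
    """Return D': the tuples of D whose label NB predicts correctly."""
    model = nb_train(X, y)
    keep = [i for i, row in enumerate(X) if nb_predict(model, row) == y[i]]
    return [X[i] for i in keep], [y[i] for i in keep]

# Step 2: ID3-style tree on categorical attributes, split by information gain.
def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(X, y, attrs):
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or not attrs:
        return majority  # leaf node: a class label (assumed not to be a tuple)

    def gain(k):
        parts = defaultdict(list)
        for row, label in zip(X, y):
            parts[row[k]].append(label)
        return entropy(y) - sum(len(p) / len(y) * entropy(p) for p in parts.values())

    best = max(attrs, key=gain)
    if gain(best) <= 0:
        return majority
    children = {}
    for value in {row[best] for row in X}:
        idx = [i for i, row in enumerate(X) if row[best] == value]
        rest = [a for a in attrs if a != best]
        children[value] = build_tree([X[i] for i in idx], [y[i] for i in idx], rest)
    return (best, children, majority)  # internal node

def tree_predict(node, row):
    while isinstance(node, tuple):
        attr, children, majority = node
        if row[attr] not in children:
            return majority  # attribute value unseen during training
        node = children[row[attr]]
    return node

# Step 3: hybrid NB-DT base learner, bagged, with majority voting.
def fit_nbdt(X, y):
    Xf, yf = nb_filter(X, y)
    if not yf:  # guard: keep the original bag if NB rejects every tuple
        Xf, yf = X, y
    return build_tree(Xf, yf, list(range(len(X[0]))))

def fit_bnbdt(X, y, n_bags=100, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_bags):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap bag
        models.append(fit_nbdt([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bnbdt(models, row):
    votes = Counter(tree_predict(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Toy usage: the first attribute determines the class, the second is irrelevant.
X = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")] * 5
y = ["yes", "yes", "no", "no"] * 5
models = fit_bnbdt(X, y, n_bags=25, seed=1)
print(predict_bnbdt(models, ("a", "y")))  # yes
```

Replacing build_tree with a full C4.5 implementation (continuous splits, pruning) would bring this sketch closer to the method described in the text; the bagging and voting loop would stay the same.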

5 Experiments

Detailed experiments were performed on 52 benchmark datasets. Missing values were removed from the datasets as part of preprocessing. The flowchart of the proposed work is shown in Fig. 1. This section describes the benchmark data and parameters, compares the proposed method with the baseline methods, and presents a tabular analysis with accuracy as the performance measure.

Table 1
Classification accuracy ± standard deviation (SD), in %, of each algorithm with bag size b = 100

Dataset	Bagged NB-DT	Bagged NB	Bagged DT	RF	NB	DT
abalone	26.28±1.29	25.11±2.31	23.65±2.24^b	24.44±2.22^b	24.99±1.56	20.75±2.14^b
appendicitis	88.64±13.38	86.73±10.33	87.73±6.24	84.91±10.14	85±10.09	82.91±7.69
australian	87.64±3.22	69.28±5.58^b	86.96±4.48	86.38±4.28	67.83±2.88^b	83.19±5.08^b
automobile	79.92±9.15	68.54±11.8^b	85.5±11.49	86.12±8.32	64.75±10.39^b	82.92±8.26
balance	84.96±4.23	91.36±1.36^a	84.32±1.79	80.80±4.28^b	91.2±1.73^a	78.56±3.13^b
bands	69.05±5.3	67.68±3	76.55±7.71^a	76.13±9.30	67.12±3.38	61.64±7.02^b
blood-transfusion	77.93±4.11	75.26±2.78	76.34±4.03	72.72±4.70^b	75.27±3.83	74.6±4.2
breast-tissue	66.73±12.46	70.82±11.4	74.36±9.32	72.00±12.77	64.91±13.93	64.82±10.04
bupa	74.15±4.65	65.54±5.69^b	72.12±8.31	73.11±7.68	63.51±7.87^b	64.65±5.65^b
cardiotocography	74.08±2.68	46.85±3.21^b	87.4±1.38^a	87.35±2.61^a	44.12±3.87^b	82.69±2.97^a
cleveland	59.66±7.25	55.56±3.67	56.57±5.21	56.23±4.16	55.89±5.35	48.52±8.09^b
coil-2000	94.13±0.05	94.02±0.07^b	93.43±0.35^b	92.93±0.38^b	94.02±0.07^b	91.39±0.46^b
contraceptive	55.7±2.81	52.88±3.96	54.38±2.69	51.86±2.68^b	53.63±3.8	50.91±4.66^b
crx	83.46±4.05	68.14±4.99^b	87.3±3.65^a	87.44±4.22^a	67.23±6.43^b	82.39±4.85
diabetic-retino-derecen	62.21±4.38	55.69±4.72^b	67.86±4.5^a	68.90±5.29^a	56.04±3.56^b	61.95±5.35
dermatology	97.65±3.43	93.29±3.03^b	95.82±2.69	97.21±2.62	91.63±3.19^b	93.86±4.9
ecoli	81.27±4.98	83.62±6.82	81.2±7.64	85.38±6.40	81.26±4.17	77.74±7.22
flare	74.86±2.56	49.05±5.67^b	73.73±2.07	73.17±3.92	51.6±3.94^b	73.08±1.84
german	71.5±2.07	70.6±1.71	76.8±2.1^a	76.90±3.84^a	70.6±2.27	67.1±3.7^b
haberman	75.2±1.25	73.53±4.47	70.91±7.08	69.01±7.84^b	42.68±17.49^b	66.73±8.6^b
heart	78.52±6	79.26±7.03	81.85±7.9	81.48±6.76	79.63±6.36	73.33±7.37
hepatitis	82.5±10.54	86.25±10.94	83.75±8.44	87.50±8.33	86.25±10.94	81.25±13.5
house-votes-84	97.25±5.93	89.66±7.71^b	96.56±4.89	96.11±3.21	90.11±5.35^b	95.67±5.02
indian-liver-patient	67.88±6.27	68.07±6.02	70.27±5.56	71.68±4.50	66.5±5.19	66.68±4.68
ionosphere	93.71±4.27	92.03±5.16	92.01±5.19	93.47±4.60	91.46±3.27	88.88±6.94
iris	96.67±6.13	95.33±5.49	96±3.44	94.67±6.89	96±4.66	94.67±6.89
leaf	75.59±8.88	70.59±6.65	76.18±6.71	78.53±7.08	73.24±7.26	62.35±7.31^b
led7digit	69.2±5.18	72.4±6.59	71±9.01	70.60±6.40	74±5.16	71±8.45
lymphography	86.9±11.82	83.1±8.67	77.81±13.28	82.48±7.73	86.43±6.47	75.76±14.11
magic	80.3±0.88	76.28±1.32^b	87.84±0.68^a	88.00±0.70^a	76.25±1.13^b	82.04±1.08^a
mammographic	76.14±4.24	72.89±6.31	80.84±4.52^a	79.64±4.56	71.33±5.11^b	79.16±3.81
marketing	34.20±1.82	31.27±1.01^b	33.67±1.19	32.02±1.93^b	31.24±1.24^b	29.81±1.71^b
movement-libras	75.00±5.71	71.39±6.55	78.89±5.89	81.94±6.31^a	66.67±7.17^b	66.11±8.47^b
newthyroid	92.97±5.62	96.71±5.03	93.1±5.44	95.35±4.34	96.77±4.83	91.6±5.34
optdigits	92.53±1.62	87.85±1.00^b	96.01±0.81^a	98.29±0.45^a	85.84±1.6^b	89.75±1.35^b
pima	77.86±2.64	74.88±2.82^b	76.03±5.35	77.21±3.14	73.57±3.98^b	72.91±4.41^b
qualitative-bankruptcy	98.4±2.8	99.6±1.26	97.6±3.86	99.00±1.26	99.10±1.26	98.4±2.07
ring	94.78±0.96	97.81±0.36^a	94.39±0.92	95.19±0.74	97.69±0.55^a	88.86±1.38^b
saheart	70.12±3.54	68.4±7.57	69.51±7.14	67.73±7.14	66.45±8.77	60.85±5.75^b
satimage	85.05±1.44	81.83±1.6^b	90.82±0.55^a	91.83±0.94^a	81.82±1.16^b	86.14±1.28
spectfheart	78.28±8.56	76.4±7.38	81.31±7.37	82.02±3.79	73.76±7.14	71.92±7.61
thoracic-surgery	86.21±0.01	85.11±0.01^b	82.98±2.65^b	83.62±1.75^b	84.89±0.67^b	78.94±3.24^b
thyroid	94.68±0.36	93.97±0.56^b	99.65±0.16^a	99.68±0.16^a	93.99±0.33^b	99.83±0.24^a
titanic	79.83±1.77	77.83±2.33^b	79.05±1.64	79.06±1.93	77.83±2.77	79.05±2.23
twonorm	98.66±0.9	97.74±0.54^b	96.41±0.59^b	96.77±0.76^b	97.7±0.67^b	84.04±1.19^b
vehicle	68.2±3.74	61.69±5.45^b	74.11±3.04^a	76.24±4.37^a	62.07±4.68^b	69.98±3.28
vertebral-column-2C	77.74±3.21	76.77±8.3	82.26±4.09^a	84.19±6.53^a	76.77±7.87	79.35±5.73
vertebral-column-3C	83.55±5.15	80±6.23	83.23±6.42	83.55±7.36	80.32±7.67	79.35±8.63
wdbc	97.2±2.19	94.19±3.01^b	95.96±2.99	96.49±1.85	94.56±2.54^b	92.27±2.9^b
wine	98.46±4.74	97.71±4.05	97.12±4.87	98.30±2.74	97.22±3.93	88.86±10.12^b
wisconsin	97.19±1.73	96.78±1.81	96.49±2.29	96.92±1.62	96.93±1.28	95.03±2.31^b
wpbc	79.89±7.82	64.42±12.2^b	79.42±4.68	77.87±4.74	62.82±9.02^b	71.11±9.72^b

wpbc	79.89±7.82	64.42±12.2^b	79.42±4.68	77.87±4.74	62.82±9.02^b	71.11±9.72^b

5.1 Experimental setup and benchmark datasets

Table 2
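Two-tailed Student's t-test on classification accuracy (ACC): each entry w/t/l gives the wins/ties/losses of the row algorithm against the column algorithm over the 52 datasets.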

	Bagged NB	Bagged DT	RF	NB	DT
Bagged DT	18/30/4
RF	22/15/15	3/46/3
NB	0/50/2	5/26/21	4/23/25
DT	12/25/15	0/25/27	0/24/28	14/23/15
Bagged NB-DT	22/28/2	4/36/12	9/32/11	24/26/2	23/26/3

All experiments are conducted in MATLAB on a Windows 8 machine with an Intel® Core™ i7-4770 CPU (3.4 GHz) and 8 GB of RAM. The proposed method is evaluated on 52 benchmark datasets, most of which are taken from the UCI repository [34]. The remaining datasets come from other sources: appendicitis [35]; ring, titanic and twonorm from the Delve repository [36]; and saheart [37]. The datasets cover life sciences, physical sciences and other areas. As a preprocessing step, instances containing missing values are removed. The reported results are averages over ten runs of ten-fold cross-validation (CV): in each fold, the model is built on the training data and the performance measures are evaluated on the testing data.
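The evaluation protocol of ten runs of ten-fold CV can be sketched as follows. This is a minimal Python sketch, not the authors' MATLAB code. It assumes plain shuffled (unstratified) folds, since the paper does not say whether the folds are stratified, and the `fit` argument of `cv_accuracy` is a hypothetical hook that trains a model and returns a prediction function. Reporting the mean and standard deviation over all per-fold scores is also an assumption.

```python
import random
import statistics


def repeated_kfold(n, k=10, runs=10, seed=0):
    """Yield (train_idx, test_idx) pairs for `runs` repetitions of k-fold CV.

    Assumption: folds are formed by plain shuffling (no stratification).
    """
    if n < k:
        raise ValueError("need at least k instances so that no fold is empty")
    rng = random.Random(seed)
    for _ in range(runs):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random partition for every run
        for f in range(k):
            test = idx[f::k]  # every k-th shuffled index goes to fold f
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield train, test


def cv_accuracy(X, y, fit, k=10, runs=10, seed=0):
    """Mean and standard deviation of per-fold test accuracy (in %).

    `fit(X_train, y_train)` is a hypothetical hook returning a predict(x) function.
    """
    scores = []
    for train, test in repeated_kfold(len(X), k, runs, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(100.0 * correct / len(test))
    return statistics.mean(scores), statistics.stdev(scores)
```

Calling `cv_accuracy(X, y, fit)` with any trainer returns a pair in the same mean ± standard deviation form as the Table 1 entries.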

In our experiments, the selected algorithms are evaluated using classification accuracy (ACC) as the performance measure. The accuracy of each algorithm is the percentage of correctly classified instances in the testing dataset, defined as

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \qquad (9)$$

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively.
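Eq. (9) is written in binary terms. For the multi-class problems considered here, the same quantity (the percentage of correctly classified test instances) equals the trace of the m × m confusion matrix divided by its total; for m = 2 the diagonal holds TP and TN, so Eq. (9) is recovered. A minimal Python sketch, with illustrative function names:

```python
def accuracy(y_true, y_pred):
    """ACC in percent: share of test instances whose predicted label is correct."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def accuracy_from_confusion(cm):
    """Same quantity from an m x m confusion matrix: trace / total (in percent).

    For m = 2 the diagonal holds TP and TN, giving (TP + TN) / (TP + TN + FP + FN).
    """
    total = sum(sum(row) for row in cm)
    return 100.0 * sum(cm[i][i] for i in range(len(cm))) / total
```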

For the comparative analysis, the proposed algorithm is compared with the following baseline methods:

Bagged-NB: An ensemble prediction model based on bootstrap-aggregated (bagged) naïve Bayes classifiers.

Bagged-DT: Bootstrap-aggregated (bagged) decision trees [38], i.e., an ensemble of CART-based decision trees.

NB: A naïve Bayes classifier that assumes the attributes are conditionally independent given the class label [31].

DT: A standard Classification and Regression Tree (CART) [39] that uses the Gini index as the impurity measure.

RF: A standard random forest [40], i.e., an ensemble of tree-based classifiers built with random vector sampling.
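The Gini impurity used by the DT baseline can be sketched as below: the impurity of a node is 1 − Σ_c p_c², where p_c is the fraction of the node's instances in class c, and a CART-style tree picks the split whose children have the lowest size-weighted impurity. This is an illustrative Python sketch, not the CART implementation used in the experiments; the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_c p_c**2 of a list of class labels (0 for an empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(groups):
    """Size-weighted Gini impurity of a candidate split given as a list of label lists.

    A CART-style tree evaluates every candidate split and keeps the one that
    minimises this value.
    """
    n = sum(len(g) for g in groups)
    return sum(len(g) / n * gini(g) for g in groups)
```

A pure node has impurity 0, and a node split evenly between two classes has impurity 0.5, the maximum for two classes.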

Table 1 reports the average accuracy (ACC) and the corresponding standard deviation of Bagged NB-DT and the baseline algorithms. Each ACC value of a competing algorithm is marked with ^a, ^b, or neither. ^a indicates that the ACC of the competing algorithm is significantly higher than that of the proposed algorithm (upgradation), while ^b indicates that it is significantly lower (degradation). An unmarked value means that there is no significant difference between the ACC of that competing algorithm and that of the proposed algorithm. For both marks the significance level is 0.05 (p < 0.05), i.e., the confidence level is 95 percent.
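The ^a/^b marking can be sketched as follows. The paper does not state whether the t-test is paired, so this minimal Python sketch assumes a paired two-tailed test on the per-fold accuracies of the two algorithms over the same ten folds. The hard-coded critical value 2.262 is the two-tailed 5% point of Student's t distribution with 9 degrees of freedom and is only valid for that setting; `significance_mark` is a hypothetical helper name.

```python
import math
import statistics

T_CRIT_DF9 = 2.262  # two-tailed 5% critical value of Student's t with 9 degrees of freedom


def significance_mark(proposed, competitor, t_crit=T_CRIT_DF9):
    """Return '^a', '^b' or '' for one dataset, following the Table 1 convention.

    proposed, competitor: per-fold accuracies of the two algorithms on the same folds.
    Assumes a paired two-tailed t-test; the default t_crit only fits 10 folds (df = 9).
    """
    if len(proposed) != len(competitor) or len(proposed) < 2:
        raise ValueError("need two equally long lists with at least two folds")
    diffs = [c - p for p, c in zip(proposed, competitor)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    if sd_d == 0:
        # identical differences on every fold: significant unless they are all zero
        if mean_d == 0:
            return ""
        return "^a" if mean_d > 0 else "^b"
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    if abs(t) <= t_crit:
        return ""                     # no significant difference
    return "^a" if t > 0 else "^b"    # competitor significantly higher (^a) or lower (^b)
```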

Algorithm 1

BNBDT: Bagged Naïve Bayes Decision Tree approach
Input: D, n, where D is the training dataset with its associated class labels and n is the total number of bags (bootstrap samples).
Output: Bagged Naïve Bayes Decision Tree
Method:
1: for i = 1 to n, do
// Creation of n models
2: create bootstrap sample B_i by sampling D with replacement.
3: C_i = NBDT (B_i)
// Apply Algorithm 2 (NBDT) to each bag
4: end for
5: for each testing instance x, do
6: $C^{*}(x) = \arg\max_{y} \sum_{i=1}^{n} \delta(C_{i}(x) = y)$
// Obtain the prediction of each C_i and perform majority voting
{δ(·) = 1 if its argument is true and 0 otherwise}
7: end for
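Algorithm 1 can be sketched in Python as below. The base learner is passed in as a function `fit_base(X, y)` that returns a predictor; in the paper this role is played by NBDT (Algorithm 2), which is not reimplemented here. The tie-breaking rule of the vote (among equally voted labels, the one seen first wins, as `Counter.most_common` orders ties by first occurrence) and the 1-nearest-neighbour demo learner `fit_1nn` are assumptions for illustration only.

```python
import random
from collections import Counter


def bagging_fit(X, y, fit_base, n_bags=100, seed=0):
    """Training loop of Algorithm 1: one base model per bootstrap sample B_i.

    `fit_base(X, y)` must return a predict(x) function; it stands in for NBDT.
    """
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_bags):
        # B_i: n indices drawn from D uniformly *with* replacement
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_base([X[k] for k in idx], [y[k] for k in idx]))
    return models


def bagging_predict(models, x):
    """Voting step: C*(x) = argmax_y sum_i delta(C_i(x) = y), i.e. majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def fit_1nn(X, y):
    """Toy 1-nearest-neighbour learner on scalar inputs (demo only, not NBDT)."""
    def predict(x):
        nearest = min(range(len(X)), key=lambda k: abs(X[k] - x))
        return y[nearest]
    return predict


if __name__ == "__main__":
    X = [0, 1, 2, 3, 10, 11, 12, 13]
    y = ["a"] * 4 + ["b"] * 4
    models = bagging_fit(X, y, fit_1nn, n_bags=25, seed=1)
    print(bagging_predict(models, 0.5), bagging_predict(models, 12.2))
```

Keeping the base learner as a parameter mirrors the structure of Algorithm 1, where bagging is independent of the learner applied to each bag; setting `n_bags=100` matches the bag size used in the experiments.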

The performance of the proposed Bagged NB-DT, which applies bagging to a hybrid of Naïve Bayes and Decision Tree, is measured in terms of classification accuracy (ACC). In Fig. 2, data points below the y = x diagonal are the datasets on which Bagged NB-DT achieves higher accuracy than the competing algorithm.

Fig. 2. Bagged NB-DT vs. other algorithms: classification accuracy (ACC).

In addition, Table 2 reports the results of the two-tailed t-test, in which each entry w/t/l means that, over the 52 datasets, the algorithm in the corresponding row wins on w datasets, ties on t datasets and loses on l datasets against the algorithm in the corresponding column. The results in Table 2 can be summarized as follows:

The proposed Bagged NB-DT clearly outperforms Bagged NB, winning on 22 datasets, tying on 28 and losing on 2. Against Bagged DT it performs worse (4 wins, 36 ties and 12 losses), and against RF it is slightly worse (9 wins, 32 ties and 11 losses). Against the single models it performs better: 24 wins, 26 ties and 2 losses against NB, and 23 wins, 26 ties and 3 losses against DT.

DT performs worst against Bagged DT and RF, winning on no dataset in either comparison (0/25/27 and 0/24/28). It is inferior to Bagged NB (12 wins and 15 losses) and roughly on par with NB (14 wins and 15 losses).

NB is inferior to Bagged NB (0 wins and 2 losses), Bagged DT (5 wins and 21 losses) and RF (4 wins and 25 losses).

RF slightly outperforms Bagged NB (22 wins, 15 ties and 15 losses) and ties Bagged DT (3 wins, 46 ties and 3 losses).

Bagged DT outperforms Bagged NB (18 wins and 4 losses), ties RF (3 wins and 3 losses), significantly outperforms NB (21 wins and 5 losses) and clearly dominates DT (27 wins and 0 losses).
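The w/t/l entries can be tallied from per-dataset significance marks. The minimal Python sketch below assumes the Table 1 convention, in which ^b on the column algorithm counts as a win for the row algorithm, ^a as a loss and no mark as a tie. For pairs that do not involve Bagged NB-DT, the marks would have to come from separate pairwise tests, which Table 1 does not show. `win_tie_loss` is a hypothetical helper.

```python
from collections import Counter


def win_tie_loss(marks):
    """Tally (w, t, l) for the row algorithm from the column algorithm's marks.

    marks[i] is '^b' if the column algorithm is significantly worse on dataset i
    (a win for the row), '^a' if it is significantly better (a loss), and ''
    if there is no significant difference (a tie).
    """
    counts = Counter(marks)
    unknown = set(counts) - {"^a", "^b", ""}
    if unknown:
        raise ValueError(f"unexpected marks: {sorted(unknown)}")
    return counts["^b"], counts[""], counts["^a"]
```

Applied to the 52 marks of one competitor column, the result is a triple in the same w/t/l form as the Bagged NB-DT row of Table 2.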

Algorithm 2

Constructing the Naïve Bayes-Decision Tree (NBDT) Model
Input: D, X, A, C
where D is the training dataset containing the instances X = (X_1, …, X_n) and their associated class labels, A = {A_1, A_2, …, A_n} is the set of n attributes, and C = {C_1, C_2, …, C_m} is the set of m classes.
Output: Naïve Bayes-Decision Tree model
Method:
1: for each class C_i ∈ C, do // Phase I: Applying NB classifier
2: Calculate the prior probability P(C_i).
3: end for
4: for each attribute value A_ij in D, do
5: Calculate the class conditional probabilities P(A_ij | C_i).
6: end for
7: for each training instance x_i in D, do
8: Calculate the posterior probabilities P(C_i | x_i) and predict the class with the highest posterior.
9: if x_i is misclassified (the predicted class differs from its label), then
10: Eliminate x_i from D
11: end if
12: end for
13: Create a node N. // Phase II: Construction of DT
14: Find the best splitting criterion S_c and label N, the root node, with the splitting attribute A_i.
15: for each outcome k of the splitting criterion S_c, do
16: let D_k be the set of instances in D satisfying outcome k;
17: if D_k is empty, then
18: attach a leaf node labeled with the majority class in D to node N
19: else attach the node returned by DTree(D_k, A_i, S_c) to node N.
20: end if
21: end for
22: return N.
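Phase I of Algorithm 2 can be sketched in Python as below. The sketch assumes categorical attributes and Laplace smoothing (`alpha = 1`) for P(A_ij | C_i), neither of which is specified in the algorithm; numeric attributes would need discretization or a Gaussian likelihood. Posteriors are compared in log space and without the common normalizer P(x_i), which does not change which class scores highest. Phase II can then be any CART-style tree trained on the filtered data and is not sketched here; `nb_filter` is a hypothetical name.

```python
import math
from collections import Counter, defaultdict


def nb_filter(X, y, alpha=1.0):
    """Phase I of NBDT: drop training instances that Naive Bayes misclassifies.

    X is a list of tuples of categorical attribute values, y the class labels.
    Laplace smoothing with `alpha` is an assumption, not taken from the paper.
    """
    n = len(y)
    class_counts = Counter(y)                    # used for the priors P(C_i)
    value_counts = defaultdict(int)              # (attribute j, value v, class c) -> count
    domains = [set() for _ in range(len(X[0]))]  # observed values of each attribute
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            value_counts[(j, v, yi)] += 1
            domains[j].add(v)

    def log_score(xi, c):
        # log P(c) + sum_j log P(A_j = v_j | c); P(x_i) is shared by all classes
        score = math.log(class_counts[c] / n)
        for j, v in enumerate(xi):
            num = value_counts[(j, v, c)] + alpha
            den = class_counts[c] + alpha * len(domains[j])
            score += math.log(num / den)
        return score

    kept_X, kept_y = [], []
    for xi, yi in zip(X, y):
        predicted = max(class_counts, key=lambda c: log_score(xi, c))
        if predicted == yi:                      # keep only correctly classified instances
            kept_X.append(xi)
            kept_y.append(yi)
    return kept_X, kept_y
```

For example, with four `("sunny",)` instances labelled yes, yes, yes, no and four `("rainy",)` instances labelled no, NB predicts yes for every sunny instance, so the single sunny instance labelled no is removed and the other seven are kept.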

Overall, our experiments show that most of the compared algorithms do not reach the accuracy (ACC) of the proposed Bagged NB-DT. Among the competing algorithms, Bagged-DT and RF are the exception, performing slightly better on a number of datasets.

6 Conclusion and future work

In this paper, we have proposed a bagged naïve Bayes decision tree algorithm for solving multi-class classification problems. In the first phase, the naïve Bayes and decision tree classifiers are hybridized into a single base classifier: the NB classifier removes misclassified or noisy instances, and the modified data is then used for DT induction and classification. The second phase applies bagging to this hybrid classifier, which acts as the base learner of the bagging algorithm. With the optimal bag size of 100, majority voting is used to combine the predictions into the final prediction. The proposed model was compared with five other classification algorithms, namely Bagged NB, Bagged DT and RF as ensemble techniques and NB and DT as individual models, on 52 benchmark datasets drawn mostly from the UCI machine learning repository. The experimental results show that the proposed bagged NB-DT classifier significantly outperforms Bagged NB, NB and DT, while Bagged DT and RF perform comparably or slightly better on a number of datasets. As future work, other types of learners can be hybridized and combined with bagging, boosting and stacking ensembles, or with genetic algorithms, to solve real-life dynamic multi-class problems.

References

1. Kavakiotis, Tsave, Salifoglou, Maglaveras, Vlahavas and Chouvarda, Machine learning and data mining methods in diabetes research, Comput Struct Biotechnol J 15 (2017), 104–116.

2. S.H. Liao, P.H. Chu and P.Y. Hsiao, Data mining techniques and applications - A decade review from 2000 to 2011, Expert Syst Appl 39 (2012), 11303–11311.

3. Langley and Sage, Induction of Selective Bayesian Classifiers, In: Uncertainty Proceedings 1994, Elsevier, 1994, pp. 399–406.

4. Z.-H. Zhou, Ensemble Learning, In: Encyclopedia of Biometrics, Springer US, Boston, MA, 2009, pp. 270–273.

5. Rocha and S.K. Goldenstein, Multiclass from binary: Expanding One-versus-all, one-versus-one and ECOC-based approaches, IEEE Trans Neural Networks Learn Syst 25 (2014), 289–302.

6. Chaudhary, Kolhe and Kamal, A hybrid ensemble for classification in multiclass datasets: An application to oilseed disease dataset, Comput Electron Agric 124 (2016), 65–72.

7. Aly, Survey on multiclass classification methods, Neural Netw 19 (2005), 1–9.

8. Silva-Palacios, Ferri and M.J. Ramírez-Quintana, Probabilistic class hierarchies for multiclass classification, J Comput Sci 26 (2018), 254–263.

9. Chaudhary, Kolhe and Kamal, An improved random forest classifier for multi-class classification, Inf Process Agric 3 (2016), 215–222.

10. García-Pedrajas and Ortiz-Boyer, An empirical study of binary classifier fusion methods for multiclass classification, Inf Fusion 12 (2011), 111–130.

11. Mousavi and Eftekhari, A new ensemble learning methodology based on hybridization of classifier ensemble selection approaches, Appl Soft Comput J 37 (2015), 652–666.

12. Agarwal, V.N. Balasubramanian and C.V. Jawahar, Improving multiclass classification by deep networks using DAGSVM and Triplet Loss, Pattern Recognit Lett 112 (2018), 184–190.

13. I.H. Sarker, M.A. Kabir, Colman and Han, An Improved Naive Bayes Classifier-Based Noise Detection Technique for Classifying User Phone Call Behavior, In: Australasian Conference on Data Mining, Springer, Singapore, 2018, pp. 72–85.

14. Ren, Lian and Zou, Incremental naïve bayesian learning algorithm based on classification contribution degree, J Comput 9 (2014), 1967–1974.

15. Frank and Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, http://archive.ics.uci.edu/ml.

16. De Caigny, Coussement and K.W. De Bock, A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees, Eur J Oper Res 269 (2018), 760–772.

17. Kotsiantis, A hybrid decision tree classifier, J Intell Fuzzy Syst 26 (2014), 327–336.

18. D.R. Carvalho and A.A. Freitas, A hybrid decision tree/genetic algorithm method for data mining, Inf Sci (Ny) 163 (2004), 13–35.

19. Panhalkar and Doye, An outlook in some aspects of hybrid decision tree classification approach: A survey, In: Proceedings of the International Conference on Data Engineering and Communication Technology, Springer, Singapore, 2017, pp. 85–95.

20. L.-M. Wang, X.-L. Li, C.-H. Cao and S.-M. Yuan, Combining decision tree and Naive Bayes for classification, Knowledge-Based Syst 19 (2006), 511–515.

21. Polat and Güneş, A novel hybrid intelligent method based on C4.5 decision tree classifier and one-against-all approach for multi-class classification problems, Expert Syst Appl 36 (2009), 1587–1592.

22. S.-J. Lee, Xu, Li and Yang, A novel bagging C4.5 algorithm based on wrapper feature selection for supporting wise clinical decision making, J Biomed Inform 78 (2017), 144–155.

23. Singh and Verma, Multi-classifier model for software fault prediction, Int Arab J Inf Technol 15 (2018), 912–919.

24. Ala’raj and M.F. Abbod, Classifiers consensus system approach for credit scoring, Knowledge-Based Syst 104 (2016), 89–105.

25. Sun, Lang, Fujita and Li, Imbalanced enterprise credit evaluation with DTE-SBD: Decision tree ensemble based on SMOTE and bagging with differentiated sampling rates, Inf Sci (Ny) 425 (2018), 76–91.

26. Wu, Pan, Zhu, Cai, Zhang and Zhang, Self-adaptive attribute weighting for Naive Bayes classification, Expert Syst Appl 42 (2015), 1487–1502.

27. Chandra and Gupta, Robust approach for estimating probabilities in Naïve–Bayes Classifier for gene expression data, Expert Syst Appl 38 (2011), 1293–1298.

28. Wei, Sen, J.Y. Yang, Shen, H. Bin and D.J. Yu, A cascade random forests algorithm for predicting protein-protein interaction sites, IEEE Trans Nanobioscience 14 (2015), 746–760.

29. Lin, Wang, Xie and Zhong, Random forests-based extreme learning machine ensemble for multi-regime time series prediction, Expert Syst Appl 83 (2017), 164–176.

30. Silva-Palacios, Ferri and M.J. Ramírez-Quintana, Improving performance of multiclass classification by inducing class hierarchies, Procedia Comput Sci 108 (2017), 1692–1701.

31. Friedman, Geiger and Goldszmidt, Bayesian network classifiers, Mach Learn 29 (1997), 131–163.

32. Breiman, Random Forests, 2001, pp. 1–33.

33. D.W. Opitz and Maclin, Popular ensemble methods: An empirical study, J Artif Intell Res 11 (1999), 169–198.

34. C.L. Blake and C.J. Merz, UCI repository of machine learning databases, http://archive.ics.uci.edu/ml/index.php.

35. C.A. Kulikowski and S.M. Weiss, Computer systems that learn: classification and prediction methods from statistics, neural nets, machine learning, and expert systems, Morgan Kaufmann Publishers, San Francisco, 1991.

36. C.E. Rasmussen, R.M. Neal, Hinton, van Camp, Revow, Ghahramani, Kustra and Tibshirani, Delve Datasets, http://www.cs.toronto.edu/~delve/data/datasets.html.

37. J.H. Friedman, Tibshirani and Hastie, Datasets for “The Elements of Statistical Learning,” https://statweb.stanford.edu/~tibs/ElemStatLearn/data.html.

38. Breiman, Bagging predictors, Mach Learn 24 (1996), 123–140.

39. Breiman, Friedman, Jerome, Olshen and