Abstract
Of the numerous proposals to refine naive Bayes by weakening its attribute independence assumption, averaged one-dependence estimators (AODE) has been shown to achieve significantly higher classification accuracy at a moderate cost in classification efficiency. However, all one-dependence estimators (ODEs) in AODE are assigned the same weight and treated equally. To address this issue, model weighting, which assigns discriminative weights to ODEs and then linearly combines their probability estimates, has proven to be an efficient and effective approach. Most information-theoretic weighting metrics, including mutual information, the Kullback-Leibler measure and information gain, place more emphasis on the correlation between the root attribute (value) and the class variable. We argue that the topology of each ODE can be divided into a set of local directed acyclic graphs (DAGs) based on the independence assumption, and multivariate mutual information is introduced to measure the extent to which the DAGs fit the data. Based on this premise, we propose a novel weighted AODE algorithm, called AWODE, that adaptively selects weights to alleviate the independence assumption and make the learned probability distribution fit the instance. The proposed approach is validated on 40 benchmark datasets from the UCI machine learning repository. The experimental results reveal that AWODE achieves a bias-variance trade-off and is a competitive alternative to single-model Bayesian learners (such as TAN and KDB) and other weighted AODEs (such as WAODE).
Introduction
Bayesian network classifiers (BNCs) [1, 2, 3, 4, 5, 6] provide a powerful tool for knowledge representation and inference under uncertainty. However, learning the topology of a BNC is an NP-hard problem [7], so heuristic learning and approximate learning are the realistic solutions. To estimate high-dimensional multivariate probabilities from the training data, naive Bayes (NB) [8] infers the joint probability by assuming that the attributes
AODE utilizes a restricted class of super-parent one-dependence estimators (SPODEs) and aggregates the predictions of all qualified estimators within this class. Each SPODE assumes that all the other attributes are conditionally independent given
The SPODEs in AODE make distinct independence assumptions and their network topologies differ greatly, which introduces diversity in terms of data distribution. Linear model weighting, which computes a weight for each member and then linearly combines the members' estimates of the joint probability, has proven effective. AODE combines the SPODEs by applying a uniformly weighted average, and the prediction is made by applying Bayes' rule to select the class with the maximum posterior probability. Weighting metrics such as mutual information (MI), conditional log likelihood (CLL) and area under the ROC curve (AUC) have been applied to assign weights to SPODEs [20, 21]. These metrics aim to find a balance between goodness of fit and model simplicity, and thus they provide a combined score for a proposed explanatory model (an SPODE in our context) and for the data given the model.
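The linear combination described above can be illustrated with a minimal sketch. It assumes that each SPODE has already produced an estimate of the joint probability of every class label with the test instance; the function name, the nested-list layout and the toy numbers are hypothetical and are not taken from the paper. With no weights supplied, the sketch falls back to the uniform average used by AODE.

```python
def combine_spode_estimates(joint, weights=None):
    """Linearly combine per-SPODE joint probability estimates.

    joint[i][c] is the (hypothetical) estimate of P(c, x) given by the i-th SPODE.
    Returns the normalized posterior over classes and the index of the best class.
    """
    n_spodes = len(joint)
    n_classes = len(joint[0])
    if weights is None:
        weights = [1.0] * n_spodes  # uniform weights: plain AODE averaging
    total_weight = sum(weights)
    combined = [
        sum(w * row[c] for w, row in zip(weights, joint)) / total_weight
        for c in range(n_classes)
    ]
    norm = sum(combined)
    posterior = [v / norm for v in combined]
    label = max(range(n_classes), key=posterior.__getitem__)
    return posterior, label


# Toy estimates from three SPODEs for a two-class problem (made-up numbers).
joint = [[0.020, 0.010],
         [0.012, 0.018],
         [0.015, 0.015]]
uniform_post, uniform_label = combine_spode_estimates(joint)
weighted_post, weighted_label = combine_spode_estimates(joint, [0.1, 0.8, 0.1])
```

In this toy case the uniform average selects the first class, while putting most of the weight on the second SPODE selects the second class, which shows how the choice of weights can change the prediction.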
However, the independence assumption of an SPODE may not suit different instances to the same extent, which may result in a biased estimate of the joint probability distribution and harm AODE’s generalization performance. Thus an SPODE’s weight should change adaptively from instance to instance. For example, the primary value of weighting by mutual information is its capacity to reflect the impact of the root attribute on the class variable, but this impact does not remain the same when the two take different values, which reduces the generalization performance of the final weighted AODE. Previous approaches used variants of information-theoretic metrics, e.g., the Kullback-Leibler (KL) divergence and information gain (IG), to measure the correlation between the root attribute value and the class variable for weighting [22].
Due to the individual independence assumption of these SPODEs in AODE, the joint probability distributions corresponding to their topologies may fit the data to different extents. One key issue for weighting is how to evaluate the reasonableness of the topologies of SPODEs. The contributions of this paper are listed as follows:
The topology of each SPODE member in AODE is divided into a set of local topologies based on the independence assumption. Multivariate mutual information is then redefined and introduced as an information metric to measure the reasonableness of each local topology for weighting.
The proposed weighted AODE algorithm, called adaptively weighted one-dependence estimators (AWODE), adaptively selects weights for SPODEs to alleviate the independence assumption, and the weights help fine-tune the ensemble of SPODEs so that the learned joint probability distribution fits each instance.
Extensive experiments on 40 datasets from the UCI (University of California, Irvine) machine learning repository [23] show that weighting can help improve both the interoperability of these SPODE members and the generalization performance of the final ensemble BNC. The results show that AWODE achieves a bias-variance trade-off and is a competitive alternative to single-model Bayesian learners (such as TAN and KDB) and other weighted AODE algorithms (such as WAODE).
The rest of this paper is organized as follows. In Section 2, we introduce related work for alleviating the independence assumptions of NB and AODE. In Section 3, we introduce the basic idea of information metric for weighting AODE. Section 4 presents experimental evaluation of our proposed methods and their comparison with related approaches. Section 5 presents conclusions and directions for future research.
Formally, the topology of BNC learned from training data is a directed acyclic graph
Examples of (a) naive Bayes and (b) an SPODE in AODE.
As Fig. 1a shows, by assuming the attributes are independent of each other given the class variable, NB uses the following formula to estimate
If only relevant, and especially non-redundant, attributes are selected and used for prediction, the robustness and classification performance of NB will improve. The univariate filter approach helps mitigate the negative effects of the curse of dimensionality, which results in an exponential increase in the required training data as the number of attributes increases [24]. Researchers have proposed searching the entire space of attribute subsets to exclude attributes that introduce dependencies. Langley and Sage proposed Forward Sequential Selection (FSS) [25], which starts from an empty set and iteratively adds candidate attributes to the subset. The performance of the resulting NB on the training data is evaluated by a scoring measure [26], which can be mutual information, odds ratio, weight of evidence or the symmetrical uncertainty coefficient. Kittler proposed Backwards Sequential Elimination (BSE) [25], which searches in the reverse direction of FSS. The multivariate filter approach promotes the inclusion of variables that are relevant for classification while avoiding redundant variables. Hall proposed Correlation-based Feature Selection (CFS) to determine the relevance of the attribute subset [27]. It uses a best-first search (BFS) to traverse the feature subset space. Any kind of heuristic (FSS, BSE, BFS, etc.) can be used to search for this optimal subset.
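The forward search described above can be sketched as a greedy loop. This is only an illustration under stated assumptions: the scoring function is supplied by the caller, and the toy score used here (a made-up per-attribute gain with a penalty for a redundant pair) is hypothetical and stands in for evaluating the resulting NB on training data.

```python
def forward_sequential_selection(attributes, score):
    """Greedy forward selection: start from the empty set and keep adding the
    attribute that improves the score most, stopping when nothing improves it."""
    selected = []
    best = score(selected)
    remaining = list(attributes)
    while remaining:
        # Score every one-attribute extension of the current subset.
        candidates = [(score(selected + [a]), a) for a in remaining]
        cand_score, cand_attr = max(candidates, key=lambda t: t[0])
        if cand_score <= best:
            break  # no candidate improves the current subset
        selected.append(cand_attr)
        remaining.remove(cand_attr)
        best = cand_score
    return selected, best


def toy_score(subset):
    # Hypothetical stand-in for a real evaluation measure.
    gain = {"a": 0.30, "b": 0.20, "b_copy": 0.20, "noise": -0.05}
    total = sum(gain[x] for x in subset)
    if "b" in subset and "b_copy" in subset:
        total -= 0.25  # penalty: the two attributes are redundant
    return total


selected, best = forward_sequential_selection(["a", "b", "b_copy", "noise"], toy_score)
```

With this toy score the search keeps the two informative attributes and stops before adding either the redundant copy or the noisy attribute.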
Attribute weighting is strictly more powerful than attribute selection: it reduces to attribute selection when the weights of redundant attributes are set to zero, whereas attribute selection cannot reproduce attribute weighting when the weights lie between zero and one. Hilden and Bjerregaard [28] proposed to alleviate the negative effects caused by violations of NB’s independence assumption by assigning the same weight to all attributes. Zhang and Sheng [29] proposed to measure the attribute weights with the gain ratio, which is generally used for splitting nodes in decision trees [30]. Wu and Cai [31] proposed a greedy search strategy for attribute weighting based on differential evolution algorithms. Then the independence assumption for NB turns to be
For BNC learning, using a single model to make predictions ignores the uncertainty as to which is the correct model. Thus, all possible models in the model space under consideration should be used when making predictions, with each model weighted by its probability of being the correct model. SPODEs stand between NB ignoring the attribute dependencies on one hand and full BNCs taking maximum flexibility for modelling dependencies on the other, and have demonstrated remarkable performance by exploiting one-order attribute dependencies and keeping the effectiveness of probability estimation. As illustrated in Fig. 1b, SPODEs in AODE relax NB’s attribute independence assumption by allowing all attributes to depend on a common attribute, the superparent, in addition to the class variable. For SPODE
AODE aggregates the predictions of all qualified SPODEs. For a training dataset with
where
NB has
NB performs powerful attribute weighting and similarly, AODE performs SPODE weighting. The classification accuracy of different SPODEs in AODE may differ greatly for the same dataset. Hence, it is problematic to treat all the SPODEs equally. Jiang et al. [20] proposed to respectively use mutual information, conditional log likelihood, classification accuracy and area under the ROC curve as the information metrics for measuring the weights of different SPODEs, and then linearly combine their probability estimates of
Information theory [36] was first introduced and developed by Shannon to explain the principle behind point-to-point communication and data storing. Mutual information
.
The mutual information
.
The conditional mutual information
.
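As a small illustration of the quantities introduced above, the following sketch estimates mutual information and conditional mutual information from discrete samples using empirical frequencies. It follows the standard textbook definitions, which are assumed here and may differ in notation from the equations of the paper; the function names are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical I(X;Y) in bits: sum of P(x,y) * log2(P(x,y) / (P(x) P(y)))."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X;Y|Z) in bits:
    sum of P(x,y,z) * log2(P(z) P(x,y,z) / (P(x,z) P(y,z)))."""
    n = len(xs)
    pxyz = Counter(zip(xs, ys, zs))
    pxz = Counter(zip(xs, zs))
    pyz = Counter(zip(ys, zs))
    pz = Counter(zs)
    return sum((c / n) * log2(c * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), c in pxyz.items())


# X and Y are independent fair bits and Z is their exclusive-or.
xs = [0, 0, 1, 1]
ys = [0, 1, 0, 1]
zs = [0, 1, 1, 0]
mi_xy = mutual_information(xs, ys)
cmi_xy_given_z = conditional_mutual_information(xs, ys, zs)
mi_xx = mutual_information(xs, xs)
```

In the exclusive-or example the two attributes share no information on their own, yet become fully dependent once the third variable is known, which is the kind of effect that pairwise measures alone cannot capture.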
The multivariate mutual information measures the influence of variable
The topology of SPODE (a) and its local topologies that respectively take 
Each attribute in SPODE takes
Similar to the basic idea of target learning [39], AWODE also takes each unlabeled instance
.
For data point
.
For data point
.
For data point
The multivariate mutual information may be positive, negative or zero. The positivity corresponds to relations generalizing the pairwise correlations between
For unlabeled instance
Given pseudo training data
Consider all the
From Eqs (10) and (11) we have
and
Because for different SPODEs, the value of
From the topology of SPODE
The AWODE algorithm. Training dataset
Generate a three-dimensional table of co-occurrence counts for each pair of attribute values and each class label.
According to the attribute values in test instance x, for each class label
take
compute
Compute
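The table of co-occurrence counts generated in the first step of the algorithm can be sketched as follows. This is a minimal illustration that assumes discrete attribute values and stores the counts in a dictionary keyed by (attribute index, value, attribute index, value, class label); the key layout and the function name are hypothetical choices, not the data structure of the paper.

```python
from collections import defaultdict


def build_cooccurrence_counts(instances, labels):
    """Count, for every class label, how often each pair of attribute values
    occurs together. Entries with i == j hold the single-attribute counts."""
    pair_counts = defaultdict(int)   # key: (i, value_i, j, value_j, class)
    class_counts = defaultdict(int)  # key: class
    for x, c in zip(instances, labels):
        class_counts[c] += 1
        for i, vi in enumerate(x):
            for j, vj in enumerate(x):
                pair_counts[(i, vi, j, vj, c)] += 1
    return pair_counts, class_counts


# Three toy training instances with two attributes each.
instances = [("a", "x"), ("a", "y"), ("b", "x")]
labels = [0, 0, 1]
pair_counts, class_counts = build_cooccurrence_counts(instances, labels)
```

The nested loop visits every pair of attributes for every training instance, so in this sketch the work per instance grows quadratically with the number of attributes. All probability estimates needed by an SPODE can then be read from these counts at classification time.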
Table 1 summarizes the time complexity of each BNC discussed. At training time, AODE needs to generate a three-dimensional table of co-occurrence counts for each pair of attribute values and each class label, and the time complexity is
Complexity summary for different BNCs, where
Data sets
The experiments are conducted on 40 benchmark datasets from the UCI machine learning repository [23], and we assume that these datasets contain no noisy data. Table 2 describes the characteristics of these datasets in ascending order of size, including the number of instances, attributes and classes. As listed in Table 2, the dataset size ranges from 57 instances (Labor) to 1,025,010 instances (Poker-hand), and the number of class labels ranges from 2 to 50, enabling us to examine classifiers on datasets with various characteristics. Each algorithm is tested on each dataset using 10 rounds of 10-fold cross validation.
To incorporate missing values into the probability computation, missing values of qualitative attributes are replaced with a distinct value and those of quantitative attributes with the mean value in all cases. The BNCs discussed in this paper cannot handle continuous attributes directly, so numeric attributes, if any, are discretized using Minimum Description Length (MDL) discretization [42]. Since
In this section, we compare the performance of our proposed method, AWODE, with state-of-the-art BNCs, including semi-naive Bayes approaches and weighted AODE approaches. Jiang et al. [20] proposed to use mutual information, conditional log likelihood, classification accuracy and area under the ROC curve as the weighting metrics, among which mutual information was shown to be the most effective and is therefore the only one considered in the following study. The BNCs used in the comparison study are listed as follows.
NB [8], naive Bayes.
TAN [17], tree-augmented naive Bayes.
KDB [18], k-dependence Bayesian classifier.
AODE [10], averaged one-dependence estimators.
WAODE [20], weighted AODE which uses mutual information for weighting.
AVWAODE-KL [22], AODE which uses the Kullback-Leibler (KL) measure for weighting.
TAODE [33], targeted AODE.
IWAODE [34], independence weighted AODE.
IBWAODE [35], instance-based weighting AODE.
AWODE, the proposed adaptively weighted one-dependence estimators.
Tables A1–A4 in the Appendix respectively show the experimental results in terms of zero-one loss, bias, variance and MCC. Win-draw-loss records summarizing the relative zero-one loss, bias, variance and MCC are respectively shown in Tables 3–6, and cell
W/D/L comparison results of 0–1 loss on all data sets
The comparison results in terms of zero-one loss.
Zero-one loss is the most common loss function for evaluating classification performance. In terms of bias, the bagging mechanism helps AODE exhibit excellent generalization ability by using multiple “weak” SPODEs, and the large number of dependency relationships in a single-model BNC can be distributed across these committee members. In terms of variance, the unrealistic independence assumption of each SPODE in AODE makes it insensitive to variation in the training data and reduces the risk of overfitting. Thus AODE enjoys an advantage over single-model BNCs in terms of the trade-off between bias and variance. From Table 3, AODE performs significantly better than all the single-model BNCs, including NB (27 wins and 5 losses), TAN (18 wins and 9 losses) and KDB (21 wins and 10 losses). The independence assumption prevents an SPODE from learning the true joint probability distribution. By assigning discriminative weights to different SPODEs according to information-theoretic metrics, e.g., mutual information (MI), information gain (IG) and the Kullback-Leibler measure (KL), the corresponding weighted AODEs, including WAODE
The large differences in dataset size make ever more urgent the need for highly adaptive learners that combine high classification accuracy with high expressivity (i.e., the capacity to learn very complex multivariate probability distributions). AWODE is such a learner. The weights assigned to SPODEs adapt to each instance, which helps improve both the inter-operability of these SPODEs and the generalization performance of the final ensemble BNC. To compare the performance of these BNCs on datasets of different sizes, the datasets are divided into two categories, small datasets (with number of instances
From Fig. 3(a) WAODE
W/D/L comparison results of bias on all data sets
The bias-variance decomposition provides valuable insights into the components of the error of learned classifiers. Bias measures the deviation between the expected output of the learning algorithm and the real result, and low bias generally means that the BNC is better able to adapt to training data. Given instance x, bias measures how closely the classifier can describe the decision boundary and is defined as [44]
where
W/D/L comparison results of variance on all data sets
Given instance x, variance reflects the sensitivity of learned BNC to variations in the training data, and overfitting may lead to higher variance [45]. Variance is defined as [44]
where
W/D/L comparison results of MCC on all data sets
The comparison results in terms of Matthews Correlation Coefficient.
Standard learning algorithms are designed under the premise of a balanced class distribution. With skewed class distributions, the classification problem becomes more difficult as the number of class labels increases [46, 47]. The Matthews correlation coefficient (MCC) [48] was introduced by biochemist Brian W. Matthews in 1975 to measure the quality of binary (two-class) classifications. It can be computed from a contingency matrix as the Pearson product-moment correlation coefficient [49] between actual and predicted values. For multi-class datasets, the classification results can be shown in the form of a confusion matrix as follows, on which the extended MCC [50] is defined.
Each entry
The MCC is an effective balanced measure that summarizes the confusion matrix of true and false positives and negatives in a single number [49, 51]. Table A4 shows the average results of MCC on all datasets. The corresponding W/D/L comparison results are presented in Table 6. To compare the performance of these BNCs on datasets with different numbers of class labels, the datasets are divided into two categories: two-class datasets (with two class labels) and multi-class datasets (with more than two class labels).
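To make the computation concrete, the following sketch evaluates the multi-class generalization of the MCC directly from a confusion matrix. It uses the commonly cited form based on the number of correct predictions and the per-class true and predicted counts, which is assumed here to match the extended MCC referred to above; the function name and the convention of returning zero for an undefined ratio are choices made for this illustration.

```python
from math import sqrt


def multiclass_mcc(confusion):
    """confusion[i][j] is the number of instances of true class i that were
    predicted as class j. For two classes this reduces to the binary MCC."""
    k = len(confusion)
    s = sum(sum(row) for row in confusion)                          # all instances
    c = sum(confusion[i][i] for i in range(k))                      # correct predictions
    t = [sum(confusion[i]) for i in range(k)]                       # true count per class
    p = [sum(confusion[i][j] for i in range(k)) for j in range(k)]  # predicted count per class
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = sqrt((s * s - sum(pk * pk for pk in p)) *
                       (s * s - sum(tk * tk for tk in t)))
    # Assumed convention: report 0 when the denominator vanishes,
    # e.g. when every instance is assigned to the same class.
    return numerator / denominator if denominator else 0.0


perfect = multiclass_mcc([[5, 0], [0, 5]])
binary = multiclass_mcc([[3, 1], [2, 4]])
degenerate = multiclass_mcc([[5, 0], [5, 0]])
```

For the second matrix the result equals the binary formula (3·4 − 2·1) / sqrt(5·4·6·5), so the multi-class form agrees with the two-class definition.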
From Fig. 4a, WAODE
Time comparisons.
Figure 5 shows the mean training and classification time comparisons of the different out-of-core BNCs relative to AWODE. Each bar represents the sum over all 40 datasets in a 10-fold cross validation experiment. As shown in Fig. 5a, because of their independence assumptions the topologies of NB and AODE are determined before training, so their training times are the lowest among the BNCs. Among single-model BNCs, TAN and KDB respectively learn 1-dependence and 2-dependence relationships from data at training time, and KDB needs extra time to compute mutual information for sorting attributes. With respect to classification time, high-dependence BNCs need slightly more time to compute the conditional probabilities than low-dependence BNCs. Thus, KDB takes slightly more time than TAN and NB. AODE and the weighted AODEs need to aggregate the joint probability estimates of all qualified SPODEs; as shown in Fig. 5b, their computation overheads are therefore similar to one another but high compared to single-model BNCs.
Among weighted AODEs, for WAODE
The ensemble learning mechanism and independence assumption help AODE enjoy an advantage in reducing bias and variance. A potential disadvantage of AODE is that it assigns the same, non-discriminative weight to every SPODE. In current research, weighting for AODE focuses on the dependency relationship between the root attribute (or root attribute value) and the class variable. We argue that the weight should help evaluate the reasonableness of the independence assumptions implied by SPODEs. Based on this premise, multivariate mutual information is introduced to measure the correlation between attributes in each local topology of the SPODE. Our proposed algorithm, called AWODE, learns weights from each specific instance rather than from the training dataset, and thus retains the simplicity and direct theoretical foundation of AODE while alleviating the limitations of its independence assumptions. The experimental results on datasets from the UCI machine learning repository reveal that AWODE has substantially lower bias than AODE at the cost of a small increase in variance. Because the independence assumptions of SPODEs are violated to different extents, applying model selection together with model weighting to choose the best few SPODEs rather than all of them may further help the final ensemble approximate the true joint probability distribution. An interesting direction for future work is the exploration of more effective methods to select SPODEs and learn their weights.
Appendix
Experimental results of 0–1 loss

| No. | Data set | NB | TAN | KDB | AODE | WAODE | AVWAODE-KL | TAODE | IWAODE | IBWAODE | AWODE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Labor | 0.0351 | 0.0526 | 0.0351 | 0.0526 | 0.0526 | 0.0526 | 0.0526 | 0.0526 | 0.0526 | 0.0351 |
| 2 | Labor-negotiations | 0.0702 | 0.1053 | 0.0702 | 0.0526 | 0.0526 | 0.0351 | 0.0526 | 0.0526 | 0.0526 | 0.0351 |
| 3 | Zoo | 0.0297 | 0.0099 | 0.0495 | 0.0297 | 0.0297 | 0.0297 | 0.0198 | 0.0297 | 0.0198 | 0.0198 |
| 4 | Promoters | 0.0755 | 0.1321 | 0.2547 | 0.1321 | 0.1415 | 0.1321 | 0.1226 | 0.1226 | 0.1038 | 0.1509 |
| 5 | Echocardiogram | 0.3359 | 0.3282 | 0.3435 | 0.3206 | 0.3206 | 0.3435 | 0.3359 | 0.3282 | 0.3282 | 0.3282 |
| 6 | Lymphography | 0.1486 | 0.1757 | 0.2365 | 0.1689 | 0.1554 | 0.1622 | 0.1554 | 0.1554 | 0.1419 | 0.1622 |
| 7 | Iris | 0.0867 | 0.0800 | 0.0867 | 0.0867 | 0.0867 | 0.0867 | 0.0867 | 0.0867 | 0.0867 | 0.0867 |
| 8 | Teaching-ae | 0.4967 | 0.5497 | 0.5364 | 0.4901 | 0.4503 | 0.4570 | 0.4636 | 0.4636 | 0.4570 | 0.4503 |
| 9 | Hepatitis | 0.1935 | 0.1677 | 0.1871 | 0.1806 | 0.1806 | 0.2000 | 0.1871 | 0.1871 | 0.1742 | 0.1871 |
| 10 | Wine | 0.0169 | 0.0337 | 0.0225 | 0.0225 | 0.0169 | 0.0169 | 0.0281 | 0.0169 | 0.0169 | 0.0281 |
| 11 | Glass-id | 0.2617 | 0.2196 | 0.2196 | 0.2523 | 0.2570 | 0.2383 | 0.2523 | 0.2523 | 0.2196 | 0.2336 |
| 12 | Soybean-large | 0.1238 | 0.1107 | 0.0879 | 0.0782 | 0.0814 | 0.0814 | 0.0782 | 0.0814 | 0.0912 | 0.0782 |
| 13 | Ionosphere | 0.1054 | 0.0684 | 0.0741 | 0.0741 | 0.0712 | 0.0655 | 0.0741 | 0.0712 | 0.0712 | 0.0684 |
| 14 | Cylinder-bands | 0.2148 | 0.2833 | 0.2259 | 0.1889 | 0.1796 | 0.1907 | 0.1870 | 0.1815 | 0.1926 | 0.1926 |
| 15 | Balance-scale | 0.2720 | 0.2736 | 0.2784 | 0.2832 | 0.2816 | 0.2832 | 0.2832 | 0.2816 | 0.2832 | 0.2592 |
| 16 | Soybean | 0.0893 | 0.0469 | 0.0556 | 0.0469 | 0.0483 | 0.0498 | 0.0483 | 0.0483 | 0.0542 | 0.0483 |
| 17 | Breast-cancer-w | 0.0258 | 0.0415 | 0.0744 | 0.0358 | 0.0358 | 0.0372 | 0.0401 | 0.0372 | 0.0372 | 0.0343 |
| 18 | Vehicle | 0.3924 | 0.2943 | 0.2943 | 0.2896 | 0.2872 | 0.2849 | 0.2766 | 0.2872 | 0.2896 | 0.2754 |
| 19 | Anneal | 0.0379 | 0.0111 | 0.0089 | 0.0089 | 0.0089 | 0.0089 | 0.0078 | 0.0089 | 0.0178 | 0.0078 |
| 20 | Tic-tac-toe | 0.3069 | 0.2286 | 0.2035 | 0.2651 | 0.2724 | 0.2808 | 0.2630 | 0.2547 | 0.2662 | 0.2463 |
| 21 | Vowel | 0.4242 | 0.1303 | 0.1818 | 0.1495 | 0.1949 | 0.1889 | 0.1323 | 0.1566 | 0.1697 | 0.1343 |
| 22 | Splice-c4.5 | 0.0444 | 0.0466 | 0.0941 | 0.0365 | 0.0365 | 0.0362 | 0.0362 | 0.0368 | 0.0378 | 0.0362 |
| 23 | Kr-vs-kp | 0.1214 | 0.0776 | 0.0416 | 0.0842 | 0.0576 | 0.0544 | 0.0773 | 0.0748 | 0.0826 | 0.0942 |
| 24 | Dis | 0.0159 | 0.0159 | 0.0138 | 0.0130 | 0.0143 | 0.0143 | 0.0125 | 0.0133 | 0.0127 | 0.0122 |
| 25 | Hypo | 0.0138 | 0.0141 | 0.0114 | 0.0095 | 0.0101 | 0.0098 | 0.0119 | 0.0098 | 0.0114 | 0.0098 |
| 26 | Abalone | 0.4762 | 0.4587 | 0.4563 | 0.4472 | 0.4475 | 0.4458 | 0.4465 | 0.4470 | 0.4482 | 0.4448 |
| 27 | Waveform-5000 | 0.2006 | 0.1844 | 0.2000 | 0.1462 | 0.1450 | 0.1486 | 0.1466 | 0.1462 | 0.1442 | 0.1472 |
| 28 | Optdigits | 0.0767 | 0.0407 | 0.0372 | 0.0311 | 0.0302 | 0.0290 | 0.0290 | 0.0302 | 0.0276 | 0.0285 |
| 29 | Satellite | 0.1806 | 0.1214 | 0.1080 | 0.1148 | 0.1148 | 0.1187 | 0.1147 | 0.1158 | 0.1117 | 0.1161 |
| 30 | Mushrooms | 0.0196 | 0.0001 | 0.0000 | 0.0001 | 0.0000 | 0.0000 | 0.0002 | 0.0000 | 0.0002 | 0.0002 |
| 31 | Thyroid | 0.1111 | 0.0720 | 0.0706 | 0.0701 | 0.0655 | 0.0614 | 0.0629 | 0.0654 | 0.0706 | 0.0638 |
| 32 | Pendigits | 0.1181 | 0.0321 | 0.0294 | 0.0200 | 0.0199 | 0.0202 | 0.0200 | 0.0206 | 0.0185 | 0.0202 |
| 33 | Seer_mdl | 0.2379 | 0.2376 | 0.2555 | 0.2328 | 0.2315 | 0.2371 | 0.2340 | 0.2329 | 0.2325 | 0.2330 |
| 34 | Magic | 0.2239 | 0.1675 | 0.1637 | 0.1752 | 0.1762 | 0.1834 | 0.1725 | 0.1773 | 0.1744 | 0.1757 |
| 35 | Letter-recog | 0.2525 | 0.1300 | 0.0986 | 0.0883 | 0.0853 | 0.0826 | 0.0838 | 0.0871 | 0.0854 | 0.0833 |
| 36 | Shuttle | 0.0039 | 0.0015 | 0.0009 | 0.0008 | 0.0009 | 0.0008 | 0.0008 | 0.0008 | 0.0011 | 0.0008 |
| 37 | Waveform | 0.0220 | 0.0202 | 0.0256 | 0.0180 | 0.0181 | 0.0181 | 0.0182 | 0.0184 | 0.0181 | 0.0179 |
| 38 | Localization | 0.4955 | 0.3575 | 0.2964 | 0.3596 | 0.3566 | 0.3536 | 0.3544 | 0.3766 | 0.3593 | 0.3553 |
| 39 | Covtype | 0.3158 | 0.2517 | 0.1421 | 0.2387 | 0.2289 | 0.2180 | 0.2246 | 0.2342 | 0.2366 | 0.2220 |
| 40 | Poker-hand | 0.4988 | 0.3295 | 0.1961 | 0.4812 | 0.1758 | 0.3103 | 0.3453 | 0.0771 | 0.4690 | 0.3311 |
Experimental results of bias

| No. | Data set | NB | TAN | KDB | AODE | WAODE | AVWAODE-KL | TAODE | IWAODE | IBWAODE | AWODE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Labor | 0.0289 | 0.0211 | 0.0279 | 0.0347 | 0.0200 | 0.0342 | 0.0363 | 0.0195 | 0.0205 | 0.0111 |
| 2 | Labor-negotiations | 0.0505 | 0.0716 | 0.0553 | 0.0316 | 0.0268 | 0.0405 | 0.0474 | 0.0253 | 0.0268 | 0.0232 |
| 3 | Zoo | 0.0318 | 0.0303 | 0.0403 | 0.0273 | 0.0273 | 0.0282 | 0.0282 | 0.0273 | 0.0282 | 0.0273 |
| 4 | Promoters | 0.0786 | 0.1329 | 0.1569 | 0.4777 | 0.5489 | 0.5346 | 0.4000 | 0.5083 | 0.2840 | 0.4740 |
| 5 | Echocardiogram | 0.2844 | 0.2642 | 0.3065 | 0.2751 | 0.2572 | 0.2788 | 0.2763 | 0.2814 | 0.2840 | 0.2812 |
| 6 | Lymphography | 0.0902 | 0.1027 | 0.1041 | 0.0933 | 0.0951 | 0.0859 | 0.0931 | 0.0892 | 0.0857 | 0.0831 |
| 7 | Iris | 0.0612 | 0.0638 | 0.0596 | 0.0586 | 0.0656 | 0.0720 | 0.0592 | 0.0578 | 0.0664 | 0.0500 |
| 8 | Teaching-ae | 0.4836 | 0.4566 | 0.4606 | 0.4370 | 0.3984 | 0.4016 | 0.4198 | 0.4274 | 0.4616 | 0.4124 |
| 9 | Hepatitis | 0.1537 | 0.1712 | 0.1741 | 0.1649 | 0.1655 | 0.1782 | 0.1724 | 0.1724 | 0.1749 | 0.1627 |
| 10 | Wine | 0.0331 | 0.0507 | 0.0520 | 0.0346 | 0.0381 | 0.0361 | 0.0376 | 0.0351 | 0.0317 | 0.0364 |
| 11 | Glass-id | 0.2901 | 0.2756 | 0.2713 | 0.2785 | 0.2780 | 0.2823 | 0.2785 | 0.2797 | 0.2818 | 0.2827 |
| 12 | Soybean-large | 0.1070 | 0.1422 | 0.1086 | 0.0648 | 0.0655 | 0.0638 | 0.0651 | 0.0656 | 0.0811 | 0.0657 |
| 13 | Ionosphere | 0.1220 | 0.0804 | 0.0855 | 0.0744 | 0.0751 | 0.0756 | 0.0764 | 0.0784 | 0.0881 | 0.0790 |
| 14 | Cylinder-bands | 0.2000 | 0.3117 | 0.1939 | 0.1589 | 0.1501 | 0.1546 | 0.1610 | 0.1487 | 0.1711 | 0.1620 |
| 15 | Balance-scale | 0.1840 | 0.1843 | 0.1902 | 0.1905 | 0.1827 | 0.1827 | 0.1905 | 0.1844 | 0.1905 | 0.1725 |
| 16 | Soybean | 0.1015 | 0.0522 | 0.0491 | 0.0524 | 0.0503 | 0.0509 | 0.0515 | 0.0518 | 0.0693 | 0.0522 |
| 17 | Breast-cancer-w | 0.0187 | 0.0384 | 0.0449 | 0.0338 | 0.0327 | 0.0337 | 0.0334 | 0.0303 | 0.0234 | 0.0173 |
| 18 | Vehicle | 0.3330 | 0.2382 | 0.2494 | 0.2415 | 0.2398 | 0.2409 | 0.2412 | 0.2409 | 0.2435 | 0.2393 |
| 19 | Anneal | 0.0354 | 0.0201 | 0.0073 | 0.0214 | 0.0194 | 0.0196 | 0.0214 | 0.0190 | 0.0181 | 0.0189 |
| 20 | Tic-tac-toe | 0.2614 | 0.1746 | 0.1367 | 0.2005 | 0.2104 | 0.2063 | 0.2008 | 0.1973 | 0.1994 | 0.1941 |
| 21 | Vowel | 0.3301 | 0.1942 | 0.1745 | 0.1895 | 0.1811 | 0.1836 | 0.1698 | 0.1860 | 0.2249 | 0.1711 |
| 22 | Splice-c4.5 | 0.0351 | 0.0395 | 0.0961 | 0.0308 | 0.0315 | 0.0312 | 0.0308 | 0.0306 | 0.0331 | 0.0316 |
| 23 | Kr-vs-kp | 0.1107 | 0.0702 | 0.0417 | 0.0747 | 0.0518 | 0.0467 | 0.0688 | 0.0633 | 0.0763 | 0.0798 |
| 24 | Dis | 0.0165 | 0.0193 | 0.0191 | 0.0170 | 0.0179 | 0.0183 | 0.0178 | 0.0167 | 0.0168 | 0.0176 |
| 25 | Hypo | 0.0092 | 0.0124 | 0.0077 | 0.0071 | 0.0078 | 0.0077 | 0.0079 | 0.0074 | 0.0080 | 0.0081 |
| 26 | Abalone | 0.4180 | 0.3126 | 0.3033 | 0.3201 | 0.3212 | 0.3150 | 0.3183 | 0.3205 | 0.3199 | 0.3139 |
| 27 | Waveform-5000 | 0.1762 | 0.1232 | 0.1157 | 0.1235 | 0.1184 | 0.1208 | 0.1213 | 0.1197 | 0.1219 | 0.1222 |
| 28 | Optdigits | 0.0685 | 0.0275 | 0.0250 | 0.0233 | 0.0224 | 0.0217 | 0.0224 | 0.0221 | 0.0200 | 0.0218 |
| 29 | Satellite | 0.1746 | 0.0950 | 0.0808 | 0.0902 | 0.0902 | 0.0922 | 0.0897 | 0.0921 | 0.0884 | 0.0895 |
| 30 | Mushrooms | 0.0237 | 0.0001 | 0.0001 | 0.0004 | 0.0002 | 0.0003 | 0.0004 | 0.0002 | 0.0004 | 0.0004 |
| 31 | Thyroid | 0.0994 | 0.0587 | 0.0553 | 0.0611 | 0.0561 | 0.0527 | 0.0550 | 0.0563 | 0.0648 | 0.0547 |
| 32 | Pendigits | 0.1095 | 0.0314 | 0.0207 | 0.0228 | 0.0231 | 0.0225 | 0.0225 | 0.0231 | 0.0200 | 0.0231 |
| 33 | Seer_mdl | 0.2361 | 0.2114 | 0.2100 | 0.2226 | 0.2137 | 0.2143 | 0.2150 | 0.2186 | 0.2214 | 0.2178 |
| 34 | Magic | 0.2111 | 0.1252 | 0.1241 | 0.1600 | 0.1541 | 0.1676 | 0.1546 | 0.1600 | 0.1595 | 0.1592 |
| 35 | Letter-recog | 0.2207 | 0.1032 | 0.0806 | 0.0876 | 0.0823 | 0.0807 | 0.0814 | 0.0850 | 0.0877 | 0.0805 |
| 36 | Shuttle | 0.0040 | 0.0008 | 0.0007 | 0.0006 | 0.0006 | 0.0007 | 0.0006 | 0.0006 | 0.0007 | 0.0006 |
| 37 | Waveform | 0.0219 | 0.0152 | 0.0210 | 0.0156 | 0.0158 | 0.0157 | 0.0149 | 0.0166 | 0.0157 | 0.0158 |
| 38 | Localization | 0.4523 | 0.3106 | 0.2134 | 0.3129 | 0.3068 | 0.3004 | 0.3010 | 0.3281 | 0.3126 | 0.2967 |
| 39 | Covtype | 0.3067 | 0.2298 | 0.1193 | 0.2207 | 0.2083 | 0.1984 | 0.2034 | 0.2166 | 0.2183 | 0.2022 |
| 40 | Poker-hand | 0.4979 | 0.2865 | 0.1326 | 0.4216 | 0.1716 | 0.2630 | 0.2622 | 0.0805 | 0.3969 | 0.2546 |
Experimental results of variance

| No. | Data set | NB | TAN | KDB | AODE | WAODE | AVWAODE-KL | TAODE | IWAODE | IBWAODE | AWODE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Labor | 0.0395 | 0.0632 | 0.0721 | 0.0179 | 0.0221 | 0.0658 | 0.0321 | 0.0384 | 0.0268 | 0.0363 |
| 2 | Labor-negotiations | 0.0653 | 0.1389 | 0.1289 | 0.0526 | 0.0626 | 0.0911 | 0.0789 | 0.0800 | 0.0626 | 0.0716 |
| 3 | Zoo | 0.0439 | 0.0606 | 0.0658 | 0.0424 | 0.0424 | 0.0445 | 0.0445 | 0.0424 | 0.0445 | 0.0424 |
| 4 | Promoters | 0.0786 | 0.1729 | 0.1889 | 0.0994 | 0.0654 | 0.0911 | 0.1486 | 0.0946 | 0.1389 | 0.0946 |
| 5 | Echocardiogram | 0.1272 | 0.1265 | 0.1400 | 0.1319 | 0.1335 | 0.1328 | 0.1284 | 0.1326 | 0.1277 | 0.1305 |
| 6 | Lymphography | 0.0343 | 0.1116 | 0.1408 | 0.0476 | 0.0479 | 0.0467 | 0.0498 | 0.0476 | 0.0408 | 0.0455 |
| 7 | Iris | 0.0428 | 0.0662 | 0.0404 | 0.0374 | 0.0364 | 0.0420 | 0.0388 | 0.0402 | 0.0436 | 0.0360 |
| 8 | Teaching-ae | 0.1484 | 0.1914 | 0.1494 | 0.1650 | 0.1776 | 0.1744 | 0.1622 | 0.1686 | 0.1564 | 0.1636 |
| 9 | Hepatitis | 0.0424 | 0.0582 | 0.0612 | 0.0527 | 0.0541 | 0.0473 | 0.0492 | 0.0492 | 0.0486 | 0.0529 |
| 10 | Wine | 0.0093 | 0.0493 | 0.0649 | 0.0231 | 0.0246 | 0.0300 | 0.0251 | 0.0276 | 0.0141 | 0.0195 |
| 11 | Glass-id | 0.0930 | 0.1075 | 0.1189 | 0.1004 | 0.1051 | 0.1008 | 0.1004 | 0.1020 | 0.0999 | 0.1018 |
| 12 | Soybean-large | 0.0783 | 0.1176 | 0.0982 | 0.0842 | 0.0855 | 0.0832 | 0.0839 | 0.0854 | 0.0738 | 0.0824 |
| 13 | Ionosphere | 0.0242 | 0.0401 | 0.0581 | 0.0385 | 0.0368 | 0.0407 | 0.0381 | 0.0404 | 0.0238 | 0.0362 |
| 14 | Cylinder-bands | 0.0656 | 0.0739 | 0.0750 | 0.0961 | 0.1010 | 0.0959 | 0.0923 | 0.1002 | 0.0828 | 0.0947 |
| 15 | Balance-scale | 0.0848 | 0.0941 | 0.0872 | 0.0854 | 0.0913 | 0.0913 | 0.0854 | 0.0921 | 0.0854 | 0.1005 |
| 16 | Soybean | 0.0302 | 0.0654 | 0.0439 | 0.0326 | 0.0341 | 0.0341 | 0.0331 | 0.0332 | 0.0290 | 0.0324 |
| 17 | Breast-cancer-w | 0.0010 | 0.0337 | 0.0504 | 0.0134 | 0.0128 | 0.0157 | 0.0164 | 0.0156 | 0.0122 | 0.0110 |
| 18 | Vehicle | 0.1120 | 0.1299 | 0.1283 | 0.1277 | 0.1276 | 0.1275 | 0.1273 | 0.1254 | 0.1245 | 0.1285 |
| 19 | Anneal | 0.0168 | 0.0161 | 0.0152 | 0.0197 | 0.0161 | 0.0138 | 0.0174 | 0.0158 | 0.0103 | 0.0142 |
| 20 | Tic-tac-toe | 0.0455 | 0.0824 | 0.1125 | 0.0513 | 0.0604 | 0.0652 | 0.0528 | 0.0556 | 0.0529 | 0.0620 |
| 21 | Vowel | 0.2542 | 0.2445 | 0.2325 | 0.2344 | 0.2310 | 0.2334 | 0.2284 | 0.2337 | 0.2463 | 0.2307 |
| 22 | Splice-c4.5 | 0.0078 | 0.0289 | 0.0800 | 0.0085 | 0.0083 | 0.0087 | 0.0085 | 0.0087 | 0.0089 | 0.0083 |
| 23 | Kr-vs-kp | 0.0186 | 0.0152 | 0.0111 | 0.0186 | 0.0119 | 0.0143 | 0.0208 | 0.0165 | 0.0185 | 0.0201 |
| 24 | Dis | 0.0069 | 0.0005 | 0.0011 | 0.0071 | 0.0021 | 0.0056 | 0.0040 | 0.0065 | 0.0036 | 0.0033 |
| 25 | Hypo | 0.0051 | 0.0071 | 0.0069 | 0.0049 | 0.0056 | 0.0054 | 0.0055 | 0.0057 | 0.0068 | 0.0056 |
| 26 | Abalone | 0.0682 | 0.1693 | 0.1769 | 0.1544 | 0.1543 | 0.1626 | 0.1561 | 0.1523 | 0.1539 | 0.1623 |
| 27 | Waveform-5000 | 0.0259 | 0.0690 | 0.0843 | 0.0410 | 0.0420 | 0.0426 | 0.0426 | 0.0416 | 0.0402 | 0.0412 |
| 28 | Optdigits | 0.0153 | 0.0185 | 0.0254 | 0.0139 | 0.0137 | 0.0138 | 0.0140 | 0.0138 | 0.0132 | 0.0139 |
| 29 | Satellite | 0.0139 | 0.0367 | 0.0455 | 0.0363 | 0.0364 | 0.0389 | 0.0362 | 0.0379 | 0.0325 | 0.0381 |
| 30 | Mushrooms | 0.0043 | 0.0002 | 0.0002 | 0.0001 | 0.0001 | 0.0004 | 0.0002 | 0.0003 | 0.0001 | 0.0001 |
| 31 | Thyroid | 0.0205 | 0.0257 | 0.0272 | 0.0235 | 0.0239 | 0.0247 | 0.0243 | 0.0237 | 0.0202 | 0.0239 |
| 32 | Pendigits | 0.0157 | 0.0200 | 0.0236 | 0.0127 | 0.0129 | 0.0127 | 0.0130 | 0.0124 | 0.0107 | 0.0131 |
| 33 | Seer_mdl | 0.0097 | 0.0381 | 0.0613 | 0.0194 | 0.0273 | 0.0318 | 0.0295 | 0.0219 | 0.0200 | 0.0241 |
| 34 | Magic | 0.0174 | 0.0490 | 0.0491 | 0.0297 | 0.0289 | 0.0307 | 0.0313 | 0.0298 | 0.0291 | 0.0296 |
| 35 | Letter-recog | 0.0471 | 0.0591 | 0.0709 | 0.0448 | 0.0455 | 0.0461 | 0.0457 | 0.0455 | 0.0417 | 0.0455 |
| 36 | Shuttle | 0.0009 | 0.0004 | 0.0003 | 0.0004 | 0.0004 | 0.0005 | 0.0004 | 0.0004 | 0.0003 | 0.0004 |
| 37 | Waveform | 0.0009 | 0.0053 | 0.0037 | 0.0025 | 0.0023 | 0.0028 | 0.0034 | 0.0019 | 0.0024 | 0.0019 |
| 38 | Localization | 0.0460 | 0.0594 | 0.1099 | 0.0580 | 0.0632 | 0.0656 | 0.0657 | 0.0577 | 0.0577 | 0.0691 |
| 39 | Covtype | 0.0094 | 0.0245 | 0.0399 | 0.0200 | 0.0224 | 0.0229 | 0.0240 | 0.0195 | 0.0200 | 0.0222 |
| 40 | Poker-hand | 0.0000 | 0.0424 | 0.0633 | 0.0273 | 0.0602 | 0.0346 | 0.0607 | 0.0087 | 0.0346 | 0.0696 |
Experimental results of MCC

| No. | Data set | NB | TAN | KDB | AODE | WAODE | AVWAODE-KL | TAODE | IWAODE | IBWAODE | AWODE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Labor | 0.9273 | 0.8864 | 0.9230 | 0.8864 | 0.8864 | 0.8864 | 0.8864 | 0.8864 | 0.8864 | 0.9230 |
| 2 | Labor-negotiations | 0.8459 | 0.7659 | 0.8518 | 0.8864 | 0.8864 | 0.9273 | 0.8864 | 0.8864 | 0.8864 | 0.9230 |
| 3 | Zoo | 0.9611 | 0.9871 | 0.9348 | 0.9611 | 0.9610 | 0.9610 | 0.9742 | 0.9610 | 0.9740 | 0.9740 |
| 4 | Promoters | 0.8515 | 0.7380 | 0.5179 | 0.7380 | 0.7380 | 0.7380 | 0.7581 | 0.7559 | 0.7937 | 0.9813 |
| 5 | Echocardiogram | 0.1569 | 0.1460 | 0.0868 | 0.1820 | 0.1820 | 0.0988 | 0.1276 | 0.1643 | 0.1553 | 0.1553 |
| 6 | Lymphography | 0.7202 | 0.6633 | 0.5493 | 0.6782 | 0.7041 | 0.6915 | 0.7040 | 0.7041 | 0.7301 | 0.6912 |
| 7 | Iris | 0.8701 | 0.8800 | 0.8701 | 0.8701 | 0.8701 | 0.8701 | 0.8701 | 0.8701 | 0.8701 | 0.8901 |
| 8 | Teaching-ae | 0.2552 | 0.1774 | 0.1990 | 0.2651 | 0.3265 | 0.3149 | 0.3052 | 0.3052 | 0.3154 | 0.3256 |
| 9 | Hepatitis | 0.4780 | 0.4763 | 0.3961 | 0.4487 | 0.4360 | 0.3684 | 0.4224 | 0.3956 | 0.4870 | 0.4356 |
| 10 | Wine | 0.9745 | 0.9488 | 0.9659 | 0.9661 | 0.9745 | 0.9744 | 0.9574 | 0.9744 | 0.9745 | 0.9574 |
| 11 | Glass-id | 0.5986 | 0.6631 | 0.6633 | 0.6132 | 0.6062 | 0.6347 | 0.6139 | 0.6132 | 0.6652 | 0.6429 |
| 12 | Soybean-large | 0.8677 | 0.8797 | 0.9039 | 0.9158 | 0.9120 | 0.9120 | 0.9158 | 0.9120 | 0.9016 | 0.9158 |
| 13 | Ionosphere | 0.7689 | 0.8507 | 0.8380 | 0.8378 | 0.8441 | 0.8567 | 0.8378 | 0.8505 | 0.8449 | 0.8505 |
| 14 | Cylinder-bands | 0.5686 | 0.4180 | 0.5335 | 0.6140 | 0.6321 | 0.6103 | 0.6172 | 0.6289 | 0.6080 | 0.6066 |
| 15 | Balance-scale | 0.5009 | 0.4983 | 0.4901 | 0.4799 | 0.4829 | 0.4799 | 0.4799 | 0.4829 | 0.4799 | 0.5261 |
| 16 | Soybean | 0.9045 | 0.9487 | 0.9391 | 0.9496 | 0.9479 | 0.9463 | 0.9480 | 0.9479 | 0.9417 | 0.9480 |
| 17 | Breast-cancer-w | 0.9447 | 0.9078 | 0.8340 | 0.9226 | 0.9226 | 0.9193 | 0.9127 | 0.9197 | 0.9183 | 0.9259 |
| 18 | Vehicle | 0.4860 | 0.6094 | 0.6078 | 0.6161 | 0.6178 | 0.6217 | 0.6328 | 0.6196 | 0.6162 | 0.6349 |
| 19 | Anneal | 0.9109 | 0.9722 | 0.9776 | 0.9777 | 0.9777 | 0.9777 | 0.9804 | 0.9777 | 0.9561 | 0.9804 |
| 20 | Tic-tac-toe | 0.2849 | 0.4740 | 0.5367 | 0.3848 | 0.3643 | 0.3505 | 0.3899 | 0.4039 | 0.3821 | 0.4416 |
| 21 | Vowel | 0.5336 | 0.8568 | 0.8001 | 0.8357 | 0.7857 | 0.7924 | 0.8546 | 0.8279 | 0.8135 | 0.8523 |
| 22 | Splice-c4.5 | 0.9279 | 0.9241 | 0.8471 | 0.9408 | 0.9407 | 0.9412 | 0.9413 | 0.9403 | 0.9388 | 0.9413 |
| 23 | Kr-vs-kp | 0.7567 | 0.8453 | 0.9167 | 0.8320 | 0.8859 | 0.8918 | 0.8457 | 0.8626 | 0.8350 | 0.8116 |
| 24 | Dis | 0.4922 | 0.1842 | 0.4063 | 0.5600 | 0.5109 | 0.5272 | 0.5709 | 0.5600 | 0.5507 | 0.5695 |
| 25 | Hypo | 0.9054 | 0.9007 | 0.9204 | 0.9337 | 0.9298 | 0.9323 | 0.9165 | 0.9316 | 0.9214 | 0.9317 |
| 26 | Abalone | 0.3039 | 0.3112 | 0.3131 | 0.3283 | 0.3278 | 0.3305 | 0.3291 | 0.3282 | 0.3269 | 0.3330 |
| 27 | Waveform-5000 | 0.7189 | 0.7234 | 0.7001 | 0.7822 | 0.7828 | 0.7774 | 0.7809 | 0.7812 | 0.7849 | 0.7810 |
| 28 | Optdigits | 0.9149 | 0.9547 | 0.9587 | 0.9654 | 0.9664 | 0.9678 | 0.9678 | 0.9664 | 0.9694 | 0.9684 |
| 29 | Satellite | 0.7807 | 0.8504 | 0.8665 | 0.8593 | 0.8593 | 0.8551 | 0.8593 | 0.8584 | 0.8627 | 0.8580 |
| 30 | Mushrooms | 0.9612 | 0.9998 | 1.0000 | 0.9998 | 1.0000 | 1.0000 | 0.9995 | 1.0000 | 0.9995 | 0.9995 |
| 31 | Thyroid | 0.7724 | 0.8378 | 0.8368 | 0.8465 | 0.8547 | 0.8618 | 0.8592 | 0.8555 | 0.8468 | 0.8583 |
| 32 | Pendigits | 0.8699 | 0.9643 | 0.9674 | 0.9778 | 0.9779 | 0.9776 | 0.9778 | 0.9772 | 0.9795 | 0.9776 |
| 33 | Seer_mdl | 0.4756 | 0.4693 | 0.4282 | 0.4819 | 0.4826 | 0.4682 | 0.4772 | 0.4800 | 0.4822 | 0.4806 |
| 34 | Magic | 0.4894 | 0.6234 | 0.6330 | 0.6085 | 0.6052 | 0.5915 | 0.6154 | 0.6016 | 0.6108 | 0.6073 |
| 35 | Letter-recog | 0.7378 | 0.8648 | 0.8975 | 0.9082 | 0.9114 | 0.9141 | 0.9129 | 0.9093 | 0.9112 | 0.9134 |
| 36 | Shuttle | 0.9891 | 0.9957 | 0.9975 | 0.9979 | 0.9976 | 0.9978 | 0.9979 | 0.9977 | 0.9969 | 0.9979 |
| 37 | Waveform | 0.9674 | 0.9698 | 0.9620 | 0.9730 | 0.9728 | 0.9729 | 0.9728 | 0.9725 | 0.9728 | 0.9732 |
| 38 | Localization | 0.3660 | 0.5474 | 0.6274 | 0.5430 | 0.5469 | 0.5507 | 0.5499 | 0.5156 | 0.5433 | 0.5492 |
| 39 | Covtype | 0.5068 | 0.6008 | 0.7716 | 0.6250 | 0.6403 | 0.6530 | 0.6455 | 0.6284 | 0.6279 | 0.6491 |
| 40 | Poker-hand | 0.0000 | 0.4242 | 0.6487 | 0.0813 | 0.6801 | 0.4488 | 0.3853 | 0.8660 | 0.1179 | 0.4128 |
