Abstract
The independence assumptions help Bayesian network classifiers (BNCs), e.g., Naive Bayes (NB), reduce structure complexity and perform surprisingly well in many real-world applications. Semi-naive Bayesian techniques seek to improve classification performance by relaxing the attribute independence assumption. However, the study of dependence rather than independence has received more attention during the past decade, and the validity of independence assumptions needs to be further explored. In this paper, a novel learning technique, called Adaptive Independence Thresholding (AIT), is proposed to automatically identify informational independence and probabilistic independence. Under the framework of target learning, AIT tunes the network topologies of the BNCs learned from the training data and from the testing instance, respectively. Zero-one loss, bias, variance and conditional log likelihood are used to compare classification performance in the experimental study. Extensive experimental results on a collection of 36 benchmark datasets from the UCI machine learning repository show that AIT is more effective than other learning techniques (such as structure extension and attribute weighting) and helps the final BNCs achieve remarkable classification improvements.
Introduction
Classification is regarded as one of the key issues in data mining and statistical learning, with the aim of predicting the class of an object with an unknown label [1]. It has been widely applied in computer science and engineering to tasks such as facial expression analysis [2], positive and unlabeled learning [3], and text classification [4]. A classification model needs to establish the mapping relationship between the input, instance x, and the output, class label
The independence assumption is the most effective way to reduce structure complexity for BNC learning. However, most studies focus on conditional dependence rather than conditional independence. Some information-theoretic metrics, such as mutual information (MI) and conditional mutual information (CMI), are applied as benchmark criteria to measure direct dependence or conditional dependence [16, 17]. It should be noted that BNCs are probabilistic models rather than informational models, and conditional independence in probability theory (denoted as CIP for short) does not correspond to conditional independence in information theory (denoted as CII for short). Weak informational dependencies are commonly regarded as probabilistic independencies. For BNC learning, information-theoretic metrics cannot distinguish between probabilistic dependence and probabilistic independence. If probabilistic dependencies are introduced into the topology of a BNC as probabilistic independencies by mistake, the estimates of conditional probabilities may be biased and the classification performance will be degraded. Therefore, the reason why the independence assumptions work needs to be further explored, and researchers urgently need a scalable technique for learning BNCs with complex dependencies that captures the right conditional independencies and dependencies.
The main contributions of this paper are as follows:
We propose a novel semi-naive Bayesian learning technique, called Adaptive Independence Thresholding (AIT), to distinguish between dependence and independence in one single pass. Different criteria are applied to identify informational independence and probabilistic independence, and the resulting highly scalable algorithm combines the high expressivity of generative learning with the low bias of discriminative learning. We compare the performance of our algorithm with other BNCs on 36 benchmark datasets, ranging in size from 57 to 164860 instances. We show that AIT helps base BNCs (such as TAN and KDB) achieve competitive classification performance in terms of zero-one loss, bias, variance and conditional log likelihood.
The paper is organized as follows. Section 2 reviews some state-of-the-art BNCs. Section 3 clarifies the difference between informational independence and probabilistic independence. Section 4 introduces the basic idea of AIT. Section 5 presents a set of comparisons for our proposed algorithm on 36 UCI datasets with out-of-core BNCs. Finally, the last section draws conclusions and outlines directions for further research.
A BNC consists of two parts: the network topology
As shown in Eq. (1), for FBNC attribute
Naive Bayes (NB) explicitly assumes that attributes are conditionally independent given class
The information-theoretic metrics used for weighting are commonly similar to those used for identifying significant conditional dependencies. Thus the network topology is further refined by weighting on the basis of structure extension, and weighting can indirectly weaken the independence assumptions and finely tune the estimates of conditional probabilities. For example, Jiang et al. [20] proposed to take each attribute in turn as the root node of the network topology of TAN. The mutual information between the root node and the class variable is applied as the weighting metric. The final decision is made by aggregating the predictions of a restricted class of weighted TANs. Zaidi [21] proposed to apply conditional log likelihood and mean square error as the weighting metrics to alleviate NB’s independence assumption. Wu et al. [16] proposed to combine evolutionary computation and self-adaptive weighting. The objective function can automatically calculate the proper attribute weight value so that the attribute weighting adapts to different classification tasks. Jiang et al. [25] proposed to use mutual information as the weighting metric. A hidden parent node is introduced for each attribute to replace the dependency relationships between the attribute node and all other attributes. Lee et al. [26] argued that the importance of attribute
The BNCs learned from training data cannot naturally represent asymmetric independence assertions. In other words, the conditional independence assertions or conditional dependencies learned from training data may not fit different testing instances. Some researchers propose to use variants of these information-theoretic metrics to identify conditional dependencies among attribute values in the testing instance. Wang et al. [28] proposed a novel learning framework, target learning, to mine the conditional independencies or dependencies implicated in the training data and the testing instance. Duan et al. [29] proposed to use the normalized CMI as the weighting metric for each SPODE in AODE, and the metric considers the conditional dependence (or independence) among superparent attributes, categorical variables and non-superparent attributes. Frank et al. [30] proposed to relax the independence assumption of NB by learning local models at prediction time. The algorithm weights the training instances and allocates lower weights to the instances that are far away from the test instance. This helps to mitigate the impact of attribute dependencies that may exist in the data as a whole. Chen et al. [31] proposed to assign distinct weights to different SPODEs in AODE for different testing instances. Pensar et al. [32] proposed to relax the independence assumption by assuming that when given
As mentioned above, the independence assumption is one of the most effective methods to reduce structure complexity although it is unrealistic in practice. For over all possible outcomes of attributes
where
where
Obviously, the criterion for identifying a CII relationship is different from that for identifying a CIP relationship. The conditional independencies corresponding to weak conditional dependencies may result in biased estimates of conditional probabilities and degrade classification performance. The wide range of data quantity makes ever more urgent the need for highly scalable BNC learners that can identify CIP and CII relationships for refining the network topology.
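To make the contrast between the two criteria concrete, the following minimal Python sketch estimates the standard conditional mutual information I(Xi; Xj | Y) from counts, together with a value-level log ratio for one specific combination of attribute values and class. The CMI estimate follows the textbook definition. The value-level ratio is only an assumed stand-in for a probabilistic check, since the exact CII and CIP criteria of this paper are not reproduced here; the function names are illustrative.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(xi, xj, y):
    """Empirical I(Xi; Xj | Y) in bits from three equal-length sequences."""
    n = len(y)
    n_abc = Counter(zip(xi, xj, y))
    n_ac = Counter(zip(xi, y))
    n_bc = Counter(zip(xj, y))
    n_c = Counter(y)
    total = 0.0
    for (a, b, c), count in n_abc.items():
        # P(a,b|c) / (P(a|c) * P(b|c)) written with raw counts.
        ratio = count * n_c[c] / (n_ac[(a, c)] * n_bc[(b, c)])
        total += (count / n) * log2(ratio)
    return total


def pointwise_log_ratio(xi, xj, y, a, b, c):
    """log2 of P(a,b|c) / (P(a|c) * P(b|c)) for one value combination.

    Assumption: a value of 0 is read as "these values look conditionally
    independent given c". Returns None when a required count is zero.
    """
    n_abc = sum(1 for t in zip(xi, xj, y) if t == (a, b, c))
    n_ac = sum(1 for t in zip(xi, y) if t == (a, c))
    n_bc = sum(1 for t in zip(xj, y) if t == (b, c))
    n_c = sum(1 for v in y if v == c)
    if 0 in (n_abc, n_ac, n_bc, n_c):
        return None
    return log2(n_abc * n_c / (n_ac * n_bc))
```

The first function averages over all value combinations and is never negative, whereas the second looks at a single combination and can be zero or negative even when the averaged CMI is positive. This is the kind of gap between attribute-level and value-level views that the discussion above points to.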
The log likelihood function can measure the extent to which the learned BNC models the probability distribution of data
where
In the following discussion, we will use different BNCs to clarify the difference between informational independence and probabilistic independence. Take TAN and the dataset mfeat-mor (see Table 1 for details) as an example for experimental study. TAN compares the CMIs among all attribute pairs, and then selects significant conditional dependencies to build a maximum spanning tree. For dataset mfeat-mor,
Distributions of values of 
With the increase of structure complexity, more weak dependencies will be encoded in the learned BNC [33], and the risk of occurrence of CIP relationships will increase correspondingly. For example, KDB can represent arbitrary
The non-negativity of CMI often misleads researchers into believing that conditional dependence, weak or strong, always exists between any attribute pair, and that adding augmenting edges to the topology of a BNC can only improve rather than degrade classification. However, when attributes take different values, dependency relationships do not necessarily exist. Even for strong conditional dependencies measured by CMI, the CIP relationship holds for some instances, and for weak conditional dependencies it holds much more often. To avoid biased estimates of conditional probabilities, we need to identify these CIP relationships and remove them from the network topology. Otherwise, as more CIP relationships are processed as CDP relationships, the negative effect will accumulate and the classification performance will be degraded greatly. That can clarify why KDB with
As discussed in Section 3, the independence from the perspective of information theory is not equivalent to the independence from the perspective of probability theory. Thus in this paper, we propose to use adaptive independence thresholding (AIT) to discriminate conditional independence from weak conditional dependence. The two components of AIT, AIT-CII and AIT-CIP, can respectively identify CII and CIP relationships.
By comparing the CMI values with the threshold, the conditional independencies identified by AIT can help build a robust BNC with a simplified network topology and reduce the computational overhead needed to perform inference and classification. For AIT, the thresholds should be determined carefully rather than arbitrarily. A high threshold may result in unjustified edge removals and an overly simple network topology. A low threshold may yield redundant edges and thus an overly complex network topology that overfits the data and leads to high-order probability distributions, which decrease reliability and increase running time. Therefore, for different datasets it is necessary to tune the threshold adaptively to achieve a trade-off between bias and variance.
Threshold identification of conditional independence from the perspective of information theory
If
The threshold
AIT-CII sorts all the
An example of the knee point.
Thus, the acute angle
Obviously, the larger the acute angle is, the more significant the difference between
In the following discussion, we map the CMI values into discrete points in rectangular coordinate system. The
By applying heuristic search approach to compare the acute angle for each CMI value in list
Algorithm 2 shows the details of the AIT-CII algorithm, including the learning procedure for determining the search interval and identifying the threshold. Then the CII relationships will be identified and removed from the network topology of BNC
Algorithm AIT-CII. Input: training set
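The knee-point idea described above (sort the CMI values, map them to points in a rectangular coordinate system, and look for the point with the largest acute angle) can be sketched generically as follows. This is not the AIT-CII procedure itself: the axis normalization, the way the angle is measured between adjacent segments, and the function name `knee_threshold` are all assumptions made for illustration, and the search-interval step of the original algorithm is not modelled.

```python
from math import atan2


def knee_threshold(cmi_values):
    """Return the CMI value at the sharpest bend of the sorted CMI curve.

    Assumptions: values are sorted in descending order, both axes are
    rescaled to [0, 1], and the bend at a point is the angle between the
    segment entering it and the segment leaving it.
    """
    ordered = sorted(cmi_values, reverse=True)
    n = len(ordered)
    if n < 3:
        return None  # no interior point, so no knee can be located
    top = ordered[0] or 1.0  # avoid division by zero when all values are 0
    points = [(i / (n - 1), v / top) for i, v in enumerate(ordered)]
    best_angle, best_index = -1.0, 1
    for i in range(1, n - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
        incoming = atan2(y1 - y0, x1 - x0)
        outgoing = atan2(y2 - y1, x2 - x1)
        # x always increases, so this difference stays below pi / 2.
        angle = abs(outgoing - incoming)
        if angle > best_angle:
            best_angle, best_index = angle, i
    return ordered[best_index]


def prune_edges(edge_cmi, threshold):
    """Keep only edges whose CMI is strictly above the threshold."""
    return {edge: v for edge, v in edge_cmi.items() if v > threshold}
```

With `edge_cmi` given as a dictionary from attribute pairs to CMI values, `prune_edges(edge_cmi, knee_threshold(edge_cmi.values()))` keeps the edges above the bend. Whether the value at the bend itself is kept or dropped is a design choice made here for simplicity, not something taken from the paper.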
A complex network topology may make the learned BNC overfit the training data while underfitting the testing data, which may result in high variance and inappropriate identification of conditional dependencies. To address this issue, Wang et al. [12] proposed the target learning (TL) framework to respectively learn BNC
To learn the network topology from the testing instance and remove redundant edges, AIT-CIP focuses on the conditional dependence between attribute values. The threshold
The threshold
When the class variable
Algorithm 4.2 shows the details of the AIT-CIP algorithm, including how to determine the search interval and identify the threshold.
Algorithm AIT-CIP. Input: testing instance
Bayesian model averaging is theoretically the optimal method for combining learned models. After applying AIT-CII and AIT-CIP to respectively refine the network topologies of BNC
Experiments and results
To evaluate the efficiency and effectiveness of our proposed algorithm, AIT, we test its performance on 36 datasets from the UCI machine learning repository [35]. The detailed characteristics of the datasets are shown in Table 1. We sorted the 36 datasets in ascending order according to the number of instances. As listed in Table 1, the number of instances ranges from a minimum of 57 to a maximum of 164860, which allows us to compare classifiers on datasets of various sizes. These datasets are divided into two groups with number of instances
Datasets
For comparison purpose, we apply AIT to refine the network topologies of state-of-the-art TAN and KDB. The final BNCs (i.e., TAN
Tables A1, A2, A4 and A3 in the Appendix respectively show the experimental results in terms of zero-one loss (ZOL), bias, variance and conditional log likelihood (CLL). We employ the Win/Draw/Loss (WDL) record to interpret the results. We set the significance level to 0.05, i.e., if the output of a one-tailed binomial sign test is less than 0.05, we assume that there is a significant difference between the experimental results. Tables 2–4 respectively show the corresponding WDL records in terms of ZOL, bias and variance. Each cell
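As a rough illustration of how a WDL record and the one-tailed binomial sign test fit together, the sketch below counts wins, draws and losses by direct comparison of per-dataset losses and then computes the sign-test p-value with draws excluded. Both the draw criterion (exact equality) and the function names are assumptions; the paper's own rule for declaring a draw is not visible in this section. The sample values are copied from the first four rows of the zero-one loss table in the Appendix, reading its first and second TAN columns as TAN without and with AIT, which is also an assumption.

```python
from math import comb


def win_draw_loss(scores_a, scores_b):
    """Count datasets where A has lower, equal, or higher loss than B."""
    wins = sum(a < b for a, b in zip(scores_a, scores_b))
    draws = sum(a == b for a, b in zip(scores_a, scores_b))
    losses = sum(a > b for a, b in zip(scores_a, scores_b))
    return wins, draws, losses


def sign_test_p_value(wins, losses):
    """One-tailed P(X >= max(wins, losses)) for X ~ Binomial(wins + losses, 0.5)."""
    n = wins + losses
    if n == 0:
        return 1.0  # nothing but draws: no evidence either way
    k = max(wins, losses)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Labor-negotiations, Post-operative, Zoo, Wine (zero-one loss).
tan = [0.1053, 0.3667, 0.0099, 0.0337]
tan_with_ait = [0.0533, 0.3333, 0.0099, 0.0393]

record = win_draw_loss(tan_with_ait, tan)
p_value = sign_test_p_value(record[0], record[2])
significant = p_value < 0.05
```

Four datasets are far too few to reach significance, so this fragment only shows the mechanics; the reported WDL records are computed over all 36 datasets.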
Zero-one loss is a standard loss function in classification [38], which can intuitively evaluate the extent to which the algorithm performs well or poorly. As shown in Table 2, after AIT is introduced to discriminate between conditional independencies and conditional dependencies, TAN
WDL records for all BNCs in terms of zero-one loss
In addition, after AIT is applied to refine TAN and KDB, the average of zero-one loss for TAN
During the past decade, researchers began to study the possibility of scaling up existing learning algorithms as the data quantity increases. As argued by Brain and Webb [39], accurate learners for large data will achieve lower bias than accurate learners for small data. To further illustrate the effectiveness of AIT when dealing with large data, Goal Difference (GD) [40] is introduced and described as follows,
where
The fitting curves of GD between TAN and TAN
Figure 3 shows the fitting curve of GD in terms of zero-one loss. The X-axis represents the index number of the dataset. As can be seen from Fig. 3, TAN
As shown in Fig. 4a, TAN
The bias-variance trade-off is one of the key issues in supervised learning [41]. A high-bias learner may underfit the training data and fail to capture important regularities. A high-variance learner may model random noise and overfit unrepresentative training data rather than the intended outputs.
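For readers unfamiliar with how bias and variance are measured for zero-one loss, the sketch below implements one common choice, the Kohavi-Wolpert decomposition, for a single test instance whose true label is known. Which decomposition and resampling protocol this paper uses is not stated in this section, so the choice here is an assumption, as is the function name.

```python
from collections import Counter


def kw_bias_variance(true_label, predictions):
    """Kohavi-Wolpert squared bias and variance for one test instance.

    `predictions` holds the labels predicted for this instance by models
    trained on different training samples. The true label is treated as
    noise-free, which is a simplifying assumption.
    """
    n = len(predictions)
    freq = Counter(predictions)
    labels = set(freq) | {true_label}
    bias_sq = 0.5 * sum(
        ((1.0 if label == true_label else 0.0) - freq[label] / n) ** 2
        for label in labels
    )
    variance = 0.5 * (1.0 - sum((count / n) ** 2 for count in freq.values()))
    return bias_sq, variance
```

Under the noise-free assumption the two terms add up to the expected zero-one loss on that instance, i.e. the fraction of models that mispredict it. Averaging both terms over all test instances gives dataset-level figures.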
The WDL records for all BNCs in terms of bias
The comparison results in terms of ZOL.
As shown in Table 3, when applied to TAN and KDB, AIT helps decrease bias more often than not. Thus the advantage in ZOL is largely attributable to the advantage in bias. However, the difference in bias between KDB
As shown in Fig. 5a and b, the advantage of TAN
The WDL records for all BNCs in terms of variance
The comparison results in terms of bias.
The comparison results in terms of variance.
Variance-wise, as shown in Table 4 TAN
The comparison results for TAN and KDB with or without AIT in terms of rCLL.
Conditional log likelihood (CLL) function [42], which is defined as follows, is introduced to measure the goodness of fit of a statistical model
where
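As a point of reference, the standard conditional log likelihood of a classifier on a labelled test set is the sum of the log posterior probabilities it assigns to the true classes. The sketch below computes that quantity in bits. The base-2 logarithm, the probability floor that guards against log(0), and the function name are assumptions, and the relative measure rCLL used in the figures is not reproduced because its definition is not available here.

```python
from math import log2


def conditional_log_likelihood(posteriors, labels, floor=1e-12):
    """Sum over instances of log2 P(true class | attributes).

    `posteriors` is a list of dicts mapping class label to the estimated
    posterior probability for one instance. `floor` is a hypothetical
    guard so that a zero estimate does not produce log(0).
    """
    return sum(
        log2(max(posterior.get(label, 0.0), floor))
        for posterior, label in zip(posteriors, labels)
    )
```

Values closer to zero indicate that the model places more probability mass on the true classes, so a higher CLL means a better fit.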
Figure 7 presents the scatter plot of rCLL. As shown in Fig. 7, the number of blue points representing
The comparison results of training and classification time are shown in Fig. 8, where each bar represents the average time over all 36 datasets. The structure complexity of the base classifier and the size of the test data are the main factors that affect the training and classification time.
As shown in Fig. 8a, TAN
The comparison results of training and classification time.
Because of the inconsistency between conditional independence in information theory and that in probability theory, it is not appropriate to describe the independence relationship between attributes by using an information-theoretic criterion only. In this paper, Adaptive Independence Thresholding (AIT) is proposed to automatically identify the conditional independence between attributes or attribute values, and to dynamically remove redundant edges to build a robust network topology. We explore the reasons for the effectiveness of AIT. Extensive experimental results on 36 datasets show that AIT significantly improves the generalization performance of base BNCs (including TAN and KDB). The experimental results presented in this paper suggest that weighting and independence identification are two effective approaches to improving the estimates of conditional probabilities and that they should be mutually compatible. Exploring techniques for combining weighting and AIT remains a direction for future research.
Appendix
Experimental results of zero-one loss

| Datasets | TAN | KDB | CFWNB | FKDB | SKDB | TAODE | IWAODE | TAN with AIT | KDB with AIT |
|---|---|---|---|---|---|---|---|---|---|
| Labor-negotiations | 0.1053 | 0.0702 | 0.1053 | 0.0351 | 0.0877 | 0.0526 | 0.0526 | 0.0533 | 0.0511 |
| Post-operative | 0.3667 | 0.3778 | 0.3000 | 0.3778 | 0.3556 | 0.3333 | 0.3556 | 0.3333 | 0.3220 |
| Zoo | 0.0099 | 0.0495 | 0.0396 | 0.0198 | 0.0297 | 0.0198 | 0.0198 | 0.0099 | 0.0099 |
| Wine | 0.0337 | 0.0225 | 0.0056 | 0.0562 | 0.0337 | 0.0281 | 0.0169 | 0.0393 | 0.0181 |
| Sonar | 0.2212 | 0.2452 | 0.1587 | 0.2452 | 0.2212 | 0.2260 | 0.2260 | 0.2260 | 0.2356 |
| Glass-id | 0.2196 | 0.2196 | 0.1729 | 0.2103 | 0.2196 | 0.2523 | 0.2196 | 0.2150 | 0.2056 |
| Hungarian | 0.1701 | 0.1803 | 0.1599 | 0.1769 | 0.1837 | 0.1599 | 0.1599 | 0.1612 | 0.1497 |
| Heart-disease-c | 0.2079 | 0.2244 | 0.1683 | 0.2145 | 0.2079 | 0.2013 | 0.1980 | 0.1881 | 0.1863 |
| Primary-tumor | 0.5428 | 0.5723 | 0.5634 | 0.5546 | 0.5693 | 0.5782 | 0.5457 | 0.5575 | 0.5520 |
| Ionosphere | 0.0684 | 0.0741 | 0.0855 | 0.0741 | 0.1026 | 0.0741 | 0.0712 | 0.0741 | 0.0684 |
| House-votes-84 | 0.0552 | 0.0506 | 0.0782 | 0.0437 | 0.0529 | 0.0529 | 0.0483 | 0.0552 | 0.0529 |
| Soybean | 0.0469 | 0.0556 | 0.0615 | 0.0630 | 0.0600 | 0.0483 | 0.0542 | 0.0425 | 0.0498 |
| Credit-a | 0.1507 | 0.1464 | 0.1333 | 0.1536 | 0.1551 | 0.1507 | 0.1391 | 0.1391 | 0.1435 |
| Crx | 0.1478 | 0.1565 | 0.1304 | 0.1391 | 0.1507 | 0.1391 | 0.1319 | 0.1386 | 0.1362 |
| Tic-tac-toe | 0.2286 | 0.2035 | 0.3100 | 0.0689 | 0.1514 | 0.2630 | 0.2662 | 0.2413 | 0.2093 |
| German | 0.2730 | 0.2890 | 0.2370 | 0.2780 | 0.2540 | 0.2550 | 0.2560 | 0.2590 | 0.2500 |
| Yeast | 0.4171 | 0.4387 | 0.4319 | 0.4319 | 0.4373 | 0.4218 | 0.4232 | 0.4225 | 0.4225 |
| Mfeat-mor | 0.2970 | 0.3060 | 0.3060 | 0.3080 | 0.2990 | 0.3105 | 0.3120 | 0.3060 | 0.3000 |
| Splice-c4.5 | 0.0444 | 0.0941 | 0.0375 | 0.0416 | 0.0349 | 0.0365 | 0.0101 | 0.0348 | 0.0344 |
| Kr-vs-kp | 0.0776 | 0.0416 | 0.0645 | 0.0472 | 0.0329 | 0.0773 | 0.0826 | 0.0569 | 0.0507 |
| Dis | 0.0159 | 0.0138 | 0.0156 | 0.0141 | 0.0127 | 0.0125 | 0.0127 | 0.0130 | 0.0128 |
| Abalone | 0.4587 | 0.4563 | 0.4755 | 0.4563 | 0.4654 | 0.4465 | 0.4482 | 0.4554 | 0.4506 |
| Spambase | 0.0669 | 0.0635 | 0.0859 | 0.0728 | 0.0643 | 0.0602 | 0.0646 | 0.0624 | 0.0615 |
| Phoneme | 0.2733 | 0.1984 | 0.2407 | 0.2444 | 0.1912 | 0.2427 | 0.2104 | 0.2059 | 0.1889 |
| Page-blocks | 0.0415 | 0.0391 | 0.0417 | 0.0396 | 0.0340 | 0.0327 | 0.0325 | 0.0342 | 0.0347 |
| Optdigits | 0.0407 | 0.0372 | 0.0676 | 0.0356 | 0.0374 | 0.0290 | 0.0276 | 0.0378 | 0.0340 |
| Mushrooms | 0.0001 | 0.0000 | 0.0080 | 0.0000 | 0.0000 | 0.0002 | 0.0002 | 0.0002 | 0.0000 |
| Pendigits | 0.0321 | 0.0294 | 0.1130 | 0.0272 | 0.0294 | 0.0200 | 0.0185 | 0.0276 | 0.0285 |
| Sign | 0.2755 | 0.2539 | 0.3701 | 0.2463 | 0.2125 | 0.2743 | 0.2789 | 0.2712 | 0.2826 |
| Seermdl | 0.2376 | 0.2555 | 0.2330 | 0.2600 | 0.2361 | 0.2340 | 0.2325 | 0.2331 | 0.2330 |
| Magic | 0.1675 | 0.1637 | 0.2034 | 0.1589 | 0.1626 | 0.1725 | 0.1744 | 0.1790 | 0.1746 |
| Letter-recog | 0.1300 | 0.0986 | 0.2479 | 0.0974 | 0.1013 | 0.0838 | 0.0854 | 0.0972 | 0.0893 |
| Adult | 0.1380 | 0.1383 | 0.1499 | 0.1363 | 0.1358 | 0.1558 | 0.1502 | 0.1326 | 0.1325 |
| Shuttle | 0.0015 | 0.0009 | 0.0021 | 0.0007 | 0.0008 | 0.0008 | 0.0011 | 0.0010 | 0.0010 |
| Waveform | 0.0202 | 0.0256 | 0.0199 | 0.0202 | 0.0241 | 0.0182 | 0.0181 | 0.0187 | 0.0190 |
| Localization | 0.3575 | 0.2964 | 0.4936 | 0.2963 | 0.3013 | 0.3544 | 0.3593 | 0.3209 | 0.3054 |
Experimental results of bias

| Datasets | TAN | KDB | CFWNB | FKDB | SKDB | TAODE | IWAODE | TAN with AIT | KDB with AIT |
|---|---|---|---|---|---|---|---|---|---|
| Labor-negotiations | 0.0716 | 0.0553 | 0.0349 | 0.0795 | 0.0442 | 0.0474 | 0.0268 | 0.0411 | 0.0326 |
| Post-operative | 0.2687 | 0.2737 | 0.2928 | 0.2703 | 0.2077 | 0.2403 | 0.2190 | 0.2207 | 0.2467 |
| Zoo | 0.0303 | 0.0403 | 0.0840 | 0.0373 | 0.0400 | 0.0282 | 0.0282 | 0.0342 | 0.0367 |
| Wine | 0.0507 | 0.0520 | 0.0137 | 0.0483 | 0.0388 | 0.0376 | 0.0317 | 0.0339 | 0.0175 |
| Sonar | 0.1646 | 0.1686 | 0.1230 | 0.1633 | 0.1700 | 0.1707 | 0.1694 | 0.1625 | 0.1806 |
| Glass-id | 0.2756 | 0.2713 | 0.1197 | 0.2865 | 0.2785 | 0.2785 | 0.2818 | 0.2770 | 0.2752 |
| Hungarian | 0.1424 | 0.1480 | 0.1484 | 0.1466 | 0.1583 | 0.1581 | 0.1597 | 0.1363 | 0.1467 |
| Heart-disease-c | 0.1263 | 0.1299 | 0.1440 | 0.1029 | 0.1194 | 0.1134 | 0.1160 | 0.1088 | 0.1185 |
| Primary-tumor | 0.4249 | 0.4184 | 0.3417 | 0.4138 | 0.4068 | 0.4324 | 0.4188 | 0.4388 | 0.4397 |
| Ionosphere | 0.0804 | 0.0855 | 0.0813 | 0.0710 | 0.0897 | 0.0764 | 0.0881 | 0.0861 | 0.0804 |
| House-votes-84 | 0.0410 | 0.0258 | 0.0575 | 0.0302 | 0.0272 | 0.0429 | 0.0493 | 0.0437 | 0.0361 |
| Soybean | 0.0522 | 0.0491 | 0.0695 | 0.0472 | 0.0508 | 0.0515 | 0.0693 | 0.0464 | 0.0482 |
| Credit-a | 0.1171 | 0.1137 | 0.1301 | 0.0955 | 0.1024 | 0.0940 | 0.0893 | 0.1144 | 0.1118 |
| Crx | 0.1180 | 0.1197 | 0.1332 | 0.1030 | 0.1046 | 0.0985 | 0.0904 | 0.1096 | 0.1129 |
| Tic-tac-toe | 0.1746 | 0.1367 | 0.2257 | 0.0351 | 0.1207 | 0.2008 | 0.1994 | 0.1752 | 0.1665 |
| German | 0.2057 | 0.2108 | 0.2075 | 0.2058 | 0.2001 | 0.2052 | 0.2112 | 0.2015 | 0.2091 |
| Yeast | 0.3481 | 0.3462 | 0.3644 | 0.3449 | 0.3469 | 0.3457 | 0.3458 | 0.3466 | 0.3444 |
| Mfeat-mor | 0.2077 | 0.2142 | 0.2455 | 0.2134 | 0.2071 | 0.2431 | 0.2492 | 0.2130 | 0.2177 |
| Splice-c4.5 | 0.0395 | 0.0961 | 0.0345 | 0.0395 | 0.0289 | 0.0315 | 0.4576 | 0.0320 | 0.0609 |
| Kr-vs-kp | 0.0702 | 0.0417 | 0.0583 | 0.0416 | 0.0284 | 0.0688 | 0.0763 | 0.0525 | 0.0372 |
| Dis | 0.0193 | 0.0191 | 0.0127 | 0.0201 | 0.0182 | 0.0178 | 0.0168 | 0.0191 | 0.0191 |
| Abalone | 0.3126 | 0.3033 | 0.3728 | 0.3033 | 0.3102 | 0.3183 | 0.3199 | 0.3285 | 0.3146 |
| Spambase | 0.0570 | 0.0497 | 0.0750 | 0.0580 | 0.0483 | 0.0541 | 0.0602 | 0.0562 | 0.0522 |
| Phoneme | 0.2394 | 0.1572 | 0.2003 | 0.1641 | 0.1514 | 0.2186 | 0.1829 | 0.2026 | 0.1409 |
| Page-blocks | 0.0308 | 0.0280 | 0.0331 | 0.0259 | 0.0263 | 0.0248 | 0.0257 | 0.0278 | 0.0282 |
| Optdigits | 0.0275 | 0.0250 | 0.0594 | 0.0230 | 0.0252 | 0.0224 | 0.0200 | 0.0289 | 0.0228 |
| Mushrooms | 0.0001 | 0.0001 | 0.0103 | 0.0000 | 0.0000 | 0.0004 | 0.0004 | 0.0002 | 0.0001 |
| Pendigits | 0.0314 | 0.0207 | 0.1011 | 0.0197 | 0.0207 | 0.0225 | 0.0200 | 0.0293 | 0.0187 |
| Sign | 0.2420 | 0.2161 | 0.3435 | 0.1993 | 0.1802 | 0.2446 | 0.2510 | 0.2387 | 0.2176 |
| Seermdl | 0.2114 | 0.2100 | 0.2252 | 0.2077 | 0.2260 | 0.2150 | 0.2214 | 0.2144 | 0.2099 |
| Magic | 0.1252 | 0.1241 | 0.1898 | 0.1251 | 0.1244 | 0.1546 | 0.1595 | 0.1364 | 0.1340 |
| Letter-recog | 0.1032 | 0.0806 | 0.2133 | 0.0738 | 0.0782 | 0.0814 | 0.0877 | 0.1038 | 0.0730 |
| Adult | 0.1312 | 0.1220 | 0.1461 | 0.1230 | 0.1249 | 0.1459 | 0.1437 | 0.1297 | 0.1261 |
| Shuttle | 0.0008 | 0.0007 | 0.0024 | 0.0006 | 0.0007 | 0.0006 | 0.0007 | 0.0008 | 0.0007 |
| Waveform | 0.0152 | 0.0210 | 0.0199 | 0.0144 | 0.0137 | 0.0149 | 0.0157 | 0.0162 | 0.0177 |
| Localization | 0.3106 | 0.2134 | 0.4746 | 0.2137 | 0.1949 | 0.3010 | 0.3126 | 0.3112 | 0.2175 |
Experimental results of variance

| Datasets | TAN | KDB | CFWNB | FKDB | SKDB | TAODE | IWAODE | TAN with AIT | KDB with AIT |
|---|---|---|---|---|---|---|---|---|---|
| Labor-negotiations | 0.1389 | 0.1289 | 0.0320 | 0.1258 | 0.1137 | 0.0789 | 0.0626 | 0.1285 | 0.1146 |
| Post-operative | 0.1513 | 0.1697 | 0.0467 | 0.1730 | 0.1590 | 0.1563 | 0.1510 | 0.1297 | 0.1309 |
| Zoo | 0.0606 | 0.0658 | 0.0675 | 0.0536 | 0.0570 | 0.0445 | 0.0445 | 0.0574 | 0.0598 |
| Wine | 0.0493 | 0.0649 | 0.0042 | 0.0483 | 0.0341 | 0.0251 | 0.0141 | 0.0391 | 0.0459 |
| Sonar | 0.1165 | 0.1199 | 0.0432 | 0.1193 | 0.1097 | 0.0959 | 0.0929 | 0.1015 | 0.0191 |
| Glass-id | 0.1075 | 0.1189 | 0.0492 | 0.1065 | 0.1075 | 0.1004 | 0.0999 | 0.1077 | 0.1052 |
| Hungarian | 0.0596 | 0.0561 | 0.0201 | 0.0442 | 0.0366 | 0.0287 | 0.0270 | 0.0428 | 0.0495 |
| Heart-disease-c | 0.0479 | 0.0582 | 0.0389 | 0.0645 | 0.0489 | 0.0361 | 0.0305 | 0.0407 | 0.0561 |
| Primary-tumor | 0.2424 | 0.2391 | 0.2117 | 0.2419 | 0.2215 | 0.1880 | 0.1785 | 0.2363 | 0.2467 |
| Ionosphere | 0.0401 | 0.0581 | 0.0087 | 0.0563 | 0.0462 | 0.0381 | 0.0238 | 0.0389 | 0.0404 |
| House-votes-84 | 0.0170 | 0.0197 | 0.0108 | 0.0222 | 0.0203 | 0.0081 | 0.0079 | 0.0250 | 0.0272 |
| Soybean | 0.0654 | 0.0439 | 0.0378 | 0.0541 | 0.0457 | 0.0331 | 0.0290 | 0.0461 | 0.0426 |
| Credit-a | 0.0555 | 0.0768 | 0.0205 | 0.0588 | 0.0380 | 0.0360 | 0.0276 | 0.0480 | 0.0510 |
| Crx | 0.0520 | 0.0663 | 0.0203 | 0.0501 | 0.0310 | 0.0310 | 0.0240 | 0.0421 | 0.0534 |
| Tic-tac-toe | 0.0824 | 0.1125 | 0.0550 | 0.0699 | 0.1417 | 0.0528 | 0.0529 | 0.1064 | 0.1034 |
| German | 0.1009 | 0.1192 | 0.0473 | 0.1216 | 0.0816 | 0.0789 | 0.0692 | 0.0903 | 0.0919 |
| Yeast | 0.1037 | 0.1020 | 0.1073 | 0.1014 | 0.0990 | 0.0972 | 0.0967 | 0.1002 | 0.0990 |
| Mfeat-mor | 0.1020 | 0.1031 | 0.0563 | 0.1010 | 0.1040 | 0.0730 | 0.0676 | 0.0289 | 0.0228 |
| Splice-c4.5 | 0.0289 | 0.0800 | 0.0068 | 0.1492 | 0.0133 | 0.0119 | 0.1250 | 0.0196 | 0.02457 |
| Kr-vs-kp | 0.0152 | 0.0111 | 0.0169 | 0.0191 | 0.0126 | 0.0208 | 0.0185 | 0.0108 | 0.0110 |
| Dis | 0.0005 | 0.0011 | 0.0050 | 0.0006 | 0.0016 | 0.0040 | 0.0036 | 0.0009 | 0.0009 |
| Abalone | 0.1693 | 0.1769 | 0.0904 | 0.1772 | 0.1679 | 0.1561 | 0.1539 | 0.1526 | 0.1409 |
| Spambase | 0.0158 | 0.0214 | 0.0054 | 0.0205 | 0.0240 | 0.0124 | 0.0094 | 0.0137 | 0.0106 |
| Phoneme | 0.1828 | 0.1064 | 0.0961 | 0.1349 | 0.0867 | 0.1355 | 0.1270 | 0.1388 | 0.0976 |
| Page-blocks | 0.0143 | 0.0177 | 0.0070 | 0.0169 | 0.0162 | 0.0122 | 0.0113 | 0.0138 | 0.0169 |
| Optdigits | 0.0185 | 0.0254 | 0.0137 | 0.0239 | 0.0253 | 0.0140 | 0.0132 | 0.0008 | 0.0007 |
| Mushrooms | 0.0002 | 0.0002 | 0.0001 | 0.0000 | 0.0002 | 0.0002 | 0.0001 | 0.0002 | 0.0002 |
| Pendigits | 0.0200 | 0.0236 | 0.0126 | 0.0265 | 0.0236 | 0.0130 | 0.0107 | 0.0190 | 0.0208 |
| Sign | 0.0386 | 0.0596 | 0.0250 | 0.0642 | 0.0725 | 0.0406 | 0.0380 | 0.0464 | 0.0482 |
| Seermdl | 0.0381 | 0.0613 | 0.0130 | 0.0692 | 0.0155 | 0.0295 | 0.0200 | 0.0362 | 0.0522 |
| Magic | 0.0490 | 0.0491 | 0.0092 | 0.0479 | 0.0483 | 0.0313 | 0.0291 | 0.0320 | 0.0409 |
| Letter-recog | 0.0591 | 0.0709 | 0.0498 | 0.0797 | 0.0745 | 0.0457 | 0.0417 | 0.0512 | 0.0651 |
| Adult | 0.0165 | 0.0285 | 0.0071 | 0.0252 | 0.0209 | 0.0174 | 0.0109 | 0.0122 | 0.0177 |
| Shuttle | 0.0004 | 0.0003 | 0.0006 | 0.0004 | 0.0004 | 0.0004 | 0.0003 | 0.0003 | 0.0003 |
| Waveform | 0.0053 | 0.0037 | 0.0004 | 0.0047 | 0.0088 | 0.0034 | 0.0024 | 0.0034 | 0.0045 |
| Localization | 0.0594 | 0.1099 | 0.0186 | 0.1096 | 0.1337 | 0.0657 | 0.0577 | 0.0321 | 0.0300 |
Experimental results of RMSE

| Datasets | TAN | KDB | CFWNB | FKDB | SKDB | TAODE | IWAODE | TAN with AIT | KDB with AIT |
|---|---|---|---|---|---|---|---|---|---|
| Labor-negotiations | 0.2778 | 0.2477 | 0.2810 | 0.1952 | 0.2153 | 0.2029 | 0.1739 | 0.2453 | 0.2243 |
| Post-operative | 0.5340 | 0.5632 | 0.3970 | 0.4499 | 0.4389 | 0.4184 | 0.4101 | 0.4113 | 0.4113 |
| Zoo | 0.1309 | 0.1815 | 0.0933 | 0.0659 | 0.0817 | 0.0686 | 0.0650 | 0.2368 | 0.0819 |
| Wine | 0.1746 | 0.1501 | 0.0532 | 0.1641 | 0.1194 | 0.1021 | 0.4689 | 0.2787 | 0.1275 |
| Sonar | 0.4131 | 0.4084 | 0.3409 | 0.4280 | 0.4100 | 0.4202 | 0.1001 | 0.4054 | 0.4005 |
| Glass-id | 0.3332 | 0.3395 | 0.2952 | 0.3241 | 0.3376 | 0.3409 | 0.4246 | 0.3273 | 0.3284 |
| Hungarian | 0.3429 | 0.3552 | 0.3384 | 0.3567 | 0.3434 | 0.3443 | 0.3237 | 0.3337 | 0.3284 |
| Heart-disease-c | 0.3775 | 0.3963 | 0.3417 | 0.3942 | 0.386 | 0.3677 | 0.345 | 0.3577 | 0.3656 |
| Primary-tumor | 0.7170 | 0.7262 | 0.1790 | 0.1841 | 0.1828 | 0.1864 | 0.3620 | 0.1773 | 0.1781 |
| Ionosphere | 0.2615 | 0.2714 | 0.2765 | 0.2612 | 0.2817 | 0.2464 | 0.1778 | 0.2518 | 0.2487 |
| House-votes-84 | 0.2181 | 0.1969 | 0.2558 | 0.1977 | 0.2044 | 0.1968 | 0.2546 | 0.2128 | 0.2068 |
| Soybean | 0.2014 | 0.2063 | 0.0723 | 0.0734 | 0.0676 | 0.0657 | 0.1998 | 0.0591 | 0.0620 |
| Credit-a | 0.3415 | 0.3480 | 0.3116 | 0.3386 | 0.3346 | 0.3305 | 0.0697 | 0.3301 | 0.3260 |
| Crx | 0.3411 | 0.3525 | 0.3142 | 0.3289 | 0.3304 | 0.3322 | 0.3271 | 0.3245 | 0.3221 |
| Tic-tac-toe | 0.4023 | 0.3772 | 0.1116 | 0.0984 | 0.1030 | 0.0865 | 0.3259 | 0.0923 | 0.0933 |
| German | 0.4367 | 0.4665 | 0.4080 | 0.4567 | 0.4236 | 0.4224 | 0.3992 | 0.4206 | 0.4209 |
| Yeast | 0.5994 | 0.6035 | 0.2423 | 0.2387 | 0.2392 | 0.2371 | 0.4157 | 0.1292 | 0.2370 |
| Mfeat-mor | 0.4657 | 0.4707 | 0.1943 | 0.1978 | 0.1944 | 0.1980 | 0.2363 | 0.1936 | 0.1933 |
| Splice-c4.5 | 0.1917 | 0.2756 | 0.4334 | 0.2260 | 0.3268 | 0.3984 | 0.1979 | 0.4216 | 0.4193 |
| Kr-vs-kp | 0.2358 | 0.1869 | 0.2779 | 0.1935 | 0.1573 | 0.2561 | 0.2635 | 0.2418 | 0.2189 |
| Dis | 0.1103 | 0.1024 | 0.1130 | 0.1045 | 0.0987 | 0.1047 | 0.1058 | 0.1084 | 0.1041 |
| Abalone | 0.5638 | 0.5646 | 0.4433 | 0.4277 | 0.4269 | 0.4195 | 0.4191 | 0.4209 | 0.4222 |
| Spambase | 0.2403 | 0.2300 | 0.2657 | 0.2418 | 0.2298 | 0.2239 | 0.2317 | 0.2432 | 0.242 |
| Phoneme | 0.5048 | 0.4195 | 0.0806 | 0.0865 | 0.0756 | 0.0891 | 0.0795 | 0.0844 | 0.0762 |
| Page-blocks | 0.1894 | 0.1811 | 0.1117 | 0.1123 | 0.1020 | 0.1013 | 0.0986 | 0.1026 | 0.1031 |
| Optdigits | 0.1906 | 0.1806 | 0.1075 | 0.0761 | 0.0792 | 0.0727 | 0.0686 | 0.0803 | 0.0800 |
| Mushrooms | 0.0083 | 0.0001 | 0.0857 | 0.0017 | 0.0000 | 0.0121 | 0.0114 | 0.0288 | 0.0267 |
| Pendigits | 0.1640 | 0.1588 | 0.1318 | 0.0671 | 0.0687 | 0.0565 | 0.0540 | 0.1565 | 0.0700 |
| Sign | 0.3505 | 0.3334 | 0.3929 | 0.3333 | 0.3151 | 0.3487 | 0.3516 | 0.3521 | 0.3551 |
| Seermdl | 0.4131 | 0.4340 | 0.4121 | 0.4386 | 0.4083 | 0.4139 | 0.4106 | 0.4071 | 0.4069 |
| Magic | 0.3461 | 0.3470 | 0.3709 | 0.3410 | 0.3471 | 0.3519 | 0.3534 | 0.3541 | 0.3502 |
| Letter-recog | 0.3350 | 0.2963 | 0.1139 | 0.0764 | 0.0778 | 0.0691 | 0.0693 | 0.0842 | 0.0825 |
| Adult | 0.3076 | 0.3089 | 0.3150 | 0.3083 | 0.3043 | 0.3297 | 0.3250 | 0.3024 | 0.3014 |
| Shuttle | 0.0356 | 0.0290 | 0.0270 | 0.0127 | 0.0139 | 0.0124 | 0.0159 | 0.0147 | 0.0156 |
| Waveform | 0.1164 | 0.1402 | 0.0068 | 0.0198 | 0.0297 | 0.0198 | 0.0859 | 0.0529 | 0.2776 |
| Localization | 0.5656 | 0.5106 | 0.2402 | 0.1962 | 0.2010 | 0.2081 | 0.2093 | 0.2094 | 0.2105 |
