Abstract
It is important for Bayesian network (BN) structure learning, a NP-problem, to improve the accuracy and hybrid algorithms are a kind of effective structure learning algorithms at present. Most hybrid algorithms adopt the strategy of one heuristic search and can be divided into two groups: one heuristic search based on initial BN skeleton and one heuristic search based on initial solutions. The former often fails to guarantee globality of the optimal structure and the latter fails to get the optimal solution because of large search space. In this paper, an efficient hybrid algorithm is proposed with the strategy of two-stage searches. For first-stage search, it firstly determines the local search space based on Maximal Information Coefficient by introducing penalty factors
Keywords
Introduction
Bayesian network (BN) is a classical probabilistic model which uses graph pattern of variables’ probability to represent the causal relationships of uncertain events [10, 32], which has been applied to multiple tasks like fault diagnosis [13, 36], risk assessment [29, 21] and forecasting [11]. BN learning has two major components: the structure learning and the parameter learning, in which the structure learning is more primary and challenging than the parameter learning.
It has been proved that structure learning from data is a NP-hard problem[4]. Therefore, how to improve the accuracy of the structure learning is very important. Existing BN structure learning algorithms can be divided into three groups: constraint-based, scored-based and hybrid algorithms. Constraint-based algorithms use conditional independence (CI) tests to detect the relationships among variables [8, 19]. Scored-based algorithms regard the structure learning as optimization problems which use heuristic searches to look for the optimal structure with the highest score evaluated by scoring functions [33, 1]. Recently, hybrid algorithms of attempting to obtain merits of the two groups above by combining the constraint-based algorithms with the scored-based algorithms have been introduced and proven to be very effective algorithms to learn BN structure from data [12, 18].
Hybrid algorithms generally adopt the strategy of one heuristic search and can be categorized into two types which both exist the weaknesses. One classical type of hybrid algorithms uses constraint-based algorithms to construct an undirected graph (skeleton graph) and then uses scored-based algorithms to orientate edges for the undirected graph [28, 27]. However, the skeleton graph is sensitive to CI tests in constraint-based algorithms, which an error in the previous can have a cascading effect that causes many errors in the latter, leading to fail to get the whole skeleton graph and make BN structure learning in low accuracy. The other classical type of hybrid algorithms uses constraint-based algorithms to get initial solutions and then uses scored-based algorithms to search the whole solution space for the optimal solution [24, 9]. As the number of the candidate structures in the solution space grows exponentially with the count of nodes, it’s impossible to get the best solution, resulting BN structure learning in low accuracy. Therefore, we propose a strategy of hybrid algorithms that uses two-stage searches flexible to use appropriate search methods at different stages. In the first-stage search, it is crucial to determine the local search space which customarily adopts constrained algorithms in hybrid algorithms. As Maximal Information Coefficient (MIC) has two advantages of generality and equitability compared with traditional data analysis indicators such as Mutual Information [6], MIC is introduced into constraint-based algorithms and algorithms based on MIC [38] show the excellent performance. The process of eliminating triangular loops is an important part in algorithms based on MIC. The process of eliminating triangular loops in algorithms based on MIC traditionally uses the condition independence criterion [38, 37]. Nevertheless, it is easy to lead the lower accuracy because the triangular loops that cannot be removed are eliminated by the method of randomly deleting an edge in each triangular. Wei et al.[39] proposes a method to eliminate triangular loops by introducing one penalty factor. However, this method only uses one penalty factor to penalize triangular loops in one case and ignores another case that actually exists. In addition, this method may leave unnecessary triangular loops. Therefore, we propose the method by introducing penalty factors
Focusing on these issues, in this paper, we propose a hybrid algorithm using the strategy of two-stage searches. The algorithm consists of main two parts: the first-stage search and the second-stage search. For the first-stage search, it can be split into two steps. The first step is to use the algorithm based on MIC introducing penalty factors
The structure of the paper is as follows. The work relative to BN learning algorithms is reviewed in Section 2. In Section 3, the proposed hybrid algorithm named pMIC_BPSO_ADR is described in detail. The experimental results with simulations are described in Section 4. Finally, the conclusions are presented in Section 5.
Related work
Bayesian networks
Bayesian networks can be represented as a tuple
The structure of Bayesian networks. The parameter of Bayesian networks.
A simple Bayesian network.
An example Bayesian network is showed in Fig. 1. According to Fig. 1, it is clear that an BN is made up of the qualitative part representing a DAG and the quantitative part representing a joint probability distribution composed of a set of conditional probability distributions. Therefore, the joint probability distribution can be factorized as the product of conditional probability distributions of each node:
Structure learning can be regarded as a combinatorial optimization problem and the size of search space, namely the number of possible BN structures is proved to be a function related with nodes
Obviously, the number of possible BN structures increases exponentially with the number of nodes. And it has now been demonstrated to an NP-hard problem. As mentioned above, constraint-based, Scored-based and hybrid algorithms are three main structure learning algorithms. In this paper, we are interested in hybrid algorithms because of the reasons mentioned above.
The hybrid algorithms combine constraint-based with Scored-based algorithms to learn structure. Constraint-based algorithms adopted in hybrid algorithms generally adopt information theory or CI tests [10, 9, 17, 34, 38, 37, 39]. Recent years, with MIC capturing wider associations than MI [6], algorithms based on MIC [38, 37] show the excellent performance. It will be explained in Section 2.2.1. Scored-based algorithms adopted in hybrid algorithms use heuristic searches to find the optimal structure in search space. Heuristic searches such as Genetic Algorithms (GA), Ant Colony Optimization, BPSO have been well applied to structure learning. Comparing with some heuristic searches, BPSO is not only easy to implement and has few parameters, but also has faster convergence speed [2]. It will be explained in Section 2.2.2.
MIC is a recently proposed data analysis indicator to capture associations between node pairs , which has two advantages of generality and equitability compared with traditional data analysis indicators such as Pearson correlation coefficient, Spearman correlation coefficient, mutual information [6, 30].
The two variables, X and Y, which have correlation between them, have
where
A grid partition in dataset D.
Particle Swarm Optimization (PSO) is one of the most widely used swarm intelligence algorithms at present [40, 25] and uses a large number of particles (candidate solutions) to search in the d-dimensional search space. Each particle updates its states including the position and velocity by tracking two extremes: personal best solution (pbest) and global best solution (gbest) to find the best solution.
The states of the
where
Traditional PSO algorithms can only deal with the continuous space, and methods that covert PSO algorithms for continuous problems to PSO algorithms for binary discrete problems can be divided into two types:
Based on the traditional PSO algorithms for continuous problems, the particle velocity update function is retained and the particle position update function is modified to satisfy the binary space [22]. Based on the essential principle of PSO algorithms for continuous problems, the particle velocity update function and particle position update function are refined in binary discrete space [7].
In this paper, we choose the first processing methods that retain the advantages of simple calculation in traditional PSO algorithms and have fast calculation speed.
The first-stage search
The first-stage search is divided into two steps. The first step to determine the local search space and the second step is to use the BPSO algorithm to search the local search space.
Determine the local search space based on MIC with penalty factors
The scheme for the local search space based on MIC with penalty factors determination.
Motivated by [38], the framework of determining the local search space based on MIC with penalty factors is described in Fig. 3. Our work is introduced penalty factors in step 1 of Fig. 3 to eliminate invalid triangular loops. In step 1, the undirected structure is constructed by the MIC in any two nodes
Where
The undirected structure.
According to Wei et al.[39], the idea is to eliminate triangular loops caused by the high MIC between two nodes related through one intermediate children by introducing a penalty factor. Figure 4a displays the actual undirected structure that MIC(X,Y) is high because nodes X and Y are connected by node
In step 1, the threshold value
The details of rest steps are described in [38]. After step 4, we obtain the initial BN structure
Comparing with some heuristic searches, BPSO is not only easy to implement and has few parameters, but also has faster convergence speed [2]. BPSO with those advantages is very suitable for the first-stage search in proposed algorithm.
1. Representation of a BN structure
A BN structure can be expressed as an
And the particle can be expressed a string:
A schematic diagram of representation of a BN structure.
2. Initialize particle swarms
After constructing the initial BN structure which has high accuracy, we use BPSO for the first-stage search to search for the undirected part.
Each particle is initialized as the following method which can be divided into two steps. The first step is to randomly initialize the set of undirected edges in the initial BN structure by upper triangle coding. The second step is to make the initialized particle valid. The procedure of initializing a particle in detail is shown in Fig. 6. The initial BN structure which is the output of the above algorithm based MIC with penalty factors is essentially a partially directed acyclic graph, namely
A schematic diagram of initializing a particle.
A particle with 5-dimension is taken as an example to illustrate the whole process.
The adjacent matrix with 5
The disposal of the initial BN structure in step 1.
The left part of Fig. 8 reveals that the set of directed edges
A schematic diagram of adjacent matrix after transferring.
We find an illegal loop in adjacent matrix
A schematic diagram of removing an illegal loop.
3. Obtain the optimal DAG
1) Particle velocity and position updating formulas
In our proposed method, the particle velocity updating formula adopts Eq. (4) and the particle position updating formula [35] is written as follow:
where
2) The ‘repair’ strategy to obtain legal structures and the fitness function
The ‘repair’ strategy is used to translate illegal structures into legal structures. The illegal structures are classified as three types: self-cycles, bi-cycles and regular-cycles and then are processed by the ’repair’ strategy according to [16].
The fitness function in BPSO corresponds to the scoring function in BN structure learning. In our proposed method, the Bayesian Information Criterion (BIC) scoring function is chosen as the fitness function to evaluate the quality of candidate solutions. Algorithm 1 shows the pseudo-code of using BPSO for using BPSO algorithm to search in the local space.
BPSO algorithm for the local BN structure[1] the Initial BN structure
Compared with the actual network structure, the structure after the first-stage search has more missing edges, fewer redundant and inverted edges. The reason is that we control the false negative rate in the first-stage search. This means that we often obtain the local structure (local space accouts for about 70% of the whole space demonstrated in Section 4.2.2) from the first-stage search and the second-stage search should focus on how to extend the local space to the whole space and then take the errors of the first-stage search into account. So we need a more targeted approach for the second-stage search, an operator-based approach.
The scored-based algorithms are essentially an optimization process based on three basic operators including add operator, delete operator and reverse operator [23]. There are some literatures on operator-based approach, such as Greedy Equivalent Search (GES) [5], the greedy hill-climbing search [26]. The key is how to combine the operators, such as the number of operators, the combination ways of operators. We prefer to use the add operator and then use other two basic operators. Aimed at this case, the proposed search algorithm for the second-stage search based on three operators used in order of add operator, delete operator and reverse operator. We denote the proposed search algorithm as ADR (the abbreviation of
There are three basic operators in Bayesian network structure learning which are add operator, delete operator and reverse operator. What we have to strengthen is that although the reverse operator can be equivalent to the combination of one delete operator and one add operator, the reverse operator is an indispensable operator in BN structure learning. And an example of using these three basic operators is shown in Fig. 10.
In ADR algorithm, three operators are executed sequentially in each cycle, and each operator is selected which is legal and makes the score of the BN structure increases highest until the score does not increase. The three operators are carried out in the order of add operator, reverse operator and delete operator. The reason for add operator before reverse operator is that the direction of edge added can be reversed by the subsequent reverse operator if the direction of the edge added is wrong. The reason for reverse operator before delete operator is that the edge reversed or added can be deleted by the subsequent delete operator if the existence of the edge is wrongly judged.
Algorithm 2 reveals the pseudo-code of using ADR algorithm for the second-stage search.
ADR algorithm for the second-stage search[1] the BN structure after the first-stage search
Experimental results and discussion
Experimental design
The proposed algorithm is implemented in following simulation environments including hardware and software environments. The PC used in the hardware environment is with an operating system of Windows 10, a CPU of Intel 3.40 GHz and a computer memory of 8 G. The software environment is in MATLAB2016b with a BN toolbox FullBNT-1.0.7, the graph toolbox matgraph-2.0 and the maximum information coefficient toolbox minepy-1.2.1.
In this paper, we use two universally representative benchmarks of BNs including Alarm network and Asia network to verify the performance of our proposed algorithm. The Alarm network has three versions [17] and the version we use is described in
Bayesian networks used in simulation experiments
Bayesian networks used in simulation experiments
The simulation experiments can be divided into two parts. In Section 4.2, we verify the effectiveness of the method in determining the local search space based on MIC with penalty factors by comparing with other two methods and carry on the detailed analysis of the performance for this proposed method. To achieve this goal, we measure the error severity, the searching ability, respectively from the perspective of the precision and recall, and the combination of both. The details are shown in Table 2.
Definition of evaluation criteria for the local structure
A schematic diagram of three operators.
Explain the Table 2 as follows.
The number of correct links: The number of same links in the learned structure and the original structure from the network skeleton. The number of wrong links: The number of redundant links in the learned structure compared with the original structure from the network skeleton. The number of correct edges: The number of same edges in the learned structure and the original structure. The number of wrong edges: The number of inverted edges in the learned structure compared with the original structure. Average number of errors: The mean value of the sum of the number of wrong links and wrong edges in N simulation results. Average number of searching: The mean value of the sum of the number of correct links and correct edges in N simulation results.
In Section 4.3, we evaluate the performance of pMIC_BPSO_ADR algorithm from the goodness of fit to data, the quality of the network structure itself and time complexity. Measure the algorithm by following three indicators described below:
Structural differences Missing edges (ME): The number of directed edges which don’t exist in the learned structure but exist in the original structure. Redundant edges (RE): The number of directed edges which only exist in the learned structure but don’t exist in the original structure. Inverted edges (IE): The number of edges which are determined correctly but with wrong direction. Correct edges (CE): The number of directed edges which are correctly determined. Score value The value of the Bayesian Information Criterion (BIC) scoring function. The BIC scoring function is usually used to measure the quality of BN structure which has the compromise between complexity of the model and the goodness of the model with given data. Running time Running time is used to evaluate the complexity of methods. As previously described, all methods are executed in the above simulation environments.
In this paper, the mean result and the best result are also used and explained as follows:
Mean Result (MR): It is the result averaged over N runs of the algorithm on N different and independent datasets for each network structure. Best Results (BR): It is the best result of the data.
Comparative experimental analysis
In order to verify the effectiveness of the method on eliminating triangle loops by introducing penalty factors
The abscissa is datasets in figures, and the ordinate is SC indexes which represent differences between the two methods on eliminating the triangle loops by introducing the penalty factors with the traditional method on eliminating the triangle loops. The bigger the SC index is, the more effective the structure learning is. According to Figs 11 and 12, it is easy to reveal that methods on eliminating the triangle loops by introducing the penalty factors are better than the traditional method on eliminating the triangle loops and our proposed method introducing penalty factors
Algorithm performance analysis
Tables 3 and 4 display the results of experiments in Alarm and Asia network.
Evaluation criteria of the inital BN structure for Alarm network
Evaluation criteria of the inital BN structure for Alarm network
Evaluation criteria of the inital BN structure for Asia network
Comparison of the average SC index under different methods for Asia network.
Comparison of the average SC index under different methods for Alarm network.
According to Table 3 and 4, we can easily find that the average error rate of links, average error rate of edges for Alarm network and Asia network are very low, especially for Alarm network with more nodes. These two evaluation criteria indicate that the links and edges found by our proposed method are basically correct, and the redundant edges and reverse edges are very low.
Based on Tables 3 and 4, it is shown that the average searching rate of links and average searching rate of edges are relatively high. The lowest average searching rate of links shown is above 72% and the lowest average searching rate of edges shown is above 32%. We can see that the determined search space of local search accounts for more than 70% of the whole search space.
The low error rate and good searching ability play a key role for the strategy of two-stage heuristic searches.
In order to verify the effectiveness of the proposed method pMIC_BPSO_ADR, we compare the our proposed method with pMIC_BPSO and TPBM[3]. pMIC_BPSO represents a hybrid algorithm using the traditional strategy. TPBM represents an improved structure learning algorithm in recent years.
Comparison of structural differences
In this experiment, we compare three methods by calculating the structural differences. The smaller ME, RE and IE in structure differences are, the better the BN structure learned by the algorithm is. The bigger CE in structure differences is, the better the BN structure learning is. The experimental results for different networks are indicated in Figs 13 and 14.
Comparison of structural differences under different methods for Asia network.
In these figures, the horizontal coordinates depict different datasets and the vertical coordinates indicate kinds of structural differences. According to Figs 13 and 14, we can easily find that three methods’ ME, IE, RE values and the learned structure by our proposed method is better than other two methods by the comprehensive criterion CE.
In this experiment, we compare three methods by calculating the BIC scores. The greater the BIC score is, the better the BN structure learning is. The experimental results for different networks are indicated in Tables 5 and 6.
Comparison of BIC scores under different methods for Asia network
Comparison of BIC scores under different methods for Asia network
Comparison of structural differences under different methods for Alarm network.
According to Tables 5 and 6, we can easily find that our proposed method can get greater BIC scores with other two methods. Therefore, we can indicated that the learned structure by our proposed method is better than other two methods.
Comparison of BIC scores under different methods for Alarm network
In this experiment, we compare three methods by measuring the running time. The shorter running time is, the lower the complexity is and the better performance of the algorithm is. The experimental results for different networks are indicated in Figs 15 and 16.
Comparison of running time under different methods for Asia network.
Two figures show the average running time of three methods on two networks to reach the best structure. The horizontal coordinate depicts different datasets and the vertical coordinate depicts average running time. Based on Figs 15 and 16, we can find that the proposed method has obviously better time performance than pMIC_BPSO, and has lower running time than TPBM in most cases. From Figs 13 and 14, Tables 5 and 6, our proposed method can obtain better structural differences and BIC scores. Moreover, our proposed method can get considerable better time performance when the data set is large. This means that the proposed method has a good advantage of the ability to handle large data sets.
Comparison of running time under different methods for Alarm network.
The current hybrid algorithms generally adopt the strategy of one heuristic search and can be divided into two types which both exist the weaknesses. Therefore, in this paper, we propose a new strategy of hybrid algorithms using two-stage searches which are flexible use of appropriate search methods at different stages. In the first-stage search, we propose the method based on MIC introducing penalty factors to eliminate triangular loops. The experiments show that the improved method can get a better performance. In the second-stage search, we propose the ADR algorithm considering the differences between the local space and the true whole space. In order to verify the effectiveness of the whole proposed algorithm, we do simulation experiments to demonstrate that our proposed algorithm can obtain better performance of BN structure learning.
Footnotes
Acknowledgments
The authors would like to thank the editor and the anonymous reviewers for their insightful comments and suggestions. This study was supported by the National Key R&D Program of China (2017YFB0304205), the National Natural Science Foundation of China (61973067), and the Open Research Fund from the State Key Laboratory of Rolling and Automation, Northeastern University (2019RALKFKT004).
