An efficient Bayesian network structure learning algorithm using the strategy of two-stage searches

Abstract

Improving accuracy is important for Bayesian network (BN) structure learning, an NP-hard problem, and hybrid algorithms are currently among the most effective structure learning algorithms. Most hybrid algorithms adopt a single heuristic search and fall into two groups: those that search from an initial BN skeleton and those that search from initial solutions. The former often cannot guarantee that the resulting structure is globally optimal, and the latter often fail to reach the optimal solution because the search space is large. In this paper, an efficient hybrid algorithm is proposed that uses a two-stage search strategy. The first-stage search determines the local search space based on the Maximal Information Coefficient with penalty factors $p_{1}$, $p_{2}$, and then searches this local space by Binary Particle Swarm Optimization. For the second-stage search, an efficient ADR (Add, Delete, Reverse) algorithm based on three basic operators is designed to extend the local space to the whole space. Experimental results show that the proposed algorithm achieves better BN structure learning performance.

Keywords

Bayesian network structure learning, hybrid algorithms, penalty factors, binary particle swarm optimization algorithm, ADR algorithm

1. Introduction

A Bayesian network (BN) is a classical probabilistic graphical model that represents the causal relationships among uncertain events [10, 32]. It has been applied to tasks such as fault diagnosis [13, 36], risk assessment [29, 21] and forecasting [11]. BN learning has two major components, structure learning and parameter learning, of which structure learning is the more fundamental and challenging.

Structure learning from data has been proved to be an NP-hard problem [4], so improving its accuracy is very important. Existing BN structure learning algorithms can be divided into three groups: constraint-based, score-based and hybrid algorithms. Constraint-based algorithms use conditional independence (CI) tests to detect the relationships among variables [8, 19]. Score-based algorithms treat structure learning as an optimization problem and use heuristic searches to look for the structure with the highest score under a scoring function [33, 1]. Recently, hybrid algorithms, which try to obtain the merits of both groups by combining constraint-based algorithms with score-based algorithms, have been introduced and shown to be very effective at learning BN structure from data [12, 18].

Hybrid algorithms generally adopt a single heuristic search and can be categorized into two types, both of which have weaknesses. One classical type uses constraint-based algorithms to construct an undirected graph (skeleton graph) and then uses score-based algorithms to orient the edges of the undirected graph [28, 27]. However, the skeleton graph is sensitive to the CI tests in constraint-based algorithms: an error in an early test can have a cascading effect that causes many errors in later tests, so the complete skeleton graph is not recovered and the accuracy of BN structure learning is low. The other classical type uses constraint-based algorithms to get initial solutions and then uses score-based algorithms to search the whole solution space for the optimal solution [24, 9]. As the number of candidate structures in the solution space grows exponentially with the number of nodes, it is impractical to find the best solution, which again results in low accuracy. Therefore, we propose a hybrid strategy with two-stage searches, which makes it possible to use an appropriate search method at each stage.

In the first-stage search, it is crucial to determine the local search space, for which hybrid algorithms customarily adopt constraint-based algorithms. As the Maximal Information Coefficient (MIC) has the advantages of generality and equitability compared with traditional data analysis indicators such as mutual information [6], MIC has been introduced into constraint-based algorithms, and algorithms based on MIC [38] show excellent performance. Eliminating triangular loops is an important part of MIC-based algorithms and traditionally relies on the conditional independence criterion [38, 37]. Nevertheless, this easily lowers accuracy, because the triangular loops that the criterion cannot remove are eliminated by randomly deleting an edge in each triangle. Wei et al. [39] propose a method that eliminates triangular loops by introducing one penalty factor. However, this method penalizes triangular loops in only one case and ignores another case that actually exists. In addition, it may leave unnecessary triangular loops. Therefore, we propose a method that introduces penalty factors $p_{1}$, $p_{2}$ and uses the conditional independence criterion to eliminate the triangular loops.

Focusing on these issues, we propose in this paper a hybrid algorithm that uses the strategy of two-stage searches. The algorithm consists of two main parts: the first-stage search and the second-stage search. The first-stage search is split into two steps. The first step uses the MIC-based algorithm with penalty factors $p_{1}$, $p_{2}$ to determine the local search space. The second step uses the Binary Particle Swarm Optimization (BPSO) algorithm, which is easy to implement, has few parameters and converges quickly, to determine the local BN structure quickly and precisely. For the second-stage search, because the false negative rate is controlled in the first-stage search, the local structure has more missing edges and fewer redundant and inverted edges than the real whole BN structure. We therefore propose the ADR (Add, Delete, Reverse) algorithm, which fits these circumstances better than existing operator-based approaches [5, 26]. We denote the proposed algorithm as pMIC_BPSO_ADR. According to the experimental results, the proposed hybrid algorithm achieves better structure learning performance in terms of goodness of fit to data, quality of the network structure itself and time complexity. The main contributions are as follows: (1) it introduces the strategy of two-stage searches in hybrid algorithms, adopting a search method suited to each stage, (2) it introduces the penalty factors $p_{1}$, $p_{2}$ to resolve the triangular loops in the local search space, (3) it introduces the ADR algorithm as the second-stage search method.

The structure of the paper is as follows. Work related to BN learning algorithms is reviewed in Section 2. In Section 3, the proposed hybrid algorithm, pMIC_BPSO_ADR, is described in detail. The experimental results with simulations are described in Section 4. Finally, the conclusions are presented in Section 5.

2. Related work

2.1 Bayesian networks

Bayesian networks can be represented as a tuple $\textit{BN}=(G,\theta)$ which is composed of two components:

•
The structure of Bayesian networks. $G=(V,E)$ is a directed acyclic graph (DAG) and consists of two parts: $V$ and $E$ . $V=\left\{X_{1}\ldots X_{n}\right\}$ represents a set of nodes. $E=\left\{\langle V_{1},V_{2}\rangle\ldots\langle V_{i},V_{j}\rangle\right\}$ represents a set of directed edges between node pairs and the direction of each edge is from the former node to the latter node.
•
The parameter of Bayesian networks. $\theta=\left\{\theta_{1}\ldots\theta_{n}\right\}$ represents a set of conditional probability distributions in BN nodes, such that $\theta_{i}=P\left(X_{i}|\textit{Pa}\left(X_{i}\right)\right)$ denotes the conditional probability distribution of node $X_{i}$ given the set of parent nodes $\textit{Pa}\left(X_{i}\right)$ .

Figure 1.
A simple Bayesian network.

An example Bayesian network is shown in Fig. 1. As Fig. 1 illustrates, a BN is made up of a qualitative part, the DAG, and a quantitative part, a joint probability distribution composed of a set of conditional probability distributions. Therefore, the joint probability distribution can be factorized as the product of the conditional probability distributions of each node:

$\displaystyle P\left(X_{1}\ldots X_{n}\right)=\prod_{i=1}^{n}P\left(X_{i}|\textit{Pa}\left(X_{i}\right)\right)$ (1)
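
As a small illustration of Eq. (1), the following sketch evaluates the factorized joint distribution of a hypothetical three-node binary network $A \to C \leftarrow B$. The network, the variable names and all probability values are assumptions made for this example only; they are not taken from Fig. 1.

```python
from itertools import product

# Hypothetical three-node network: A -> C <- B, all variables binary.
parents = {"A": (), "B": (), "C": ("A", "B")}

# cpt[node][parent_values] = P(node = 1 | parents = parent_values)
# (illustrative numbers, not from the paper)
cpt = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.9},
}


def joint_probability(assignment):
    """Evaluate Eq. (1): the product of P(X_i | Pa(X_i)) over all nodes."""
    prob = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


# The factorized joint distribution sums to 1 over all assignments.
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product((0, 1), repeat=len(parents))
)
```

Each node contributes exactly one factor, so the number of parameters depends on the parent sets rather than on the full joint table.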
2.2 Structure learning of BN

Structure learning can be regarded as a combinatorial optimization problem. The size of the search space, namely the number of possible BN structures over $n$ nodes, is given by the following recurrence formula [31]:

$\displaystyle f\left(n\right)=\sum_{i=1}^{n}\left(-1\right)^{i+1}\dbinom{n}{i}2^{i\left(n-i\right)}f\left(n-i\right),\quad f\left(0\right)=f\left(1\right)=1$ (2)

The number of possible BN structures therefore grows super-exponentially with the number of nodes, and structure learning has been shown to be an NP-hard problem. As mentioned above, constraint-based, score-based and hybrid algorithms are the three main families of structure learning algorithms. In this paper, we focus on hybrid algorithms for the reasons given in Section 1.
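
The growth described by Eq. (2) can be checked numerically. The sketch below is a direct transcription of the recurrence; the function name `count_dags` is chosen here for illustration.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of possible BN structures (labeled DAGs) on n nodes, per Eq. (2)."""
    if n <= 1:
        return 1  # base cases f(0) = f(1) = 1
    return sum(
        (-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * count_dags(n - i)
        for i in range(1, n + 1)
    )


for n in range(2, 6):
    print(n, count_dags(n))  # 2 -> 3, 3 -> 25, 4 -> 543, 5 -> 29281
```

Five nodes already admit 29281 structures, which is why exhaustive search becomes impractical for networks of the size used later in the experiments.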

Hybrid algorithms combine constraint-based with score-based algorithms to learn the structure. The constraint-based algorithms used in hybrid algorithms generally rely on information theory or CI tests [10, 9, 17, 34, 38, 37, 39]. In recent years, because MIC captures a wider range of associations than MI [6], algorithms based on MIC [38, 37] have shown excellent performance. MIC is explained in Section 2.2.1. The score-based algorithms used in hybrid algorithms employ heuristic searches to find the optimal structure in the search space. Heuristic searches such as Genetic Algorithms (GA), Ant Colony Optimization and BPSO have been successfully applied to structure learning. Compared with some other heuristic searches, BPSO is easy to implement, has few parameters and converges faster [2]. BPSO is explained in Section 2.2.2.

2.2.1 Maximal information coefficient

MIC is a recently proposed data analysis indicator that captures associations between node pairs. It has the advantages of generality and equitability compared with traditional data analysis indicators such as the Pearson correlation coefficient, the Spearman correlation coefficient and mutual information [6, 30].

Suppose the values of two correlated variables X and Y are partitioned into $\left|X\right|$ and $\left|Y\right|$ bins respectively, forming a grid $Gr$ of size $\left|X\right|\times\left|Y\right|$. The maximum mutual information between random variables X and Y over different grid partitions in dataset D is $I_{|X|,|Y|}^{*}(X,Y|D)=\max_{Gr}I(X,Y|D)$. Figure 2 shows an example of the grid partition in dataset D, where $|X|=2,|Y|=2$. Blue dots represent data and are divided into nine sections by this grid. MIC can be written as [15]:

$\displaystyle\textit{MIC}(X,Y|D)=\max\limits_{|X|\times|Y|<B(N)}\left\{\frac{I_{|X|,|Y|}^{*}(X,Y|D)}{\log\min\left\{|X|,|Y|\right\}}\right\}$ (3)

where $N$ is the sample size and $B(N)$, generally taken as $N^{0.6}$, is the upper limit of the grid size.
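
To make the structure of Eq. (3) concrete, the following sketch computes a simplified approximation of MIC. It is not the estimator used in the paper (which relies on the minepy toolbox): as an assumption made to keep the example short, only equal-frequency grids are tried for each grid size, instead of maximizing the mutual information over all partitions, and ties in the data are not handled specially. All function names are illustrative.

```python
from collections import Counter
from math import log


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * k // len(values)
    return bins


def mutual_information(bx, by):
    """Mutual information (in nats) of two discretized sequences."""
    n = len(bx)
    pxy = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        (c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items()
    )


def approx_mic(x, y, exponent=0.6):
    """Approximate Eq. (3), restricted to equal-frequency grids (an assumption)."""
    budget = len(x) ** exponent  # B(N) = N^0.6
    best = 0.0
    for kx in range(2, int(budget) + 1):
        for ky in range(2, int(budget) + 1):
            if kx * ky >= budget:  # enforce |X| x |Y| < B(N)
                break
            mi = mutual_information(
                equal_frequency_bins(x, kx), equal_frequency_bins(y, ky)
            )
            best = max(best, mi / log(min(kx, ky)))  # normalization of Eq. (3)
    return best
```

For a noiseless monotone relationship, for example `x = list(range(100))` and `y = [2 * v + 1 for v in x]`, the 2 × 2 grid already yields a normalized value of 1, the upper bound of MIC.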

Figure 2.

A grid partition in dataset D.

2.2.2 Binary particle swarm optimization

Particle Swarm Optimization (PSO) is one of the most widely used swarm intelligence algorithms at present [40, 25] and uses a large number of particles (candidate solutions) to search in the d-dimensional search space. Each particle updates its states including the position and velocity by tracking two extremes: personal best solution (pbest) and global best solution (gbest) to find the best solution.

The states of the $i$-th particle are represented by the d-dimensional position $X_{i}=(X_{i1}\ldots X_{id})$ and velocity $V_{i}=(V_{i1}\ldots V_{id})$, and each particle updates its states according to the following formulas:

$\displaystyle V_{id}^{k+1}=w\times V_{id}^{k}+c_{1}\times r_{1d}^{k}\times(\textit{pbest}_{id}^{k}-X_{id}^{k})+c_{2}\times r_{2d}^{k}\times(\textit{gbest}_{d}^{k}-X_{id}^{k})$ (4)

$\displaystyle X_{id}^{k+1}=X_{id}^{k}+V_{id}^{k+1}$ (5)

where $V_{id}^{k}$ denotes the velocity of particle $i$ in the dimension $d$ at iteration $k$ , $X_{id}^{k}$ represents the position of particle $i$ in the dimension $d$ at iteration $k$ , $w$ is the inertia weight, $c_{1}$ and $c_{2}$ indicate acceleration coefficients to adjust the maximum learning step size, $r_{1d}^{k}$ and $r_{2d}^{k}$ are random numbers in the interval $[0,1]$ , $\textit{pbest}_{id}^{k}$ represents the best solution obtained by the particle $i$ so far, $\textit{gbest}_{d}^{k}$ is the best solution of all particles so far.
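
A minimal sketch of one update step for a single particle under Eqs. (4) and (5) is given below. The default values of `w`, `c1` and `c2` are placeholders chosen for illustration; the text does not specify them.

```python
import random


def pso_step(position, velocity, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """Apply Eqs. (4) and (5) to one particle and return its new state.

    w, c1 and c2 are illustrative defaults, not values from the paper.
    """
    new_position, new_velocity = [], []
    for x, v, pb, gb in zip(position, velocity, pbest, gbest):
        r1, r2 = rng.random(), rng.random()  # fresh random numbers per dimension
        v_next = w * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)  # Eq. (4)
        new_velocity.append(v_next)
        new_position.append(x + v_next)  # Eq. (5)
    return new_position, new_velocity
```

When a particle already sits at both its personal best and the global best, only the inertia term remains, so the velocity simply decays by the factor `w`.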

Traditional PSO algorithms can only deal with continuous spaces. Methods that convert PSO algorithms for continuous problems into PSO algorithms for binary discrete problems can be divided into two types:

Based on the traditional PSO algorithms for continuous problems, the particle velocity update function is retained and the particle position update function is modified to satisfy the binary space [22].

Based on the essential principle of PSO algorithms for continuous problems, the particle velocity update function and particle position update function are refined in binary discrete space [7].

In this paper, we choose the first type, which retains the simple calculation of traditional PSO algorithms and is fast to compute.

3. Learning BN structure using pMIC_BPSO_ADR algorithm

3.1 The first-stage search

The first-stage search is divided into two steps. The first step is to determine the local search space and the second step is to use the BPSO algorithm to search the local search space.

3.1.1 Determine the local search space based on MIC with penalty factors

Figure 3.

The scheme for determining the local search space based on MIC with penalty factors.

Motivated by [38], the framework for determining the local search space based on MIC with penalty factors is described in Fig. 3. Our contribution is to introduce penalty factors in step 1 of Fig. 3 to eliminate invalid triangular loops. In step 1, the undirected structure is constructed in [38] from the MIC of any two nodes, $\textit{MIC}(X_{i},X_{j})$, whereas in our work it is constructed from the MIC with penalty factors $p_{1},p_{2}$ of any two nodes, $\textit{MIC}_{p_{1},p_{2}}\left(X_{i},X_{j}\right)$, which is calculated by the following equation:

$\displaystyle\textit{MIC}_{p_{1},p_{2}}\left(X_{i},X_{j}\right)=\textit{MIC}(X_{i},X_{j})-\frac{p_{1}}{2N_{\textit{before}}}\sum_{h=1}^{N_{\textit{before}}}(\textit{MIC}(X_{i},W_{b})+\textit{MIC}(X_{j},W_{b}))-\frac{p_{2}}{2N_{\textit{after}}}\sum_{h=1}^{N_{\textit{after}}}(\textit{MIC}(X_{i},W_{a})+\textit{MIC}(X_{j},W_{a}))$ (6)

where $p_{1}=0.1$ and $p_{2}=1$. The MIC values between node $X_{i}$ and every other node are sorted in descending order; the nodes ranked before node $X_{j}$ are called pre-$X_{j}$ and the nodes ranked after node $X_{j}$ are called aft-$X_{j}$. $W_{b}$ denotes the nodes common to pre-$X_{i}$ and pre-$X_{j}$, and $N_{\textit{before}}$ denotes the number of these common nodes. $W_{a}$ denotes the nodes common to aft-$X_{i}$ and aft-$X_{j}$, and $N_{\textit{after}}$ denotes the number of these common nodes.
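
The penalized score of Eq. (6) can be sketched as follows, given a precomputed symmetric matrix of pairwise MIC values. The sketch rests on two assumptions that the text does not state explicitly: $W_{b}$ is read as the set of nodes whose MIC with both $X_{i}$ and $X_{j}$ exceeds $\textit{MIC}(X_{i},X_{j})$, $W_{a}$ as the set of nodes whose MIC with both is lower, and an empty set is taken to contribute no penalty. The function name and matrix layout are illustrative.

```python
def penalized_mic(mic, i, j, p1=0.1, p2=1.0):
    """Eq. (6) for the pair (i, j), given a symmetric matrix of pairwise MIC values.

    Assumed reading: W_b holds nodes ranked above the partner node in both
    rankings, W_a holds nodes ranked below it in both rankings.
    """
    base = mic[i][j]
    others = [w for w in range(len(mic)) if w not in (i, j)]
    before = [w for w in others if mic[i][w] > base and mic[j][w] > base]
    after = [w for w in others if mic[i][w] < base and mic[j][w] < base]

    def mean_pair_mic(nodes):
        # (1 / 2N) * sum of MIC(X_i, W) + MIC(X_j, W); zero if the set is empty
        if not nodes:
            return 0.0
        return sum(mic[i][w] + mic[j][w] for w in nodes) / (2 * len(nodes))

    return base - p1 * mean_pair_mic(before) - p2 * mean_pair_mic(after)
```

With $p_{2}=1$, a pair whose association is explained by many weakly related common neighbours is penalized more heavily than a pair with a single strongly related neighbour, which matches the motivation given for the second penalty factor.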

Figure 4.

The undirected structure.

According to Wei et al. [39], the idea is to eliminate triangular loops caused by a high MIC between two nodes that are related only through an intermediate node, by introducing a penalty factor. Figure 4a displays the actual undirected structure, in which MIC(X,Y) is high because nodes X and Y are connected through nodes $A$, $B$ and $C$, resulting in the incorrect triangular loops generated in Fig. 4b. This method assumes that the wrong direct connection between the two nodes is due to an intermediate node having large MIC values with both nodes. In practice, the wrong direct connection between two nodes is often caused by the combined action of intermediate nodes that each have small MIC values with the two nodes. Therefore, our proposed method introduces the penalty factors $p_{1}$, $p_{2}$. The penalty factor $p_{1}$ penalizes intermediate nodes that have high MIC values with the two nodes, and the penalty factor $p_{2}$ penalizes intermediate nodes that act through this combined effect.

In step 1, the threshold value $\alpha$ in [38] controls how difficult it is for undirected edges to enter the BN structure. In our work, $\alpha$ is set to 0.9 to strictly control the false negative rate (the rate of falsely missing undirected edges).

The details of the remaining steps are described in [38]. After step 4, we obtain the initial BN structure $G_{1}$, which is the local search space.

3.1.2 Use BPSO algorithm to search in local search space

Compared with some other heuristic searches, BPSO is easy to implement, has few parameters and converges faster [2]. These advantages make BPSO well suited to the first-stage search in the proposed algorithm.

1. Representation of a BN structure

A BN structure can be expressed as an $n\times n$ adjacency matrix $A$, and the element $a_{ij}$ of the adjacency matrix $A$ is defined as follows:

$\displaystyle a_{ij}=\left\{\begin{array}{ll}1&X_{i}\in\textit{Pa}(X_{j})\\ 0&\textit{otherwise}\end{array}\right.$ (7)

A particle can then be expressed as a string: $a_{11}a_{12}\ldots a_{1n}a_{21}a_{22}\ldots a_{2n}\ldots a_{n1}\ldots a_{nn}$. An example of the representation of a BN structure is shown in Fig. 5.

Figure 5.

A schematic diagram of representation of a BN structure.
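
The encoding of Eq. (7) and the row-by-row particle string can be sketched as below. The helper names are illustrative, and edges are given as 1-based (parent, child) pairs to match the node numbering used in the text.

```python
def edges_to_matrix(n, edges):
    """Build the adjacency matrix of Eq. (7) from 1-based (parent, child) pairs."""
    a = [[0] * n for _ in range(n)]
    for parent, child in edges:
        a[parent - 1][child - 1] = 1
    return a


def matrix_to_particle(a):
    """Flatten the matrix row by row into the string a_11 a_12 ... a_nn."""
    return "".join(str(v) for row in a for v in row)


def particle_to_matrix(bits, n):
    """Inverse of matrix_to_particle for an n x n matrix."""
    return [[int(bits[r * n + c]) for c in range(n)] for r in range(n)]


# Hypothetical 3-node structure with edges 1 -> 2 and 1 -> 3.
particle = matrix_to_particle(edges_to_matrix(3, [(1, 2), (1, 3)]))
print(particle)  # 011000000
```

A structure over $n$ nodes therefore corresponds to a particle with $n^{2}$ binary dimensions.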

2. Initialize particle swarms

After constructing the initial BN structure, which has high accuracy, we use BPSO for the first-stage search to search over its undirected part.

Each particle is initialized in two steps. The first step randomly orients the undirected edges of the initial BN structure by upper triangle coding. The second step makes the initialized particle valid. The detailed procedure is shown in Fig. 6. The initial BN structure $G_{1}$, which is the output of the MIC-based algorithm with penalty factors described above, is essentially a partially directed acyclic graph. Only its set of undirected edges $U$ is randomly oriented, by upper triangle coding, into the set of directed edges $D_{2}$. To make the initialized particle valid, it is then converted to an adjacency matrix: $D_{2}$ is converted to the adjacency matrix $Z_{1}$, and the entries corresponding to the already directed edges $D_{1}$ are added to $Z_{1}$ to obtain the adjacency matrix $Z_{2}$. Illegal loops are detected by depth-first search (DFS) and removed by randomly reversing an edge $X\in D_{2}$ until no illegal loop remains.

Figure 6.

A schematic diagram of initializing a particle.

A structure with five nodes is taken as an example to illustrate the whole process.

The $5\times 5$ adjacency matrix of the initial BN structure is shown in the left part of Fig. 7. The middle part of Fig. 7 shows that the adjacency matrix is divided into a set of directed edges $D_{1}=\left\{\langle 1,3\rangle,\langle 4,3\rangle,\langle 5,1\rangle\right\}$ and a set of undirected edges $U=\left\{\left(1,2\right),\left(2,4\right),\left(2,5\right),\left(4,5\right)\right\}$. The right part of Fig. 7 shows the set of undirected edges $U$ being randomly oriented by upper triangle coding. In this example the four undirected edges are oriented as $1\to 2,2\leftarrow 4,2\to 5,4\leftarrow 5$, which gives the set of directed edges $D_{2}=\left\{\langle 1,2\rangle,\langle 4,2\rangle,\langle 2,5\rangle,\langle 5,4\rangle\right\}$.

Figure 7.

The disposal of the initial BN structure in step 1.

The left part of Fig. 8 shows the set of directed edges $D_{2}$ converted to the adjacency matrix $Z_{1}$. Adding the entries of the set of directed edges $D_{1}$ to the adjacency matrix $Z_{1}$ gives the adjacency matrix $Z_{2}$ shown in the right part of Fig. 8.

Figure 8.

A schematic diagram of the adjacency matrix after conversion.

DFS finds an illegal loop in the adjacency matrix $Z_{2}$, highlighted in cyan in the left part of Fig. 9. The edge selected from the illegal loop is marked in grey in the middle part of Fig. 9. The edge is then reversed, as marked in grey in the right part of Fig. 9.

Figure 9.

A schematic diagram of removing an illegal loop.
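
The initialization procedure illustrated in Figs 6 to 9 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the experiments: edges are stored as sets of 1-based (parent, child) pairs instead of adjacency matrices, upper triangle coding is replaced by an independent coin flip per undirected edge, and the edge to reverse is drawn at random from the $D_{2}$ edges on the loop that DFS reports. The function names are illustrative.

```python
import random


def find_cycle(n, edges):
    """Return the edges of one directed cycle found by DFS, or None if acyclic."""
    children = {v: [] for v in range(1, n + 1)}
    for parent, child in edges:
        children[parent].append(child)
    state = {v: 0 for v in children}  # 0 = unvisited, 1 = on stack, 2 = finished
    path = []

    def visit(v):
        state[v] = 1
        path.append(v)
        for c in children[v]:
            if state[c] == 1:  # back edge closes a cycle
                loop = path[path.index(c):] + [c]
                return list(zip(loop, loop[1:]))
            if state[c] == 0:
                found = visit(c)
                if found:
                    return found
        path.pop()
        state[v] = 2
        return None

    for v in children:
        if state[v] == 0:
            found = visit(v)
            if found:
                return found
    return None


def init_particle(n, d1, undirected, rng=random):
    """Randomly orient the undirected edges, then reverse D2 edges until acyclic.

    d1 is assumed to be acyclic, so every loop contains at least one D2 edge.
    """
    d2 = {(a, b) if rng.random() < 0.5 else (b, a) for a, b in undirected}
    while True:
        cycle = find_cycle(n, set(d1) | d2)
        if cycle is None:
            return set(d1) | d2
        edge = rng.choice([e for e in cycle if e in d2])
        d2.remove(edge)
        d2.add(edge[::-1])
```

Applied to the five-node example, the orientation $D_{2}$ chosen in the text contains the loop $2\to 5\to 4\to 2$, so at least one reversal is needed before the particle is valid.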

3. Obtain the optimal DAG

1) Particle velocity and position updating formulas

In our proposed method, the particle velocity is updated by Eq. (4) and the particle position updating formula [35] is written as follows:

$\displaystyle X_{id}^{k+1}=\left\{\begin{array}{ll}(X_{id}^{k})^{-1}&\textit{rand}<T(V_{id}^{k+1})\\ X_{id}^{k}&\textit{rand}\geqslant T(V_{id}^{k+1})\end{array}\right.$ (8)

where $(X_{id}^{k})^{-1}$ is the complement of $X_{id}^{k}$ and $T$ is the transfer function, which is defined as follows:

$\displaystyle T(V_{id}^{k})=\left|{\frac{2}{\pi}\arctan(V_{id}^{k})}\right|$ (9)
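
A minimal sketch of the binary position update is shown below. It reads Eq. (8) as the usual V-shaped rule, in which the current bit is complemented with probability $T$ of the new velocity and kept otherwise; this reading is an assumption of the sketch. The function names are illustrative.

```python
import random
from math import atan, pi


def transfer(v):
    """V-shaped transfer function of Eq. (9): |2/pi * arctan(v)|, in [0, 1)."""
    return abs(2.0 / pi * atan(v))


def update_position(position, velocity, rng=random):
    """Complement each bit with probability T(v), otherwise keep it (Eq. (8))."""
    return [
        1 - x if rng.random() < transfer(v) else x
        for x, v in zip(position, velocity)
    ]
```

A velocity of zero never flips a bit, while large positive or negative velocities flip the bit with probability close to 1, so the magnitude of the velocity controls how strongly a particle changes its structure.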

2) The ‘repair’ strategy to obtain legal structures and the fitness function

The ‘repair’ strategy is used to convert illegal structures into legal structures. Illegal structures are classified into three types, self-cycles, bi-cycles and regular cycles, and are processed by the ‘repair’ strategy according to [16].

The fitness function in BPSO corresponds to the scoring function in BN structure learning. In our proposed method, the Bayesian Information Criterion (BIC) scoring function is chosen as the fitness function to evaluate the quality of candidate solutions. Algorithm 1 shows the pseudo-code of using the BPSO algorithm to search the local space.

Algorithm 1 BPSO algorithm for the local BN structure
Input: the initial BN structure $G_{1}$; the number of particles nop; the maximum number of iterations $I$; the fitness function BIC
Output: the best particle gBest $G_{2}$
Phase 1. Initialize particle swarms
for each particle do
  Get the set of undirected edges $U$ from $G_{1}$;
  Orient the set of undirected edges $U$ to the set of directed edges $D_{2}$ by upper triangle coding;
  Convert the set of directed edges $D_{2}$ into the adjacency matrix $Z_{1}$;
  Add the adjacency matrix converted from the set of directed edges $D_{1}$ to $Z_{1}$ to obtain the adjacency matrix $Z_{2}$;
  while illegal cycles exist in the adjacency matrix $Z_{2}$ do
    Randomly reverse an edge $X\in D_{2}$;
  end while
end for
Phase 2. Obtain the optimal DAG
for $i=1$; $i<I$; $i++$ do
  for each particle do
    Update the particle velocity using Eq. (4);
    Update the particle position using Eq. (8) with the transfer function of Eq. (9);
    if illegal structures exist then
      Use the ‘repair’ strategy to process the illegal structures;
    end if
    Evaluate the quality of the particle using BIC;
    Determine the personal best solution $\textit{pbest}_{i}$;
  end for
  Determine the global best solution gbest;
end for
Return gBest $G_{2}$;

3.2 The second-stage search using ADR algorithm

Compared with the actual network structure, the structure after the first-stage search has more missing edges and fewer redundant and inverted edges, because the false negative rate is controlled in the first-stage search. This means that the first-stage search usually yields a local structure (the local space accounts for about 70% of the whole space, as demonstrated in Section 4.2.2). The second-stage search should therefore focus on extending the local space to the whole space, and then on correcting the errors of the first-stage search. This calls for a more targeted method for the second-stage search, namely an operator-based approach.

Score-based algorithms are essentially an optimization process based on three basic operators: the add operator, the delete operator and the reverse operator [23]. Existing operator-based approaches include Greedy Equivalent Search (GES) [5] and the greedy hill-climbing search [26]. The key question is how to combine the operators, for example how many operators to use and in which way to combine them. We prefer to apply the add operator first and then the other two basic operators. Accordingly, the proposed algorithm for the second-stage search is built on the three operators, with the add operator applied first. We denote the proposed search algorithm as the ADR (Add, Delete, Reverse) algorithm.

There are three basic operators in Bayesian network structure learning: the add operator, the delete operator and the reverse operator. We emphasize that, although the reverse operator is equivalent to the combination of one delete operator and one add operator, it is an indispensable operator in BN structure learning. An example of using these three basic operators is shown in Fig. 10.

In the ADR algorithm, the three operators are executed sequentially in each cycle. For each operator, the legal move that increases the score of the BN structure the most is applied, and the cycles repeat until the score no longer increases. The three operators are carried out in the order add, reverse, delete. The add operator comes before the reverse operator so that, if an added edge has the wrong direction, it can be corrected by the subsequent reverse operator. The reverse operator comes before the delete operator so that, if the existence of a reversed or added edge was wrongly judged, the edge can be removed by the subsequent delete operator.

Algorithm 2 gives the pseudo-code of the ADR algorithm for the second-stage search.

Algorithm 2 ADR algorithm for the second-stage search
Input: the BN structure after the first-stage search $G_{2}$
Output: the best BN structure $G_{3}$
while the BIC score of the BN structure can increase do
  Part 1. Add one edge
  for each node pair $\left(x,y\right)$ without an edge do
    Attempt to add the legal edge $x\to y$;
    Then attempt to add the legal edge $x\leftarrow y$;
  end for
  Choose the edge whose addition increases the BIC score the most and add it;
  Part 2. Reverse one edge
  for each node pair with the edge $\left\langle x,y\right\rangle$ do
    Attempt to reverse the legal edge $x\to y$;
  end for
  Choose the edge whose reversal increases the BIC score the most and reverse it;
  Part 3. Delete one edge
  for each node pair with the edge $\left\langle x,y\right\rangle$ do
    Attempt to delete the legal edge $x\to y$;
  end for
  Choose the edge whose deletion increases the BIC score the most and delete it;
end while
Return the best BN structure $G_{3}$;
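
The control flow of Algorithm 2 can be sketched as below. This is an illustrative outline, not the implementation used in the experiments: nodes are numbered from 0, a structure is a frozenset of (parent, child) pairs, and `score` is any callable that maps a structure to a number (BIC in the paper; a toy score is used in the usage note). A move is treated as legal when it keeps the graph acyclic, and it is accepted only if it strictly increases the score.

```python
def is_acyclic(n, edges):
    """Kahn-style check: repeatedly strip nodes that have no incoming edge."""
    remaining, nodes = set(edges), set(range(n))
    while nodes:
        sources = {v for v in nodes if all(c != v for _, c in remaining)}
        if not sources:
            return False  # every remaining node has a parent, so a cycle exists
        nodes -= sources
        remaining = {(p, c) for p, c in remaining if p not in sources}
    return True


def add_moves(n, edges):
    linked = {frozenset(e) for e in edges}
    return [edges | {(x, y)} for x in range(n) for y in range(n)
            if x != y and frozenset((x, y)) not in linked]


def reverse_moves(n, edges):
    return [(edges - {(x, y)}) | {(y, x)} for x, y in edges]


def delete_moves(n, edges):
    return [edges - {(x, y)} for x, y in edges]


def best_move(n, edges, candidates, score):
    """Return the legal candidate with the largest score gain, or None."""
    best, best_score = None, score(edges)
    for cand in candidates:
        if is_acyclic(n, cand):
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
    return best


def adr_search(n, start_edges, score):
    """Cycle through add, reverse, delete until no operator improves the score."""
    edges = frozenset(start_edges)
    improved = True
    while improved:
        improved = False
        for moves in (add_moves, reverse_moves, delete_moves):
            candidate = best_move(n, edges, moves(n, edges), score)
            if candidate is not None:
                edges, improved = candidate, True
    return edges
```

As a usage example with a toy score that counts disagreements with a hypothetical target structure, `adr_search(3, {(1, 0)}, lambda e: -len(set(e) ^ {(0, 1), (1, 2)}))` first adds $1\to 2$ and then reverses $1\to 0$, ending at the target. Because every accepted move strictly increases the score and the number of structures is finite, the loop terminates.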

4. Experimental results and discussion

4.1 Experimental design

The proposed algorithm is implemented in the following hardware and software environment. The hardware is a PC running Windows 10 with a 3.40 GHz Intel CPU and 8 GB of memory. The software environment is MATLAB 2016b with the BN toolbox FullBNT-1.0.7, the graph toolbox matgraph-2.0 and the maximal information coefficient toolbox minepy-1.2.1.

In this paper, we use two widely used benchmark BNs, the Alarm network and the Asia network, to verify the performance of our proposed algorithm. The Alarm network has three versions [17] and the version we use is described in https://www.norsys.com/netlib/ALARM.dnet. The Asia network has four versions [20] and the version we use is described in https://norsys.com/netlib/Asia.dnet. More detailed information about the structural characteristics of the two networks, namely number of variables, number of edges, connectivity (max in-degree, max out-degree) and states per variable (mean, min, max), is shown in Table 1. To cover as many situations as possible, and because the appropriate sample size depends on the complexity of the network, different datasets are selected for the two networks. For the Asia network, we generate four datasets with 500, 1000, 2000 and 5000 instances. For the Alarm network, we generate four datasets with 1000, 2000, 5000 and 10000 instances. For each network, the experiments on each dataset are run 10 times.

Table 1
Bayesian networks used in simulation experiments

Network	Variables	Edges	Max in/out degree	States per variable
Asia	8	8	2/2	2(2-2)
Alarm	37	46	4/5	2.8(2-4)

The simulation experiments are divided into two parts. In Section 4.2, we verify the effectiveness of the method for determining the local search space based on MIC with penalty factors by comparing it with two other methods, and analyze the performance of the proposed method in detail. To this end, we measure the error severity and the searching ability, which correspond to precision and recall respectively, as well as a combination of both. The details are shown in Table 2.

Table 2

Definition of evaluation criteria for the local structure

Type	Evaluation criterion	The description of evaluation criterion
Error severity	Average error rate of links	$\frac{\textit{total number of wrong links for N times}}{(\textit{total number of wrong and correct links})\times N}$
	Average error rate of edges	$\frac{\textit{total number of wrong edges for N times}}{(\textit{total number of wrong and correct edges})\times N}$
Searching ability	Average searching rate of links	$\frac{\textit{total number of correct links for N times}}{(\textit{the number of links of original network})\times N}$
	Average searching rate of edges	$\frac{\textit{total number of correct edges for N times}}{(\textit{the number of edges of original network})\times N}$
Comprehensive ability	Average SC index	Average number of searching – Average number of errors

Figure 10.

A schematic diagram of three operators.

The terms used in Table 2 are explained as follows.

•

The number of correct links: The number of same links in the learned structure and the original structure from the network skeleton.

•

The number of wrong links: The number of redundant links in the learned structure compared with the original structure from the network skeleton.

•

The number of correct edges: The number of same edges in the learned structure and the original structure.

•

The number of wrong edges: The number of inverted edges in the learned structure compared with the original structure.

•

Average number of errors: The mean value of the sum of the number of wrong links and wrong edges in N simulation results.

•

Average number of searching: The mean value of the sum of the number of correct links and correct edges in N simulation results.
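
The counts defined above and the average SC index of Table 2 can be sketched as follows. The sketch assumes that both the learned and the original structure are given as sets of directed (parent, child) pairs, and that a link is the unordered node pair of an edge; these representations and the function names are choices made for this example.

```python
def sc_index(learned, original):
    """SC index of one run: (correct links + correct edges) - (wrong links + wrong edges)."""
    learned, original = set(learned), set(original)
    learned_links = {frozenset(e) for e in learned}
    original_links = {frozenset(e) for e in original}
    correct_links = len(learned_links & original_links)
    wrong_links = len(learned_links - original_links)  # redundant links
    correct_edges = len(learned & original)
    wrong_edges = sum(1 for x, y in learned if (y, x) in original)  # inverted edges
    return (correct_links + correct_edges) - (wrong_links + wrong_edges)


def average_sc_index(runs, original):
    """Average number of searching minus average number of errors over the runs."""
    return sum(sc_index(run, original) for run in runs) / len(runs)
```

For example, with the original edges $1\to 2$, $2\to 3$, $3\to 4$ and the learned edges $1\to 2$, $3\to 2$, $1\to 4$, there are two correct links, one correct edge, one wrong link and one wrong edge, so the SC index of that run is 1.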

In Section 4.3, we evaluate the performance of the pMIC_BPSO_ADR algorithm in terms of goodness of fit to data, quality of the network structure itself and time complexity. The algorithm is measured by the following three indicators:

Structural differences

Missing edges (ME): The number of directed edges which don’t exist in the learned structure but exist in the original structure.

Redundant edges (RE): The number of directed edges which only exist in the learned structure but don’t exist in the original structure.

Inverted edges (IE): The number of edges which are determined correctly but with wrong direction.

Correct edges (CE): The number of directed edges which are correctly determined.
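
The four structural difference counts can be computed from two edge sets as sketched below. One assumption is made explicit here because the definitions leave it open: an edge that is present with the wrong direction is counted only as inverted, not additionally as missing or redundant. The function name is illustrative.

```python
def structural_differences(learned, original):
    """Count correct, inverted, missing and redundant edges between two DAGs.

    Edges are (parent, child) pairs. Inverted edges are counted only as IE
    (an assumption of this sketch).
    """
    learned, original = set(learned), set(original)
    ce = len(learned & original)
    ie = sum(1 for x, y in learned if (y, x) in original)
    me = sum(1 for x, y in original
             if (x, y) not in learned and (y, x) not in learned)
    re = sum(1 for x, y in learned
             if (x, y) not in original and (y, x) not in original)
    return {"CE": ce, "IE": ie, "ME": me, "RE": re}
```

Under this convention CE + IE + ME equals the number of edges in the original structure, and CE + IE + RE equals the number of edges in the learned structure.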

Score value

The value of the Bayesian Information Criterion (BIC) scoring function. The BIC scoring function is usually used to measure the quality of BN structure which has the compromise between complexity of the model and the goodness of the model with given data.
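
For reference, a common form of the BIC score for discrete BNs is the maximized log-likelihood minus $\frac{\log N}{2}$ times the number of free parameters. The sketch below implements this common form; the paper does not spell out its exact variant, so the formula, the data layout (a list of dicts) and the function name should be read as assumptions of this example.

```python
from collections import Counter
from math import log


def bic_score(data, parents, arity):
    """BIC of a discrete BN: log-likelihood minus (log N / 2) * free parameters.

    data:    list of dicts mapping variable name -> state index
    parents: dict mapping variable name -> tuple of parent names
    arity:   dict mapping variable name -> number of states
    """
    n = len(data)
    score = 0.0
    for node, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        marginal = Counter(tuple(row[p] for p in pa) for row in data)
        # Maximum-likelihood log-likelihood of this node given its parents.
        loglik = sum(c * log(c / marginal[cfg]) for (cfg, _), c in joint.items())
        q = 1  # number of parent configurations
        for p in pa:
            q *= arity[p]
        score += loglik - 0.5 * log(n) * q * (arity[node] - 1)
    return score
```

The score decomposes into one term per node, so a search operator that changes the parent set of a single node only requires that node's term to be recomputed.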

Running time

Running time is used to evaluate the complexity of the methods. All methods are executed in the simulation environment described above.

In this paper, the mean result and the best result are also reported. They are defined as follows:

Mean Result (MR): The result averaged over N runs of the algorithm on N different and independent datasets for each network structure.

Best Result (BR): The best result obtained on these datasets.

4.2 Simulation analysis of the method of determining the local space based on MIC with penalty factors

4.2.1 Comparative experimental analysis

To verify the effectiveness of eliminating triangle loops by introducing the penalty factors $p_{1}$, $p_{2}$ together with CI tests, we compare our proposed method with the traditional method, which eliminates triangle loops by CI tests alone [38], and with the method that introduces only the penalty factor $p_{1}$ together with CI tests [39]. The methods are evaluated by the average SC index, and the results for the different BNs are shown in Figs 11 and 12.

In the figures, the abscissa is the dataset and the ordinate is the SC index, which exposes the differences between the two penalty-factor methods and the traditional method of eliminating triangle loops. The larger the SC index, the more effective the structure learning. Figs 11 and 12 show that the methods that eliminate triangle loops by introducing penalty factors outperform the traditional method, and that our proposed method with the penalty factors $p_{1}$, $p_{2}$ outperforms the method with the penalty factor $p_{1}$ alone.

4.2.2 Algorithm performance analysis

Tables 3 and 4 display the experimental results for the Alarm and Asia networks, respectively.

Table 3
Evaluation criteria of the initial BN structure for Alarm network

Data	Average error rate of links	Average error rate of edges	Average searching rate of links	Average searching rate of edges
1000	0.032583	0.081890	0.773913	0.378261
2000	0.027102	0.057663	0.780435	0.395652
5000	0.027027	0.005263	0.782609	0.426087
10000	0.027027	0.033173	0.782609	0.393478

Table 4

Evaluation criteria of the initial BN structure for Asia network

Data	Average error rate of links	Average error rate of edges	Average searching rate of links	Average searching rate of edges
500	0.182143	0.120	0.7250	0.4125
1000	0.110714	0.175	0.7875	0.350
2000	0.139286	0.115	0.7625	0.325
5000	0.085714	0.175	0.8	0.3625

Figure 11.

Comparison of the average SC index under different methods for Asia network.

Figure 12.

Comparison of the average SC index under different methods for Alarm network.

According to Tables 3 and 4, the average error rates of links and of edges for the Alarm and Asia networks are low, especially for the Alarm network, which has more nodes. These two evaluation criteria indicate that the links and edges found by our proposed method are basically correct, and that few redundant or reversed edges are introduced.

Tables 3 and 4 also show that the average searching rates of links and of edges are relatively high: the lowest average searching rate of links is above 72% and the lowest average searching rate of edges is above 32%. The search space determined for the local search therefore accounts for more than 70% of the whole search space.
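The figures quoted above can be re-derived directly from the table rows. The following Python snippet is only a convenience check on the values printed in Tables 3 and 4; the variable and function names are hypothetical.

```python
# Rows copied from Tables 3 and 4. Each tuple holds, in order:
# (error rate of links, error rate of edges,
#  searching rate of links, searching rate of edges)
ALARM = {
    1000: (0.032583, 0.081890, 0.773913, 0.378261),
    2000: (0.027102, 0.057663, 0.780435, 0.395652),
    5000: (0.027027, 0.005263, 0.782609, 0.426087),
    10000: (0.027027, 0.033173, 0.782609, 0.393478),
}
ASIA = {
    500: (0.182143, 0.120, 0.7250, 0.4125),
    1000: (0.110714, 0.175, 0.7875, 0.350),
    2000: (0.139286, 0.115, 0.7625, 0.325),
    5000: (0.085714, 0.175, 0.8, 0.3625),
}


def column(table, index):
    """Values of one column of a table, over all sample sizes."""
    return [row[index] for row in table.values()]


min_link_rate = min(column(ALARM, 2) + column(ASIA, 2))
min_edge_rate = min(column(ALARM, 3) + column(ASIA, 3))
max_alarm_error = max(column(ALARM, 0) + column(ALARM, 1))
max_asia_error = max(column(ASIA, 0) + column(ASIA, 1))

print(f"lowest searching rate of links: {min_link_rate:.4f}")
print(f"lowest searching rate of edges: {min_edge_rate:.4f}")
print(f"largest error rate, Alarm: {max_alarm_error:.6f}")
print(f"largest error rate, Asia:  {max_asia_error:.6f}")
```

Both minima occur on the Asia network (0.7250 for links at 500 samples and 0.325 for edges at 2000 samples), and the largest error rate is about 0.082 for Alarm against about 0.182 for Asia.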

The low error rate and good searching ability play a key role in the strategy of two-stage heuristic searches.

4.3 Simulation analysis of the BN structure learning algorithm based on the strategy of two-stage searches

To verify the effectiveness of the proposed method pMIC_BPSO_ADR, we compare it with pMIC_BPSO and TPBM [3]. pMIC_BPSO represents a hybrid algorithm using the traditional strategy, and TPBM represents an improved structure learning algorithm proposed in recent years.

4.3.1 Comparison of structural differences

In this experiment, we compare the three methods by their structural differences. The smaller ME, RE and IE are, the better the BN structure learned by the algorithm; the larger CE is, the better the learned structure. The experimental results for the different networks are shown in Figs 13 and 14.

Figure 13.

Comparison of structural differences under different methods for Asia network.

In these figures, the horizontal coordinates depict the different datasets and the vertical coordinates indicate the kinds of structural differences. Comparing the ME, IE and RE values of the three methods in Figs 13 and 14, together with the comprehensive criterion CE, shows that the structure learned by our proposed method is better than those learned by the other two methods.

4.3.2 Comparison of BIC scores

In this experiment, we compare the three methods by their BIC scores. The greater the BIC score, the better the learned BN structure. The experimental results for the different networks are given in Tables 5 and 6.

Table 5
Comparison of BIC scores under different methods for Asia network

Algorithm	Sample size	MR	BR
pMIC_BPSO_ADR	500	-1159.42	-1148.71
	1000	-2278.7	-2191.29
	2000	-4496.78	-4370.53
	5000	-11269.5	-11124.7
pMIC_BPSO	500	-1161.82	-1148.71
	1000	-2286.14	-2203.54
	2000	-4500.81	-4391.23
	5000	-11291.2	-11147.5
TPBM	500	-1160.31	-1149.89
	1000	-2281.37	-2197.42
	2000	-4498.47	-4387.21
	5000	-11277.8	-11128.75

Figure 14.

Comparison of structural differences under different methods for Alarm network.

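According to Tables 5 and 6, our proposed method obtains BIC scores that are at least as high as those of the other two methods in every setting; the only tie is the BR value shared with pMIC_BPSO on the Asia network with 500 samples. This indicates that the structure learned by our proposed method is better than those learned by the other two methods.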

Table 6

Comparison of BIC scores under different methods for Alarm network

Algorithm	Sample size	MR	BR
pMIC_BPSO_ADR	1000	-10596.2	-10424
	2000	-20287.3	-19923
	5000	-48686.1	-48300.3
	10000	-95895.6	-95535
pMIC_BPSO	1000	-11038.8	-10791.4
	2000	-21042.1	-20449
	5000	-50733.2	-49697.6
	10000	-99947.2	-97200.1
TPBM	1000	-10989.7	-10739.4
	2000	-20664.9	-19963.9
	5000	-49378.8	-48798.6
	10000	-96389.2	-95843.1
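The comparison drawn from Tables 5 and 6 can be checked mechanically. The following Python snippet copies the (MR, BR) pairs from the two tables and lists every setting in which pMIC_BPSO_ADR scores lower than, or exactly equal to, one of the other two methods. The names `SCORES` and `compare` are hypothetical helpers introduced only for this check.

```python
# (MR, BR) BIC scores copied from Tables 5 and 6, keyed by sample size.
SCORES = {
    "Asia": {
        "pMIC_BPSO_ADR": {500: (-1159.42, -1148.71), 1000: (-2278.7, -2191.29),
                          2000: (-4496.78, -4370.53), 5000: (-11269.5, -11124.7)},
        "pMIC_BPSO": {500: (-1161.82, -1148.71), 1000: (-2286.14, -2203.54),
                      2000: (-4500.81, -4391.23), 5000: (-11291.2, -11147.5)},
        "TPBM": {500: (-1160.31, -1149.89), 1000: (-2281.37, -2197.42),
                 2000: (-4498.47, -4387.21), 5000: (-11277.8, -11128.75)},
    },
    "Alarm": {
        "pMIC_BPSO_ADR": {1000: (-10596.2, -10424), 2000: (-20287.3, -19923),
                          5000: (-48686.1, -48300.3), 10000: (-95895.6, -95535)},
        "pMIC_BPSO": {1000: (-11038.8, -10791.4), 2000: (-21042.1, -20449),
                      5000: (-50733.2, -49697.6), 10000: (-99947.2, -97200.1)},
        "TPBM": {1000: (-10989.7, -10739.4), 2000: (-20664.9, -19963.9),
                 5000: (-49378.8, -48798.6), 10000: (-96389.2, -95843.1)},
    },
}


def compare(scores, method="pMIC_BPSO_ADR"):
    """Return (losses, ties): settings where `method` is below or equal to a rival."""
    losses, ties = [], []
    for network, by_algorithm in scores.items():
        for rival, rival_scores in by_algorithm.items():
            if rival == method:
                continue
            for size, rival_pair in rival_scores.items():
                own_pair = by_algorithm[method][size]
                for label, own, other in zip(("MR", "BR"), own_pair, rival_pair):
                    if own < other:
                        losses.append((network, rival, size, label))
                    elif own == other:
                        ties.append((network, rival, size, label))
    return losses, ties


losses, ties = compare(SCORES)
print("losses:", losses)
print("ties:", ties)
```

With the tabulated values the list of losses is empty, and the single tie is the BR value on the Asia network with 500 samples, where pMIC_BPSO_ADR and pMIC_BPSO both reach -1148.71.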

4.3.3 Comparison of running time

In this experiment, we compare the three methods by their running time. The shorter the running time, the lower the complexity and the better the performance of the algorithm. The experimental results for the different networks are shown in Figs 15 and 16.

Figure 15.

Comparison of running time under different methods for Asia network.

Figs 15 and 16 show the average running time that the three methods need to reach the best structure on the two networks. The horizontal coordinate depicts the different datasets and the vertical coordinate depicts the average running time. According to Figs 15 and 16, the proposed method has clearly better time performance than pMIC_BPSO and a lower running time than TPBM in most cases. Figs 13 and 14 and Tables 5 and 6 show that our proposed method also obtains better structural differences and BIC scores. Moreover, its time performance is considerably better when the dataset is large, which indicates that the proposed method is well suited to handling large datasets.

Figure 16.

Comparison of running time under different methods for Alarm network.

5. Conclusion

Current hybrid algorithms generally adopt the strategy of one heuristic search and can be divided into two types, both of which have weaknesses. In this paper, we therefore propose a new strategy for hybrid algorithms based on two-stage searches, which makes flexible use of an appropriate search method at each stage. In the first-stage search, we propose a method based on MIC that introduces penalty factors to eliminate triangle loops; the experiments show that this improved method achieves better performance. In the second-stage search, we propose the ADR algorithm, which takes into account the differences between the local space and the true whole space. Simulation experiments on the whole algorithm demonstrate that it obtains better performance in BN structure learning.

Acknowledgments

The authors would like to thank the editor and the anonymous reviewers for their insightful comments and suggestions. This study was supported by the National Key R&D Program of China (2017YFB0304205), the National Natural Science Foundation of China (61973067), and the Open Research Fund from the State Key Laboratory of Rolling and Automation, Northeastern University (2019RALKFKT004).

References

1. Khanteymoori A.R., Olyaee M.H., Abbaszadeh and Valian M.J.S.C., A novel method for Bayesian networks structure learning based on Breeding Swarm algorithm, Soft Computing 13 (2017), 1–12.

2. Salman, Ahmad and Al-Madani S.J.M., Particle swarm optimization for task assignment problem, Microprocessors and Microsystems 26 (2002), 363–371.

3. Wang C.X., Learning Bayesian Network Structure Based on Topological Potential, Journal of Information & Computational Science 12 (2015), 3383–3393.

4. Chickering D.M., Geiger and Heckerman A.D., Learning Bayesian networks is NP-complete, Networks 112 (1996), 121–130.

5. Chickering D.M., Optimal Structure Identification With Greedy Search, Journal of Machine Learning Research 3 (2002), 507–554.

6. Reshef D.N., Reshef Y.A., Finucane H.K., Grossman S.R., Gilean M.V., Turnbaugh P.J., Lander E.S., Michael and Sabeti P.C., Detecting novel associations in large data sets, Science 334 (2011), 1518–1524.

7. Azali and Sheikhan, Intelligent control of photovoltaic system using BPSO-GSA-optimized neural network and fuzzy-based PID for maximum power point tracking, Applied Intelligence 44 (2016), 88–110.

8. Borboudakis and Tsamardinos, Bayesian network learning with discrete case-control data, in: Uncertainty in Artificial Intelligence, 2015, pp. 151–160.

9. G.L., Gao X.G. and Di R.H., DBN structure learning based on MI-BPSO algorithm, in: IEEE/ACIS International Conference on Computer and Information Science, 2014, pp. 245–250.

10. Guo, Zhang, Yong and Jiang, Bayesian Network Structure Learning Algorithms of Optimizing Fault Sample Set, in: Proceedings of the 2015 Chinese Intelligent Systems Conference, 2016, pp. 321–329.

11. Itoh, Funahashi, Yamamoto, Saito, Takumi and Matsuo, Forecasting Students’ Future Academic Records Using Bayesian Network, in: International Symposium on Soft Computing and Intelligent Systems, 2015, pp. 458–462.

12. Liu, Zhou, Lam and Guan, A new hybrid method for learning Bayesian networks: Separation and reunion, Knowledge-Based Systems 121 (2017), 185–197.

13. Zhang H.B., Liu J.J. and Li, Fault detection for medical body sensor networks under Bayesian network model, in: International Conference on Mobile Ad-Hoc and Sensor Networks, 2016, pp. 37–42.

14. Tsamardinos, Brown L.E. and Aliferis C.F., The max-min hill-climbing Bayesian network structure learning algorithm, Machine Learning 65 (2006), 31–78.

15. Kinney J.B. and Atwal G.S., Equitability, mutual information, and the maximal information coefficient, Proceedings of the National Academy of Sciences 111 (2014), 3354–3359.

16. Cowie, Oteniya, Coles R.J.L.N.i.E. and Science, Particle Swarm Optimisation for learning Bayesian Networks, Lecture Notes in Engineering and Computer Science 2165 (2007), 7–12.

17. Cheng, Bell and Liu W.R., Learning Bayesian Networks from Data: An Information-Theory Based Approach, Artificial Intelligence 137 (2002), 43–90.

18. Dai J.G., Jia W.C., Shikhin and Ma J.X., An improved evolutionary approach-based hybrid algorithm for Bayesian network structure learning in dynamic constrained search space, Neural Computing and Applications 10 (2018), 1–22.

19. J.N. and Zhang Y.X., A Method for Learning Bayesian Network Structure, in: Sixth International Conference on Intelligent Human-Machine Systems and Cybernetics, 2014, pp. 222–225.

20. Murphy, The Bayes net toolbox for MATLAB, Comput Sci Stat 33 (2001), 331–350.

21. Huang K.X., Zhou C.J., Tian Y.C. and Peng, Application of Bayesian network to data-driven cyber-security risk assessment in SCADA networks, in: 2017 27th International Telecommunication Networks and Applications Conference (ITNAC), 2017, pp. 1–6.

22. Chuang, Tsai and Yang, Improved binary particle swarm optimization using catfish effect for feature selection, Expert Systems with Applications 38 (2011), 12699–12707.

23. Campos L.M.D. and Castellano J.G., Bayesian network learning algorithms using structural restrictions, International Journal of Approximate Reasoning 45 (2007), 233–254.

24. Zhang, A Bayesian network based structure learning algorithm, in: International Conference on Robots and Intelligent System, 2016, pp. 12–15.

25. El-Shorbagy M.A. and Hassanien A.E., Particle Swarm Optimization from Theory to Applications, International Journal of Rough Sets and Data Analysis 5 (2018), 1–24.

26. Gasse, Aussem and Elghazel, A hybrid algorithm for Bayesian network structure learning with application to multi-label learning, Expert Systems with Applications 41 (2014), 6755–6772.

27. Zhu M.M., Liu S.Y. and Jiang J.W., Learning Bayesian networks in the space of structures by a hybrid optimization algorithm, International Journal of Intelligent Systems 11 (2016), 889–2016.

28. Pinto P.C., Nagele, Dejori, Runkler T.A. and Sousa J.M.C., Using a local discovery ant algorithm for Bayesian network structure learning, IEEE Transactions on Evolutionary Computation 13 (2009), 767–779.

29. Liu, Pérès and Tchangani, Object Oriented Bayesian Network for complex system risk assessment, IFAC-PapersOnLine 49 (2017), 31–36.

30. Wang R.P., Chen and Zhu, MIC-KMeans: A Maximum Information Coefficient Based High-Dimensional Clustering Algorithm, in: Artificial Intelligence and Algorithms in Intelligent Systems, 2019, pp. 208–218.

31. Robinson R.W., Counting unlabeled acyclic digraphs, Combinatorial Mathematics V 622 (1977), 28–43.

32. Fukuda and Yoshihiro, Learning Bayesian Networks Using Probability Vectors, in: Advances in Intelligent Systems and Computing 290 (2014), 503–510.

33. Gheisari and Meybodi M.R., BNC-PSO: structure learning of Bayesian networks by Particle Swarm Optimization, Information Sciences 348 (2016), 272–289.

34. and Kim D.W., An efficient node ordering method using the conditional frequency for the K2 algorithm, Pattern Recognition Letters 40 (2014), 80–87.

35. Mirjalili and Lewis A.J.S., S-shaped versus V-shaped transfer functions for binary Particle Swarm Optimization, Swarm & Evolutionary Computation 9 (2013), 1–14.

36. Wang, Zhu and Gao Z.J., Fault diagnosis for power system based on a special Bayesian network, in: TENCON 2015 – 2015 IEEE Region 10 Conference, 2015, pp. 1–6.

37. Zhang Y.H., Q.P., Zhang W.S. and Liu, A novel Bayesian network structure learning algorithm based on maximal information coefficient, in: IEEE Fifth International Conference on Advanced Computational Intelligence, 2012, pp. 862–867.

38. Zhang Y.H., Zhang W.S. and Xie, Improved heuristic equivalent search algorithm based on maximal information coefficient for Bayesian network structure learning, Neurocomputing 117 (2013), 186–195.

39. Wei Z.Q., H.J. and Gui X.L., Bayesian network structure learning algorithm based on maximal information coefficient, Application Research of Computers 31 (2014), 3261–3265.

40. Z.Y., Wang H.P. and Deng Z.C., Particle swarm optimization-based algorithm of a symplectic method for robotic dynamics and control, Applied Mathematics and Mechanics 40 (2019), 113–128.
