Bayesian network structure learning based on HC-PSO algorithm

Abstract

Structure learning is the core of Bayesian network (BN) learning, and current mainstream single-search algorithms suffer from poor learning performance, poorly defined initial networks, and a tendency to fall into local optima. In this paper, we propose HC-PSO, a heuristic learning algorithm that combines the Hill Climbing (HC) algorithm and the Particle Swarm Optimization (PSO) algorithm. HC-PSO first uses HC to search for locally optimal network structures and takes them as the initial networks; it then introduces mutation and crossover operators and uses PSO for global search. The mutation and crossover operators are selected with the Differential Evolution (DE) strategy. Finally, experiments on four datasets compute the Bayesian Information Criterion (BIC) and the Hamming Distance (HD), and a comparative analysis with other algorithms shows that HC-PSO is superior in feasibility and accuracy.

Keywords

Bayesian network structure learning; HC algorithm; PSO algorithm; DE algorithm

1 Introduction

Bayesian Network (BN) is a classical probabilistic graphical model that describes the dependency relationships between random variables. Because it is versatile and effective, it is widely used to solve problems involving uncertainty. A BN turns data directly into an intuitive graph that supports inference, and BNs are popular in many fields such as risk management [1], medical diagnosis [2], and information fusion [3].

BN research is divided into two parts, structure learning and parameter learning, of which structure learning is the basis and core. Structure learning methods generally fall into three categories: methods based on dependency analysis, methods based on scoring search, and hybrid methods. Dependency-analysis methods determine the BN structure by deciding whether an edge exists between two variables, and in which direction, through dependency and conditional independence tests. Classical dependency-analysis methods are the Inferred Causation (IC) algorithm [4] and the PC algorithm [5]. Score-based search methods consist of two parts, the choice of a search strategy and the choice of a scoring function, and look for the structure with the highest score. Common score-based search algorithms are the K2 algorithm [6] and the Particle Swarm Optimization (PSO) algorithm [7]. Hybrid methods combine the first two: they reduce the search space with conditional independence tests and then learn the structure with a score-based approach. Well-known hybrid methods are Max-Min Hill Climbing (MMHC) [8] and H2PC [9]. Researchers have also proposed other structure learning methods, such as integer programming methods [10] and information-theoretic methods [11].

PSO is a population-based optimization method with fast convergence that is mainly used for function optimization problems. In recent years it has been successfully applied in many research and application fields, including BN structure learning. However, the classical PSO algorithm applies only to continuous variables, so a discrete particle swarm space has to be introduced for structure learning [12]. Several PSO-based structure learning methods have been proposed, such as BNC-PSO [13], PC-PSO [14] and sPSO [15].

In this paper, we propose the HC-PSO learning algorithm, which combines the HC and PSO algorithms and introduces a space discretization method based on Differential Evolution (DE) [16]. The optimal solution is obtained by iterating the particle swarm. The rest of the paper is organized as follows. Section 2 reviews score-based search for BN structure learning. Section 3 describes the improved PSO algorithm for structure learning. Section 4 gives the experimental setup and results and compares HC-PSO with other common structure learning algorithms. Section 5 concludes the paper.

2 BN structure learning

2.1 Introduction of BN

A BN is a directed acyclic graph that uses probability theory to handle uncertainty among variables. It has two components and is denoted by G = <V, E>. Here V = {υ₁, υ₂, …, υ_p} is the set of p nodes, which represent random variables, and E = {e₁, e₂, …, e_n} is the set of directed edges from parent nodes to child nodes, which represent dependencies between pairs of variables. Let Pa(X_i) denote the parents of node X_i. By the Markov property [17], the joint probability distribution factorizes as Equation (1):

$P(X_{1}, X_{2}, \dots, X_{p}) = \prod_{i=1}^{p} P(X_{i} \mid Pa(X_{i}))$ (1)

where P(X_i | Pa(X_i)) is the conditional distribution of child node X_i given its parents Pa(X_i), which reflects the dependency between parent and child nodes. A BN structure can also be represented by a p × p adjacency matrix A, with A_ij = 1 if there is a directed edge from X_i to X_j and A_ij = 0 otherwise. For example, the standard CANCER network [18] and its adjacency matrix are shown in Fig. 1.

Fig. 1

CANCER network and its adjacency matrix, A.
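The following Python sketch illustrates the adjacency-matrix convention. It builds A for the CANCER network and reads each parent set Pa(X_i) from the corresponding column, which yields the factors of Equation (1). It assumes the five-node CANCER structure distributed with the bnlearn repository (Pollution → Cancer, Smoker → Cancer, Cancer → Xray, Cancer → Dyspnoea). The node order and the helper names are illustrative assumptions and need not match Fig. 1.

```python
import numpy as np

# Assumed node order for the CANCER network (may differ from Fig. 1).
NODES = ["Pollution", "Smoker", "Cancer", "Xray", "Dyspnoea"]
EDGES = [("Pollution", "Cancer"), ("Smoker", "Cancer"),
         ("Cancer", "Xray"), ("Cancer", "Dyspnoea")]

def adjacency_matrix(nodes, edges):
    """A[i, j] = 1 iff there is a directed edge nodes[i] -> nodes[j]."""
    index = {name: k for k, name in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)), dtype=int)
    for parent, child in edges:
        A[index[parent], index[child]] = 1
    return A

def parents(A, nodes, child):
    """Pa(child) is given by the nonzero entries of the child's column."""
    j = nodes.index(child)
    return [nodes[i] for i in np.flatnonzero(A[:, j])]

def factorization(A, nodes):
    """Right-hand side of Equation (1) as a readable string."""
    terms = []
    for name in nodes:
        pa = parents(A, nodes, name)
        terms.append(f"P({name} | {', '.join(pa)})" if pa else f"P({name})")
    return " ".join(terms)

A = adjacency_matrix(NODES, EDGES)
print(A)
print(factorization(A, NODES))
```

With this node order, the factorization contains the term P(Cancer | Pollution, Smoker), because column 3 of A has ones in the Pollution and Smoker rows.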

2.2 Structure learning methods for scoring search

Given a dataset D, the goal of structure learning is to find the BN that best fits D. The method used in this paper is score-based search, which has two components: a search process and a scoring function. During the search, the network is changed repeatedly and the corresponding scoring function values are computed, until the optimal structure is obtained.

2.2.1 Scoring function

The common scoring functions are Bayesian Dirichlet equivalent (BDe) [19], Minimum Description Length (MDL) [20], and the Bayesian Information Criterion (BIC) [21]. BDe maximizes the posterior probability of the structure under an assumed parameter distribution and relies heavily on the choice of prior. MDL assumes that the more regularities in the data a model encodes, the more it can compress the data, and its computational cost increases exponentially with the number of nodes. BIC selects the best model among a finite set of candidates based on the likelihood function penalized by model complexity. When the sample size is large, the three measures give convergent results. The BIC function used in this paper is defined in Equation (2).

$\log P(B_{G} \mid D) = \sum_{i=1}^{n} \sum_{j=1}^{q_{i}} \sum_{k=1}^{r_{i}} m_{ijk} \log \frac{m_{ijk}}{m_{ij}} - \frac{\log m}{2} \sum_{i=1}^{n} q_{i} (r_{i} - 1)$ (2)

where B_G is a network structure, n is the number of nodes, m is the total number of samples in D, r_i is the number of values of X_i, and q_i is the number of value combinations of its parents. m_ij is the number of samples in which the parents of X_i take their jth combination, and m_ijk is the number of samples in which X_i takes its kth value and its parents take their jth combination.
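The following is a minimal sketch of Equation (2) for fully observed discrete data. It assumes integer-coded samples, natural logarithms and the penalty ½·log m·Σ q_i(r_i − 1). The function name `bic_score` and the data layout (a list of tuples plus a dictionary of parent indices) are illustrative assumptions, not the implementation used in the experiments.

```python
import math
from collections import Counter

def bic_score(data, parents, cardinalities):
    """BIC of a discrete BN structure, following Equation (2).

    data: list of tuples, one integer-coded value per variable.
    parents: dict mapping node index -> list of parent indices.
    cardinalities: list with r_i, the number of values of each X_i.
    """
    m = len(data)                      # total number of samples
    loglik, n_params = 0.0, 0
    for i, r_i in enumerate(cardinalities):
        pa = parents.get(i, [])
        q_i = math.prod(cardinalities[p] for p in pa)   # 1 if no parents
        n_params += q_i * (r_i - 1)
        # m_ij: counts of parent combinations j; m_ijk: counts of (j, k)
        m_ij = Counter(tuple(row[p] for p in pa) for row in data)
        m_ijk = Counter((tuple(row[p] for p in pa), row[i]) for row in data)
        for (j, _k), count in m_ijk.items():
            loglik += count * math.log(count / m_ij[j])
    return loglik - 0.5 * math.log(m) * n_params

# Toy data: X_1 always copies X_0, so the edge X_0 -> X_1 should score higher.
data = [(0, 0), (0, 0), (1, 1), (1, 1)]
print(bic_score(data, {}, [2, 2]))        # empty graph
print(bic_score(data, {1: [0]}, [2, 2]))  # X_0 -> X_1
```

Only observed (j, k) combinations contribute to the likelihood term. Unobserved combinations have m_ijk = 0 and therefore contribute nothing, which matches the 0·log 0 = 0 convention.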

2.2.2 Search process

Finding the optimal structure in the search space is NP-hard [22]. Greedy algorithms are therefore generally used: they improve the current network structure by adding, deleting, or reversing edges until no further improvement is possible, as in the optimal tree algorithm. With many nodes, a greedy algorithm must traverse all nodes and its computational complexity increases exponentially, so heuristic search algorithms are used instead. Common ones are Tabu Search (Tabu) [23], the Genetic Algorithm (GA) [24] and Hill Climbing (HC) [25]. This paper uses the PSO algorithm, whose positive feedback mechanism makes the search converge continuously toward the optimal solution. We also propose a particle position and velocity update method that improves the efficiency of convergence in the search space.

3 BN structure learning based on the HC-PSO algorithm

In this section, we introduce the standard PSO algorithm and the mutation and crossover operators used to update the particles, and then propose the HC-PSO algorithm for BN structure learning.

3.1 The PSO algorithm

The PSO algorithm was proposed by Kennedy and Eberhart in 1995 [26] and simulates the random search of a flock of birds. In PSO, each potential solution of the optimization problem is a bird in the search space, called a particle. Each particle has a velocity that determines the direction and distance of its "flight", and the particles search for the global optimal solution by following the current best particle.

The PSO algorithm initializes a population of random particles (random solutions) and then updates the particle positions and velocities iteratively, evaluating the fitness of each particle to find the optimal solution. In each iteration, every particle updates itself by tracking two extremes. The first is the best solution found so far by the particle itself, called the local optimum. The second is the best solution found so far by the whole population, called the global optimum. The update rule is Equation (3):

$V_{i}^{t} = \omega V_{i}^{t-1} + c_{1} r_{1} (P_{i}^{t-1} - X_{i}^{t-1}) + c_{2} r_{2} (P_{g}^{t-1} - X_{i}^{t-1}), \quad X_{i}^{t} = X_{i}^{t-1} + V_{i}^{t}$ (3)

where X_i^t and V_i^t denote the position and velocity of particle i at iteration t, and ω is the inertia weight. c₁ and c₂ are acceleration constants, namely the individual learning factor and the social learning factor, and r₁ and r₂ are random numbers in [0, 1]. P_g and P_i denote the global optimal position and the local optimal position of particle i, respectively. The global optimum is obtained by iteratively updating the particle positions. The flow of the standard PSO algorithm is shown in Fig. 2.

Fig. 2

Flowchart of the standard PSO algorithm.
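The standard continuous update of Equation (3) can be sketched in a few lines of NumPy. This is a minimal illustration, not the discrete variant used later for BN structures. The coefficient values (ω = 0.7298, c₁ = c₂ = 1.49618) are common settings from the PSO literature and are assumptions here, as is the sphere test function.

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=200, w=0.7298, c1=1.49618,
                 c2=1.49618, bounds=(-5.0, 5.0), seed=0):
    """Standard continuous PSO following Equation (3)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_particles, dim))   # positions X_i
    V = np.zeros_like(X)                               # velocities V_i
    P = X.copy()                                       # local optima P_i
    p_val = np.array([f(x) for x in X])
    g = P[np.argmin(p_val)].copy()                     # global optimum P_g
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)
        X = X + V
        vals = np.array([f(x) for x in X])
        better = vals < p_val                          # update local optima
        P[better], p_val[better] = X[better], vals[better]
        g = P[np.argmin(p_val)].copy()                 # update global optimum
    return g, float(p_val.min())

sphere = lambda x: float(np.sum(x ** 2))
g_best, f_best = pso_minimize(sphere, dim=2)
print(g_best, f_best)
```

In the sketch, the local optima P_i and the global optimum P_g are refreshed after every position update.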

3.2 DE model

The classical PSO algorithm solves continuous optimization problems, whereas BN structure learning is a discrete optimization problem, so a discretization is required. In this paper, we use the DE algorithm [27]. After the population is initialized, DE updates it with a differential mutation strategy and a crossover recombination strategy. The difference between two individual vectors of the population is weighted and added to a third individual to generate a mutant vector. This mutant is then crossed with the current vector with a certain probability to generate a new individual vector. Different forms of the difference vector give different mutation and crossover strategies, and the one used in this paper is Equation (4):

$V_{i}^{t+1} = X_{best}^{t} + F (X_{r_{1}}^{t} - X_{r_{2}}^{t}), \quad i = 1, 2, \dots, N$

$X_{ij}^{t+1} = \begin{cases} V_{ij}^{t+1}, & \text{if } r_{3} < P_{c} \\ X_{ij}^{t}, & \text{otherwise} \end{cases}$ (4)

where X_best^t is the best vector of generation t, and X_r1^t and X_r2^t are two randomly chosen individuals whose difference forms the difference vector. N is the population size, and F is the mutation factor, usually taken in [0, 2]. P_c is the crossover probability, and r₃ is a random number in [0, 1].
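The sketch below applies Equation (4) to a continuous test problem, assuming the classic DE/best/1/bin scheme. It adds the usual greedy selection step of DE (keep the trial vector only if it is no worse), which Equation (4) does not show. The parameter values F = 0.5 and P_c = 0.9, the population size and the sphere test function are illustrative assumptions.

```python
import numpy as np

def de_best_1_bin(f, dim, pop_size=20, iters=200, F=0.5, Pc=0.9,
                  bounds=(-5.0, 5.0), seed=0):
    """Minimise f with DE/best/1 mutation and binomial crossover (Equation (4)),
    followed by the usual greedy selection step of DE."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(pop_size, dim))
    fit = np.array([f(x) for x in X])
    for _ in range(iters):
        best = X[np.argmin(fit)].copy()                   # X_best^t
        for i in range(pop_size):
            others = [k for k in range(pop_size) if k != i]
            r1, r2 = rng.choice(others, size=2, replace=False)
            V = best + F * (X[r1] - X[r2])                # mutant vector
            U = np.where(rng.random(dim) < Pc, V, X[i])   # crossover with P_c
            fu = f(U)
            if fu <= fit[i]:                              # greedy selection
                X[i], fit[i] = U, fu
    return X[np.argmin(fit)], float(fit.min())

sphere = lambda x: float(np.sum(x ** 2))
x_best, f_best = de_best_1_bin(sphere, dim=2)
print(x_best, f_best)
```

For BN structures, the same mutation and crossover ideas are applied to 0/1 adjacency matrices instead of real vectors, as described in Section 3.2.1.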

3.2.1 Mutation and crossover operators selected by the DE model

N initial individuals are generated from the prior structure, and each individual G_i is the adjacency matrix of a BN structure, defined as Equation (5):

$G_{i} = \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1n} \\ g_{21} & g_{22} & \cdots & g_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ g_{n1} & g_{n2} & \cdots & g_{nn} \end{bmatrix}, \quad g_{ij} = \begin{cases} 1, & \text{if } X_{i} \to X_{j},\ i \neq j \\ 0, & \text{otherwise} \end{cases}$ (5)

G_i and G_j are randomly selected from the initial population, and a transfer matrix Mov is introduced as Equation (6). Its entries m_ij encode the three operations of the transfer process:

$Mov = G_{i} - G_{j} = \begin{bmatrix} m_{11} & m_{12} & \cdots & m_{1n} \\ m_{21} & m_{22} & \cdots & m_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ m_{n1} & m_{n2} & \cdots & m_{nn} \end{bmatrix}, \quad m_{ij} = \begin{cases} 1, & \text{addition of } X_{i} \to X_{j} \\ 0, & \text{unchanged} \\ -1, & \text{deletion of } X_{i} \to X_{j} \end{cases}$ (6)

The target individual is mutated according to Mov, as shown in Equation (7):

$g_{best,ij} = \begin{cases} g_{best,ij} + m_{ij}, & \text{if } rand(0,1) < mutation\_rate \\ g_{best,ij}, & \text{otherwise} \end{cases}$ (7)

where g_best is the best individual in the current population. This mutation can produce the meaningless values g_best,ij = -1 and g_best,ij = 2, which are reset to 0 and 1, respectively. The pseudo-code of the mutation operator is given in Algorithm 1.

Algorithm 1 Mutation Operator
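Because the pseudo-code of Algorithm 1 is not reproduced here, the following NumPy sketch shows one possible reading of Equations (6) and (7): the transfer matrix is a plain matrix difference, and the mutation adds it entry-wise with probability `mutation_rate` before clipping -1 to 0 and 2 to 1. The function names `transfer_matrix` and `mutate` are hypothetical.

```python
import numpy as np

def transfer_matrix(G_i, G_j):
    """Mov = G_i - G_j (Equation (6)): 1 = add, 0 = unchanged, -1 = delete."""
    return G_i.astype(int) - G_j.astype(int)

def mutate(g_best, mov, mutation_rate, rng):
    """Equation (7): add m_ij to g_best,ij with probability mutation_rate,
    then reset the meaningless values -1 -> 0 and 2 -> 1."""
    mask = rng.random(g_best.shape) < mutation_rate
    mutated = np.where(mask, g_best + mov, g_best)
    return np.clip(mutated, 0, 1)

rng = np.random.default_rng(0)
G_i = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
G_j = np.array([[0, 0, 0], [1, 0, 1], [0, 0, 0]])
mov = transfer_matrix(G_i, G_j)   # [[0, 1, 0], [-1, 0, 0], [0, 0, 0]]
g_best = np.array([[0, 1, 0], [0, 0, 0], [1, 0, 0]])
print(mutate(g_best, mov, mutation_rate=0.5, rng=rng))
```

The mutated matrix can contain directed cycles. Such structures are handled by the check-and-repair step of Section 3.3.3.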

The crossover operator uses a uniform crossover strategy. Let H_i and U_i be two individuals in the population and V_i their offspring. For each entry, if a random number is less than crossover_rate, the entry of H_i is taken; otherwise the entry of U_i is kept, as in Equation (8):

$v_{kl} = \begin{cases} h_{kl}, & \text{if } rand(0,1) < crossover\_rate \\ u_{kl}, & \text{otherwise} \end{cases}, \quad k, l = 1, \dots, n$ (8)

The pseudo-code of the crossover operator is given in Algorithm 2.

Algorithm 2 Crossover Operator
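Equation (8) amounts to an entry-wise random choice between two adjacency matrices. The NumPy sketch below is a minimal illustration, and the function name `uniform_crossover` is hypothetical.

```python
import numpy as np

def uniform_crossover(H, U, crossover_rate, rng):
    """Equation (8): v_kl = h_kl if rand < crossover_rate, else u_kl."""
    mask = rng.random(H.shape) < crossover_rate
    return np.where(mask, H, U)

rng = np.random.default_rng(0)
H = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
U = np.array([[0, 0, 0], [1, 0, 0], [1, 0, 0]])
V = uniform_crossover(H, U, crossover_rate=0.5, rng=rng)
print(V)
```

Each entry of the offspring comes from exactly one parent, so V stays a 0/1 matrix. It may still contain cycles, which Section 3.3.3 handles.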

3.3 Improved HC-PSO algorithm for BN structure learning

3.3.1 Prior structure

There are generally two ways to determine the prior structure: methods based on expert knowledge and methods based on automatic learning. The first requires considerable human effort, so in this paper the HC algorithm is used to determine a structural prior. HC performs a local search over neighboring structures. When a neighboring structure scores higher than the current one, it replaces the current structure; otherwise the search stays at the current structure. HC thus obtains a locally optimal network structure, which narrows the initial search space and makes the subsequent search more efficient and faster to converge.
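The HC step can be sketched as a generic best-improvement hill climber over DAGs. Neighbors are obtained by adding, deleting or reversing one edge, and only acyclic neighbors are kept. The score is passed in as a function; in practice it would be a BIC score as in Equation (2). The helpers `is_dag`, `neighbors` and `hill_climb` are hypothetical names, and the toy score in the usage lines serves only as an illustration. This is not necessarily the exact HC variant used in the experiments.

```python
import numpy as np

def is_dag(A):
    """Kahn's algorithm: A is acyclic iff all nodes can be removed in topological order."""
    indeg = A.sum(axis=0)
    stack = [v for v in range(len(A)) if indeg[v] == 0]
    removed = 0
    while stack:
        v = stack.pop()
        removed += 1
        for child in np.flatnonzero(A[v]):
            indeg[child] -= 1
            if indeg[child] == 0:
                stack.append(child)
    return removed == len(A)

def neighbors(A):
    """All DAGs reachable by adding, deleting or reversing one edge."""
    n = len(A)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            B = A.copy()
            if A[i, j] == 0 and A[j, i] == 0:
                B[i, j] = 1                  # add i -> j
            elif A[i, j] == 1:
                B[i, j] = 0                  # delete i -> j
                yield B
                B = B.copy()
                B[j, i] = 1                  # reverse i -> j into j -> i
            else:
                continue                     # j -> i exists; handled at (j, i)
            if is_dag(B):
                yield B

def hill_climb(A0, score, max_iter=1000):
    """Move to the best-scoring neighbor until no neighbor improves the score."""
    current, current_score = A0.copy(), score(A0)
    for _ in range(max_iter):
        best, best_score = None, current_score
        for B in neighbors(current):
            s = score(B)
            if s > best_score:
                best, best_score = B, s
        if best is None:                     # local optimum reached
            break
        current, current_score = best, best_score
    return current

# Toy score (illustration only): closeness to a target chain X_0 -> X_1 -> X_2.
target = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
toy_score = lambda A: -int(np.sum(A != target))
result = hill_climb(np.zeros((3, 3), dtype=int), toy_score)
print(result)
```

Deleting an edge can never create a cycle, so deletion neighbors are yielded without a check. Additions and reversals are checked with `is_dag`.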

3.3.2 Updating the BN structure

Unlike the classical PSO algorithm, PSO for BN structure learning must discretize the particles. In this paper, we use a discrete velocity and position update method based on genetic operations, with the particle update rule given in Equation (9):

$X_{i}^{t} = N_{3}(N_{2}(N_{1}(X_{i}^{t-1}, \omega), c_{1}), c_{2})$

$W_{i}^{t} = N_{1}(X_{i}^{t-1}, \omega) = \begin{cases} M(X_{i}^{t-1}), & \text{if } r_{1} < \omega \\ X_{i}^{t-1}, & \text{otherwise} \end{cases}$

$S_{i}^{t} = N_{2}(W_{i}^{t}, c_{1}) = \begin{cases} C_{p}(W_{i}^{t}), & \text{if } r_{2} < c_{1} \\ W_{i}^{t}, & \text{otherwise} \end{cases}$

$X_{i}^{t} = N_{3}(S_{i}^{t}, c_{2}) = \begin{cases} C_{g}(S_{i}^{t}), & \text{if } r_{3} < c_{2} \\ S_{i}^{t}, & \text{otherwise} \end{cases}$ (9)

where M denotes the mutation operator, C_p denotes the crossover operator between the particle and its local optimal solution, and C_g denotes the crossover operator between the particle and the global optimal solution. N₁, N₂ and N₃ denote the results of the three operations. ω, c₁ and c₂ are the mutation probability and the two crossover probabilities, and r₁, r₂ and r₃ are random numbers in [0, 1]. The HC algorithm generates the first generation of particles, which are then updated by PSO to find the optimal solution.

Algorithm 4 HC-PSO for BN
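The composition in Equation (9) can be sketched as three chained random decisions. To keep the example self-contained, two stand-ins are used and labeled as hypothetical. M is a single random edge flip rather than the Mov-based operator of Equation (7). C_p and C_g are uniform crossovers that take each entry from the partner with probability 0.5 (the mixing rate `mix` is an assumption). The output may contain cycles and would still need the repair step of Section 3.3.3.

```python
import numpy as np

def flip_mutation(X, rng):
    """Hypothetical stand-in for M: toggle one random off-diagonal entry."""
    Y = X.copy()
    i, j = rng.choice(Y.shape[0], size=2, replace=False)
    Y[i, j] = 1 - Y[i, j]
    return Y

def crossover(X, partner, rng, mix=0.5):
    """Hypothetical stand-in for C_p / C_g: take each entry from partner w.p. mix."""
    return np.where(rng.random(X.shape) < mix, partner, X)

def update_particle(X_prev, p_best, g_best, w, c1, c2, rng):
    """Equation (9): X^t = N3(N2(N1(X^{t-1}, w), c1), c2)."""
    W = flip_mutation(X_prev, rng) if rng.random() < w else X_prev   # N1
    S = crossover(W, p_best, rng) if rng.random() < c1 else W        # N2 (C_p)
    return crossover(S, g_best, rng) if rng.random() < c2 else S    # N3 (C_g)

rng = np.random.default_rng(1)
X = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
p_best = np.array([[0, 1, 1], [0, 0, 0], [0, 0, 0]])
g_best = np.array([[0, 0, 1], [0, 0, 1], [0, 0, 0]])
print(update_particle(X, p_best, g_best, w=0.6, c1=0.8, c2=0.5, rng=rng))
```

Here ω, c₁ and c₂ act only as probabilities that each operation is applied, as in Equation (9). They do not act as multiplicative weights as in Equation (3).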

3.3.3 Repair of illegal structures

A Bayesian network is a directed acyclic graph, but illegal structures, such as graphs containing directed cycles, may be generated during mutation and crossover. To ensure that the output structure is valid, cycle checking and repair are performed after the mutation and crossover operations. HC-PSO uses the transitive closure T to determine whether a cycle exists and to repair it. The pseudo-code is given in Algorithm 3.


Algorithm 3 Check and Repair of Cycles
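The cycle check can be sketched with Warshall's algorithm. T[i, j] is true when X_j is reachable from X_i, so the graph has a cycle exactly when some diagonal entry of T is true. The pseudo-code of the paper's repair step is not reproduced here. The repair rule below is therefore a hypothetical choice: it repeatedly deletes a random edge i → j that lies on a cycle (that is, T[j, i] is true) until the graph is acyclic.

```python
import numpy as np

def transitive_closure(A):
    """Warshall's algorithm: T[i, j] is True iff X_j is reachable from X_i."""
    T = A.astype(bool)
    for k in range(T.shape[0]):
        T = T | np.outer(T[:, k], T[k, :])
    return T

def has_cycle(A):
    """A directed graph has a cycle iff some node reaches itself."""
    return bool(np.any(np.diag(transitive_closure(A))))

def repair(A, rng):
    """Hypothetical repair rule: delete random edges lying on a cycle."""
    A = A.copy()
    while True:
        T = transitive_closure(A)
        on_cycle = np.argwhere((A == 1) & T.T)   # edge i -> j with j reaching i
        if len(on_cycle) == 0:
            return A
        i, j = on_cycle[rng.integers(len(on_cycle))]
        A[i, j] = 0

rng = np.random.default_rng(0)
cyclic = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])   # X_0 -> X_1 -> X_2 -> X_0
print(has_cycle(cyclic), repair(cyclic, rng))
```

Each pass deletes one edge, so the loop always terminates. Acyclic inputs are returned unchanged.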

3.3.4 HC-PSO for BN structure

Based on the above, HC-PSO iteratively updates the initial locally optimal structures obtained by HC to obtain the global optimal structure. The flow of HC-PSO is summarized in Algorithm 4 and shown in Fig. 3.

Fig. 3

Flowchart of the HC-PSO algorithm.

4 Experiments

4.1 Preparation for experiment

To verify the performance of the proposed algorithm, the classical CANCER, ASIA [28], CHILD [29] and ALARM [30] networks are used as experimental datasets. For each network, four training sets of 500, 1000, 1500 and 2000 samples are randomly drawn. HC-PSO is compared with HC, Tabu, MMHC and PC-PSO. All algorithms are implemented in RStudio, and the experiments are run on a laptop with an AMD Ryzen 5800H with Radeon Graphics CPU (3.20 GHz) and 16.00 GB of memory.

4.2 Parameter setting

Before the experiments, the parameters are set as follows: the initial population size is 50 and the maximum number of iterations MaxIt is 100. In the particle update formula, the inertia weight ω reflects the influence of the particle's previous velocity on its trajectory. The acceleration constants c₁ and c₂ reflect the influence of the local optimal position and the global optimal position, respectively. The study [31] proposed a larger c₁ and a smaller c₂ early in the search and the opposite later, which accelerates convergence without falling into a local optimum. We therefore compute ω, c₁ and c₂ by Equation (10), where ω₀ = 0.9, ω₁ = 0.35, c₁₀ = 0.84, c₁₁ = 0.5, c₂₀ = 0.38, c₂₁ = 0.81, and i is the current iteration number:

$\omega = \omega_{0} - \frac{\omega_{0} - \omega_{1}}{MaxIt} \times i, \quad c_{1} = c_{10} - \frac{c_{10} - c_{11}}{MaxIt} \times i, \quad c_{2} = c_{20} - \frac{c_{20} - c_{21}}{MaxIt} \times i$ (10)
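Equation (10) is a linear interpolation from the initial to the final value of each coefficient. The small Python sketch below uses the parameter values stated above as defaults. The function name `coefficients` is hypothetical.

```python
def coefficients(i, max_it=100, w0=0.9, w1=0.35, c10=0.84, c11=0.5,
                 c20=0.38, c21=0.81):
    """Linearly varying w, c1 and c2 of Equation (10) at iteration i."""
    w = w0 - (w0 - w1) / max_it * i
    c1 = c10 - (c10 - c11) / max_it * i
    c2 = c20 - (c20 - c21) / max_it * i
    return w, c1, c2

for i in (0, 50, 100):
    print(i, coefficients(i))
```

Since c₂₁ > c₂₀, the subtraction makes c₂ increase over the run while ω and c₁ decrease. This is the "larger c₁ early, larger c₂ late" schedule described above.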

4.3 Evaluation metrics of BN reliability

HD (Hamming Distance) [32], BIC (Bayesian Information Criterion) and It (the number of iterations) are used to evaluate the network structures learned by the various algorithms.

HD measures the difference between the network structure learned by the algorithm and the standard network structure. It is the sum of the numbers of redundant edges (RE), missing edges (ME) and inverted edges (IE), as shown in Equation (11). The smaller the HD, the closer the learned network is to the standard network.

$HD = RE + ME + IE$ (11)
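Equation (11) can be computed directly from two adjacency matrices by classifying each unordered node pair. The sketch below assumes both graphs are DAGs given as 0/1 NumPy arrays. The function name `hamming_distance` is hypothetical.

```python
import numpy as np

def hamming_distance(learned, true):
    """Equation (11): HD = RE + ME + IE for two adjacency matrices."""
    re = me = ie = 0
    n = true.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            t = (int(true[i, j]), int(true[j, i]))        # true edge between X_i, X_j
            l = (int(learned[i, j]), int(learned[j, i]))  # learned edge
            if t == (0, 0) and l != (0, 0):
                re += 1                                   # redundant edge
            elif t != (0, 0) and l == (0, 0):
                me += 1                                   # missing edge
            elif t != (0, 0) and l != t:
                ie += 1                                   # inverted edge
    return re + me + ie, (re, me, ie)

true = np.zeros((3, 3), dtype=int)
true[0, 2] = true[1, 2] = 1          # X_0 -> X_2 <- X_1
learned = np.zeros((3, 3), dtype=int)
learned[2, 0] = learned[0, 1] = 1    # X_2 -> X_0 (inverted), X_0 -> X_1 (redundant)
print(hamming_distance(learned, true))
```

In this example, X_1 → X_2 is missing, so each of RE, ME and IE contributes one to the total.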

BIC is based on the likelihood function; the larger its value, the better the learned network fits the data. BIC is defined in Equation (2).

It is the number of iterations at which the best structure was found.

4.4 Experimental analysis

4.4.1 Performance analysis of the HC-PSO

Four networks are used in the experiments: CANCER (5 nodes, 4 arcs), ASIA (8 nodes, 8 arcs), CHILD (20 nodes, 25 arcs) and ALARM (37 nodes, 46 arcs), with sample sizes of 500, 1000, 1500 and 2000. Because the PSO update is random, each result in Table 1 is the average of 100 runs, and the values in parentheses are the BIC scores of the standard networks. The networks learned by HC-PSO from 2000 samples are shown in Figs. 4–7.

Fig. 4

The BN structure of CANCER network.

Fig. 5

The BN structure of ASIA network.

Fig. 6

The BN structure of CHILD network.

Fig. 7

The BN structure of ALARM network.

Table 1

Experiment results of HC-PSO with different sample sizes

Network	HD (n=500)	BIC (n=500)	HD (n=1000)	BIC (n=1000)	HD (n=1500)	BIC (n=1500)	HD (n=2000)	BIC (n=2000)
CANCER	1.40	–1073.09 (–1076.57)	1.22	–2121.27 (–2129.74)	1.19	–3173.35 (–3181.72)	0.66	–4225.04 (–4262.47)
ASIA	1.02	–1166.67 (–1202.09)	0.91	–2292.21 (–2302.66)	0.83	–3403.25 (–3423.89)	0.76	–4526.98 (–4541.17)
CHILD	6.56	–6669.28 (–6748.89)	5.16	–12894.18 (–12948.02)	2.63	–19056.76 (–19077.32)	1.60	–25210.68 (–25319.08)
ALARM	17.43	–6302.15 (–6837.92)	13.62	–11831.27 (–12274.01)	11.91	–17206.09 (–17404.50)	10.49	–22560.22 (–22856.33)

Table 1 shows that the BIC scores of the networks learned by HC-PSO are close to, and in all cases slightly higher than, those of the standard networks. HD decreases as the sample size increases, which indicates that the learned BNs become more accurate. Table 1 also shows that the accuracy of HC-PSO decreases as the number of nodes and edges grows, with markedly larger HD on CHILD and ALARM.

4.4.2 Comparison with other algorithms

HC-PSO is compared experimentally with HC, Tabu, MMHC and PC-PSO, and the results are shown in Tables 2–6. The HD comparison (2000 samples) is shown in Fig. 8, and the numbers of iterations (500 samples) are shown in Fig. 9.

Tables 2–5 show that, for all algorithms, HD decreases and the absolute value of BIC increases as the sample size increases. HC-PSO obtains the smallest HD in most settings, with Tabu or HC slightly better in a few cases, which indicates that HC-PSO is more accurate. Fig. 8 also shows that HC and HC-PSO give similar results. This indicates that HC learns a good initial structure and that the PSO iterations bring only a modest further improvement. Nevertheless, the final results are better than those of MMHC and PC-PSO. Table 6 and Fig. 9 show the number of iterations each algorithm needs to reach its best structure on each network; HC-PSO converges with a relatively good convergence speed. Overall, HC-PSO outperforms the other algorithms on most of the performance indicators.

Fig. 8

The HD of different algorithms.

Fig. 9

The numbers of iterations of different algorithms (500 samples).

Table 2

The learning results of different algorithms on CANCER network

Algorithm	HD (n=500)	BIC (n=500)	HD (n=1000)	BIC (n=1000)	HD (n=1500)	BIC (n=1500)	HD (n=2000)	BIC (n=2000)
HC-PSO	1.40	–1073.09	1.22	–2121.27	1.19	–3173.35	0.96	–4225.04
HC	2.32	–1062.85	1.61	–2130.04	1.21	–3182.33	0.91	–4227.75
Tabu	2.38	–1074.17	1.70	–2127.14	1.21	–3175.48	0.96	–4237.86
MMHC	2.37	–1068.67	1.71	–2127.45	1.43	–3175.04	1.29	–4228.84
PC-PSO	2.99	–1063.29	2.10	–2046.53	1.85	–3088.28	1.57	–4127.01

Table 3

The learning results of different algorithms on ASIA network

Algorithm	HD (n=500)	BIC (n=500)	HD (n=1000)	BIC (n=1000)	HD (n=1500)	BIC (n=1500)	HD (n=2000)	BIC (n=2000)
HC-PSO	1.02	–1166.67	0.91	–2292.21	0.83	–3403.25	0.76	–4526.98
HC	1.89	–1159.72	1.34	–2294.04	1.10	–3407.98	1.09	–4525.46
Tabu	1.90	–1161.89	1.23	–2296.27	1.10	–3410.39	1.05	–4531.31
MMHC	3.72	–1246.57	3.33	–2460.18	3.19	–3668.19	3.03	–4885.63
PC-PSO	4.02	–1139.70	3.79	–2328.47	3.23	–3549.19	3.00	–4748.22

Table 4

The learning results of different algorithms on CHILD network

Algorithm	HD (n=500)	BIC (n=500)	HD (n=1000)	BIC (n=1000)	HD (n=1500)	BIC (n=1500)	HD (n=2000)	BIC (n=2000)
HC-PSO	6.56	–6669.28	5.16	–12894.18	2.63	–19056.70	1.60	–25210.68
HC	7.30	–6661.48	6.03	–12912.11	4.69	–19114.32	3.35	–25299.98
Tabu	6.20	–6632.59	5.25	–12857.73	2.81	–19025.57	1.67	–25152.42
MMHC	9.80	–6906.19	8.07	–13207.34	7.46	–19530.68	7.03	–25889.90
PC-PSO	11.70	–7259.05	9.02	–13812.	8.03	–20095.23	7.52	–26816.05

Table 5

The learning results of different algorithms on ALARM network

Algorithm	HD (n=500)	BIC (n=500)	HD (n=1000)	BIC (n=1000)	HD (n=1500)	BIC (n=1500)	HD (n=2000)	BIC (n=2000)
HC-PSO	17.43	–6302.15	13.62	–11831.27	11.91	–17206.09	10.49	–22560.22
HC	19.19	–6290.90	14.06	–11814.75	12.67	–19114.32	11.59	–22580.53
Tabu	18.58	–6284.79	13.86	–11824.98	11.91	–19025.57	10.20	–22569.50
MMHC	26.16	–7647.99	20.14	–13969.20	18.03	–19530.68	16.75	–27484.74
PC-PSO	29.09	–7144.49	21.01	–13714.53	18.38	–20095.23	15.61	–26981.28

5 Conclusion

In this paper, HC-PSO is proposed to address the tendency of the HC algorithm to fall into locally optimal structures. The HC and PSO algorithms are conceptually simple, both retain memory of the search space, and both can be implemented with short code at relatively low computational cost. In addition, the DE strategy is used to select the mutation and crossover operators. DE is a population-based adaptive global optimization algorithm with faster convergence and higher robustness than GA [33]. In the experiments, HC-PSO learns slightly better structures than the other algorithms.

BN is an important method for solving uncertainty problems, and HC-PSO provides a new research idea for BN structure learning. However, HC-PSO has no obvious advantage on datasets with many nodes, so further research should focus on integrating node priority into the algorithm. HC-PSO should also be evaluated on real-world uncertainty problems. For example, learning the BN of relevant variables from a medical dataset with HC-PSO could support more accurate medical diagnosis.

Table 6
The number of iterations on all data sets (500 samples)

Algorithm	CANCER	ASIA	CHILD	ALARM
HC-PSO	3	12	22	31
HC	4	16	30	61
Tabu	6	10	37	79
MMHC	5	9	43	68
PC-PSO	4	14	21	50

6 Sources of financial support

Financial support came from the National Natural Science Foundation of China (72374094) and the Scientific Research Project of the General Administration of Customs (2022HK069).

7 Author contribution statement

Wenlong Gao: Ideas; supervision; Funding acquisition; Review; Revision and Editing.

Mingqian Zhi: Creation of models; Data analysis and Writing.

Anping Liu: Creation of models; Data Curation; Parameter test.

Yongsong Ke: Data Curation; Parameter test.

Xiaolong Wang: Data Curation.

Yun Zhuo: Data Curation.

Yi Yang: Data Curation.

8 Ethics declarations

8.1 Ethical and informed consent for data used

This article does not contain any studies with human participants or animals performed by any of the authors. All data used are available openly.

9 Conflicts of interest

All the authors declare no conflict of interest.

10 Data availability and access

The data that support the findings of this study are openly available in the Bayesian Network Repository at https://www.bnlearn.com/. The data contents are licensed under the Creative Commons Attribution-Share Alike License. This study does not violate any ethical standards.

References

1. Yet B., Constantinou A., et al., A Bayesian network framework for project cost, benefit and risk analysis with an agricultural development case study, Expert Systems with Applications (2016), 141–155. https://doi.org/10.1016/j.eswa.2016.05.005

2. Nistal-Nuno, Tutorial of the probabilistic methods Bayesian networks and influence diagrams applied to medicine, Journal of Evidence-Based Medicine 11(2) (2018), 112–124. https://doi.org/10.1111/jebm.12298

3. Costa P.C., Yu, Atiahetchi and Myers, High-Level Information Fusion of Cyber-Security Expert Knowledge and Experimental Data, International Conference on Information Fusion (2018), 2322–2329.

4. Pearl and Verma T.S., A theory of inferred causation, Logic, Methodology and Philosophy of Science IX 134 (1995), 789–811. https://doi.org/10.1016/S0049-237X(06)80074-1

5. Tsagris M., Bayesian Network Learning with the PC Algorithm: An Improved and Correct Variation, Applied Artificial Intelligence (2019), 101–123. https://doi.org/10.1080/08839514.2018.1526760

6. Behjati S. and Beigy H., Improved K2 algorithm for Bayesian network structure learning, Engineering Applications of Artificial Intelligence (2020), 55–64. https://doi.org/10.1016/j.engappai.2020.103617

7. Cowie, Oteniya and Coles, Particle swarm optimisation for learning bayesian networks, Lecture Notes in Engineering and Computer Science (2007), 71–76. http://www.iaeng.org/publication/WCE2007/

8. Tsamardinos, Brown L.E. and Aliferis C.F., The max-min hill-climbing Bayesian network structure learning algorithm, Machine Learning 65(1) (2006), 31–78. https://doi.org/10.1007/s10994-006-6889-7

9. Gasse, Aussem and Elghazel, A hybrid algorithm for Bayesian network structure learning with application to multi-label learning, Pergamon (15) (2014), 6755–6772. https://doi.org/10.1016/j.eswa.2014.04.032

10. Bartlett and Cussens, Integer Linear Programming for the Bayesian network structure learning problem, Artificial Intelligence (2017), 258–271. https://doi.org/10.1016/j.artint.2015.03.003

11. Dai, Ren, Du, Shikhin and Ma, An improved evolutionary approach-based hybrid algorithm for Bayesian network structure learning in dynamic constrained search space, Neural Computing and Applications 32 (2020), 1413–1434. https://doi.org/10.1007/s00521-018-3650-7

12. X.L., Wang S.C. and He X.D., Learning Bayesian networks structures based on memory binary particle swarm optimization, Simulated Evolution and Learning, Springer, Berlin Heidelberg (2006), 568–574. https://doi.org/10.1007/11903697_72

13. Gheisari and Meybodi M.R., BNC-PSO: structure learning of Bayesian networks by Particle Swarm Optimization, Information Sciences 348 (2016), 272–289. https://doi.org/10.1016/j.ins.2016.01.090

14. Sun, Zhou, Wang et al., A new PC-PSO algorithm for Bayesian network structure learning with structure priors, Expert Systems with Applications 184 (2021), 115237. https://doi.org/10.1016/j.eswa.2021.115237

15. Chen and Shen, Structure Learning of Bayesian Network Using a Chaos-Based PSO, Advanced Materials Research (2012), 472–475, 2292–2295.

16. Chen, Xie and Zou, A binary differential evolution algorithm learning from explored solutions, Neurocomputing 149 (2015), 1038–1047. https://doi.org/10.1016/j.neucom.2014.07.030

17. Flesch and Lucas P.J., Markov Equivalence in Bayesian Networks, In: P. Lucas, J.A. Gámez, A. Salmerón (eds) Advances in Probabilistic Graphical Models, Studies in Fuzziness and Soft Computing, vol. 213, Springer, Berlin, Heidelberg (2007). https://doi.org/10.1007/978-3-540-68996-6_1

18. Korb and Nicholson A.E., Bayesian Artificial Intelligence, CRC Press Inc., USA.

19. Cano A., Gómez-Olmedo M., Masegosa A. and Moral S., Locally averaged Bayesian Dirichlet metrics for learning the structure and the parameters of Bayesian networks, International Journal of Approximate Reasoning (2012), 526–540. https://doi.org/10.1016/j.ijar.2012.09.003

20. Gogoshin and Rodin, Minimum Uncertainty as Bayesian Network Model Selection Principle, Preprints (2022), 2022020254. https://doi.org/10.20944/preprints202202.0254.v2

21. , Miao, Liang et al., BIC-based node order learning for improving Bayesian network structure learning, Front Comput Sci 15 (2021), 156337. https://doi.org/10.1007/s11704-020-0268-6

22. Chickering D.M., Heckerman and Meek, Large-sample learning of Bayesian networks is NP-hard, J Mach Learn Res 5 (2004), 1287–1330. https://doi.org/10.48550/arXiv.1212.2468

23. Gendreau and Potvin J.Y., Tabu Search, In: E.K. Burke, G. Kendall (eds) Search Methodologies, Springer, Boston, MA (2005). https://doi.org/10.1007/0-387-28356-0_6

24. Mirjalili, Genetic algorithm: Evolutionary algorithms and neural networks, Springer, Cham (2019), 43–55. https://doi.org/10.1007/978-3-319-93025-1_4

25. Gámez J.A., Mateo J.L. and Puerta J.M., Learning Bayesian networks by hill climbing: efficient methods based on progressive restriction of the neighborhood, Data Min Knowl Disc 22 (2011), 106–148. https://doi.org/10.1007/s10618-010-0178-6

26. Kennedy J. and Eberhart R., Particle swarm optimization, Proceedings of ICNN’95 – International Conference on Neural Networks 4 (1995), 1942–1948.

27. Qin A.K., Huang V.L. and Suganthan P.N., Differential Evolution Algorithm With Strategy Adaptation for Global Numerical Optimization, IEEE Transactions on Evolutionary Computation (2019), 398–417.

28. Lauritzen and Spiegelhalter, Local Computation with Probabilities on Graphical Structures and their Application to Expert Systems (with discussion), Journal of the Royal Statistical Society: Series B 50(2) (1988), 157–224.

29. Spiegelhalter D.J. and Cowell R.G., Learning in probabilistic expert systems, Bayesian Statistics 4 (1992), 447–466.

30. Beinlich I.A., Suermondt H.J., Chavez R.M. and Cooper G.F., The ALARM Monitoring System: A Case Study with Two Probabilistic Inference Techniques for Belief Networks, In Proceedings of the 2nd European Conference on Artificial Intelligence in Medicine, Springer-Verlag (1989), 247–256.

31. Ratnaweera A.G., Halgamuge S.K. and Watson H.C., Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients, IEEE Trans Evol Comput 8(3) (2004), 240–255.

32. de Jongh and Druzdzel M.J., A comparison of structural distance measures for causal Bayesian network models, Recent Advances in Intelligent Information Systems (2009), 443–456.

33. Handa and Katai, Estimation of Bayesian network algorithm with GA searching for better network structure, International Conference on Neural Networks and Signal Processing (2003), 436–439.