An evaluation of the role of fuzzy cognitive maps and Bayesian belief networks in the development of causal knowledge systems

Abstract

Fuzzy cognitive maps (FCM) and Bayesian belief networks (BBN) are two of the most frequently used causal knowledge frameworks for modelling, representing and reasoning about causal knowledge. In this paper, an evaluation of their different roles in the engineering process of developing causal knowledge systems is conducted, based on their inherent features. The evaluation criteria adopted in this research are understandability, usability, modularity, scalability, expressiveness, inferential capability, rigour, formality and preciseness. All of these are commonly used to evaluate the strengths and weaknesses of traditional knowledge representation frameworks. These criteria are used to reveal the fundamental characteristics of FCM and BBN. The findings of this study show that FCM is more appropriate for use in modelling causal knowledge, whereas BBN is superior in model representation and inference. This study deepens the understanding of the role of FCM and BBN in the development of causal knowledge systems.

Keywords

Fuzzy cognitive maps; Bayesian belief networks; knowledge engineering; causal knowledge systems; evaluation

1 Introduction

The question of whether to choose the FCM or BBN framework for the development of a causal knowledge system is an important one. Both are well-established causal frameworks which have been used within various domains over the past three decades. The proponents of each framework claim that theirs is better and more appropriate for use in the development of a causal knowledge system; however, there is a lack of systematic research into the differences between these two frameworks. The purpose of this paper is to evaluate and compare the specific characteristics and performances of both frameworks through a set of criteria encompassing factors which are thought to be important. This evaluation is conducted in a qualitative way in order to determine whether each framework has a certain inherent feature. Worked mathematical examples and realistic cases are provided in the discussion in order to clarify the merits of FCM and BBN in various aspects of the development of causal knowledge systems.

Modelling, representation and inference are the three main stages in developing a causal knowledge system. An evaluation is carried out here to determine the different roles of FCM and BBN in these stages based on a set of criteria that are commonly used in knowledge engineering for benchmarking purposes. The set of criteria used in this study includes understandability and usability in modelling, modularity and scalability in model construction, expressiveness in representation, inferential capability, and rigour, formality and preciseness. The differences between FCM and BBN can be easily revealed through the evaluation of their important characteristics using the same set of comparison criteria.

Understandability, usability, modularity and scalability are related to the modelling of the causal knowledge systems based on the user’s perspective. Modelling a causal knowledge system involves the elicitation of causal knowledge from a domain expert, the transfer of this knowledge into a particular modelling framework, and the construction of a causal model. A good framework must be simple and user-friendly to facilitate the modelling process. Expressiveness is another important criterion which concerns flexibility in knowledge representation. It is determined by the flexibility of a causal framework in permitting the knowledge to be specified or left unspecified. Inferential capability determines the reasoning power of a causal knowledge system; a good framework must offer a diversity of reasoning mechanisms and the possibility for automated reasoning. Rigour, formality, and preciseness are related to the standardization and consistency of the causal knowledge expression; they determine the accuracy of the inference results and the correctness of the causal knowledge obtained from the reasoning process. They rely heavily on a strong mathematical foundation. This study shows the differences between FCM and BBN in the process of developing causal knowledge systems.

This paper is organised as follows. Section 2 presents some background and related work on BBN and FCM. An evaluation of FCM and BBN is carried out in Section 3. Section 3.1 evaluates both frameworks in terms of understandability and usability in causal modelling. Section 3.2 discusses the modularity and scalability of FCM and BBN in model construction. Section 3.3 compares the frameworks in terms of their expressiveness in causal knowledge representation, and Section 3.4 examines their inferential capability. Rigour, formality and preciseness are discussed in Section 3.5. A summary of the evaluation results is presented in Section 3.6. The conclusion and future work are given in Section 4.

2 Related works

The cognitive map (CM) was proposed by the political scientist Robert Axelrod in 1976, and was originally used to represent social scientific knowledge [1]. It is also known as a causal map because it represents the cause-and-effect relationships between variables. CM was enhanced by Kosko [2, 3] through the inclusion of fuzzy values, and renamed the fuzzy cognitive map. A fuzzy value is added to an edge as the weight/degree of the causal relationship in order to represent the causal strength. The weight usually lies in the interval [−1, +1], where −1 indicates an entirely negative influence, +1 an entirely positive influence, and 0 no causal influence between the two variables. FCM is a well-established framework for modelling complex dynamic systems due to its ability to represent and reason about temporal causal relationships by allowing feedback loops. FCM is widely adopted in various domains such as decision support in medicine [4], yield prediction in agriculture [5], human intention recognition in robotics [6] and automatic service quality assessment in hotel management [7]. Unfortunately, a literature survey shows that many of these applications remain at the stage of research-based prototypes, and have not been deployed in real-world industrial environments. Moreover, most of the FCM development tools available today are non-commercial products, and lack a sophisticated and user-friendly integrated development environment to support the construction process.

The Bayesian belief network (BBN), also known as a belief network or Bayesian network, is a well-established framework used for causal knowledge representation and reasoning under conditions of uncertainty. The probabilistic relationships of the domain variables are presented in an acyclic graphical structure and quantified using a set of conditional probability tables (CPT). Causal reasoning is carried out based on probability theory. Updating the belief in an event and propagating the evidence over the causal network derives the implicit information that is useful for the decision-making process. Pearl, first in 1986 [8] and later in 1988 [9], introduced the concept of conditional independence for a more tractable and efficient evidence propagation mechanism, which has made BBN a practical tool for reasoning under uncertainty. Moreover, BBN offers sound reasoning due to its profound underlying mathematical theory. Hence, it is extensively adopted in solving real-world industry-scale problems, such as a general product recommendation engine [10], complex decision support for sustainable coastal aquaculture development in Thailand [11], assessment of the relative efficacy and safety of treatments for chronic hepatitis C in Japan [12], a joint ASTRON and IBM DOME project on auto-tuning a compiler for high-level programming languages [13] and an EPSRC Project for intelligent medical decision support systems [14]. BBN is a powerful and mature framework because it provides a comprehensive set of functionalities, and is supported by many commercially available development tools such as Hugin [15], BayesiaLab [16] and Netica [17].

Despite the fact that BBN and FCM are two major causal knowledge engineering frameworks, no focused effort has been made by researchers to evaluate their respective roles in the development of causal knowledge systems. There are few records of any attempt to compare them from a knowledge engineering perspective with the aim of leveraging their complementary nature and mutual merits. Liu [18] was the first to recognise them as the two major causal frameworks, and to provide a brief and general introduction to them in an article, although without an attempt to put them together for comparison. Recently, Douali et al. [19] applied both BBN and case-based FCM separately to a common dataset to evaluate their performance in a classification process; however, the comparison of the characteristics of both causal frameworks was not carried out in a comprehensive way, and the possibility of integrating them was not addressed. Generally, both frameworks are applied separately within a domain, and they are not associated in any way. One preliminary comparison of these two frameworks has been conducted [20]; however, the comparison criteria used are limited, and there is a lack of realistic cases in the discussion. A more comprehensive evaluation of the roles of FCM and BBN is performed in this study by incorporating more evaluation criteria and real-life cases.

3 Comparison criteria

In this section, several common criteria are used to evaluate both FCM and BBN. Each criterion is discussed separately with examples, in order to illustrate the distinction between these two frameworks.

3.1 Understandability and usability in modelling

Modelling is the first stage in the development of causal knowledge systems before causal reasoning can be performed. The process of modelling involves the transformation of a collection of causal knowledge from domain experts into a causal model. A good framework must offer a simple and straightforward modelling method that supports knowledge transfer and the model construction process [21].

In FCM, domain experts need only deal with a simple fuzzy-signed directed graph to specify the causal knowledge; the conversion of the causal model into a corresponding adjacency matrix for the subsequent inference process is straightforward and automatable. For example, in Fig. 1, an expert first determines the domain variables Burglary, Earthquake, Alarm, JohnCalls and MaryCalls. Then, the expert determines the causal links (Burglary→Alarm), (Earthquake→Alarm), (Alarm→JohnCalls), (Alarm→MaryCalls), and their respective causal strengths: +0.76, +0.41, +1, +0.88. The aim of this process is to visually construct a directed graph. The adjacency matrix associated with the graph can be constructed directly or even automatically generated from the FCM.

Fig.1

An illustrative FCM and the corresponding adjacency matrix.
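The automatic generation of the adjacency matrix mentioned above can be illustrated with a minimal sketch. The concepts and weights are those quoted in the text for Fig. 1; the function name `adjacency_matrix` and the convention that rows are causes and columns are effects are assumptions made for illustration only.

```python
# Concepts and signed causal weights as quoted in the text for Fig. 1.
concepts = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]
edges = {
    ("Burglary", "Alarm"): 0.76,
    ("Earthquake", "Alarm"): 0.41,
    ("Alarm", "JohnCalls"): 1.0,
    ("Alarm", "MaryCalls"): 0.88,
}


def adjacency_matrix(concepts, edges):
    """Return W where W[i][j] is the weight of the link concept i -> concept j.

    The row = cause, column = effect layout is an assumed convention.
    """
    index = {name: i for i, name in enumerate(concepts)}
    matrix = [[0.0] * len(concepts) for _ in concepts]
    for (cause, effect), weight in edges.items():
        matrix[index[cause]][index[effect]] = weight
    return matrix


W = adjacency_matrix(concepts, edges)
for name, row in zip(concepts, W):
    print(f"{name:<10}", row)
```

The expert only supplies the edge list; every entry of the matrix that is not named in it defaults to 0, meaning no causal influence.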

Unlike in FCM, the primary component to be constructed in BBN is a set of CPTs. The graphical structure is not a complete representation of BBN, because it does not capture the strength of the causal effects. In addition to specifying the domain variables and their causal relationships, a domain expert needs to identify the variable states and to specify the causal strength in a CPT for each variable. The graphical representation of a BBN can be discarded once the CPTs have been constructed, since the graphical structure is implicitly captured in the CPTs. For instance, as illustrated in Fig. 2, a domain expert drafts the graphical model by identifying the nodes and causal links. Following this, the CPT for each node is constructed by assigning a probability value for each entry. The CPTs for Burglary and Earthquake each have two entries; the CPT for Alarm has eight entries; and the CPTs for JohnCalls and MaryCalls each have four entries. The number of parent nodes is easily observable from the CPT. For example, in Fig. 2, Burglary and Earthquake have no parents; Alarm has two; and JohnCalls and MaryCalls each have one. The causal relationships between the variables are implicitly captured in the CPTs.

Fig.2

An illustrative BBN graphical structure and its CPTs.

A causal relationship in FCM is identified by a link between the variables, and the causal strength is represented by a single value with a sign. For example, in the FCM of Fig. 1, the domain expert identifies a causal relationship between the variables Alarm and JohnCalls with a causal strength of +1 attached to the causal link Alarm→JohnCalls. In contrast, the domain expert needs to identify the relationship between each possible state of the variables in BBN. For example, in Fig. 2, the domain expert needs to identify the probabilistic relationships between the true (T) and false (F) states of Alarm and JohnCalls. Moreover, a causal strength value in BBN is specified in terms of probability and is represented implicitly as multiple values in a CPT. As in Fig. 2, the causal strength of the link Alarm→JohnCalls is represented in the CPT, as the conditional probability values for the T and F states of JohnCalls given the T and F states of Alarm.

In FCM, the sign of the causal strength attached to a causal link indicates the type of influence from the cause variable to the effect variable. A positive sign (“+”) indicates a promoting effect, which means that an increase in the value of the cause variable will lead to an increase in the value of the effect variable, and a decrease in the value of the cause variable is associated with a decrease in the value of the effect variable. A negative sign (“−”) indicates an inhibitory effect, which means that an increase in the value of the cause variable will lead to a decrease in the value of the effect variable, and a decrease in the value of the cause variable will lead to an increase in the value of the effect variable. The type of a causal relationship in FCM is directly observable from the sign of the causal strength. The way in which the cause variable influences the effect variable is also easily understood from the causal model. However, the type of a causal relationship in BBN is implicitly represented in the CPT. Instead of a signed value, multiple probability values are used in BBN to represent a causal effect. The type of influence of the cause variable on the effect variable cannot be easily identified from the causal model; it can only be determined by a close examination of the probability distributions represented in the CPT of the effect variable.

In FCM, the number of causal values needed to compute the combination of multiple causal effects on an effect variable is simply the total number of causal effects. For instance, in Fig. 1, two causal values (+0.76 and +0.41) are used to compute the combination of two causal effects on Alarm (Burglary→Alarm, and Earthquake→Alarm). Conversely, in BBN, a combination of multiple causal effects on an effect variable is not computed using a specific formula. Instead, the combination effect is estimated by the domain expert based on his/her own experience (i.e., his/her own belief). Moreover, the number of probability values to be determined by the expert is equal to the product of the number of states in each parent node and the number of states in the child node. As shown in Fig. 2, the combination of causal effects from Burglary and Earthquake on Alarm requires eight (2*2*2) probability values to be specified in the CPT of Alarm. This is because there are two parent nodes (Burglary and Earthquake) and a child node (Alarm), and each of these consists of two states.

From the comparison above, FCM provides a simpler and more intuitive graphical interface than BBN. This is because the tabular interface represented by the CPTs in BBN is less intuitive, and gives an implicit dependency structure which is more difficult to understand. In addition, the level of abstraction offered to the expert is higher in FCM than in BBN. A lower level of abstraction provides more complex details, which are useful for technical purposes but are not suitable to be used at the front-end stage of the development of causal knowledge systems. This means that FCM is a more user-friendly framework for model construction, as it is less laborious and complicated for domain experts to work on. Domain experts assign less information in the construction of a causal model using FCM than using BBN.

3.2 Modularity and scalability of the causal model

The addition or removal of a causal relationship is a common task after a causal model has been constructed by a domain expert. A good framework allows the composition or decomposition of a model in an easy and simple way. Moreover, the ability to handle the growth of a model is another criterion for a good framework [22].

The fact that the causal relationships in FCM are independent of each other allows us to compose or decompose the causal model easily, without requiring the domain expert to re-specify the existing causal strengths. The composition of causal relationships enables the combination of individual causal relationships into a larger causal map, while decomposition allows the resolution of the causal map into individual causal relationships. Both processes can be done without the need for re-specifying any causal strength. For example, in Fig. 1, the initial causal strengths for Burglary→Alarm and Earthquake→Alarm are 0.76 and 0.41 respectively. These causal strengths will not be affected when the two causal relationships are joined at Alarm, giving Burglary→Alarm←Earthquake. A collaboration between multiple domain experts is often needed to construct a complex causal map. Combining the knowledge of multiple domain experts and keeping alterations to the existing causal model to a minimum are the primary concerns in current research in this area. The flexibility of FCM in allowing this composition or decomposition without modifying any causal values is extremely useful in merging several sub-causal maps into a complete causal model. Hence, it can support a collaborative development environment, which allows domain experts from different regions to share knowledge, brainstorm, and generally work together to carry out more complicated tasks. The possibility of extending a causal model without re-specifying the strength of the causal relationships is attributed to the high modularity of FCM; it frees a domain expert from the need to complete a complex causal map on his/her own. This in turn supports the separation of concerns when constructing an FCM. A knowledge engineer can separate an FCM into distinct sub-maps, distribute them to various domain experts, and finally combine these together into a single unit.

However, in BBN, the causal strength of the existing causal relationships, in terms of conditional probability distributions, will be affected and needs to be re-specified by the domain expert when there is composition/decomposition of causal relationships. Unlike FCM, BBN defines the strength of a combination of causal effects, rather than each individual effect. This increases the difficulty of the composition/decomposition process, since the service of a domain expert is required for each combination of causal effects (i.e., each time a causal relationship is added or removed). Combining the knowledge of multiple domain experts is barely supported in BBN. For example, when the two existing causal relationships Burglary→Alarm and Earthquake→Alarm are joined by Alarm, forming Burglary→Alarm←Earthquake, the existing CPT of Alarm cannot easily be re-evaluated. Instead, the CPT needs to be re-specified by the domain expert, as in Fig. 2. This is because the CPT of Alarm before the merger is determined by an effect from a single cause (Burglary or Earthquake), and after the merger it is determined by a combination of effects from two causes (Burglary and Earthquake). Composition/decomposition thus becomes an arduous process in BBN due to a lack of modularity.

In FCM, causal relationships are assumed to be mutually independent. This means that the addition or removal of a causal relationship does not affect the strength of the other causal relationships in the existing causal model. For example, when it is discovered that a low battery or an inconsistent power supply may trigger the alarm, a new causal relationship PowerSupply→Alarm can be added into the FCM in Fig. 1. To accommodate this additional causal relationship, the domain expert needs only to specify the causal strength of PowerSupply→Alarm, which is +0.4. The causal strengths of the other existing causal relationships remain intact. The domain expert is therefore freed from the burden of estimating and specifying the combined effect of the new causal relationship, PowerSupply→Alarm, with the other existing causal relationships. The combination of causal effects in FCM can be automatically calculated in the inference process. The basic assumption of the mutual independence between multiple causal effects is a result of the modularity found in FCM. This is the foundation of the high scalability of FCM.
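The PowerSupply→Alarm example can be sketched in a few lines. This is only an illustration under the assumption that an FCM is stored as a mapping from (cause, effect) pairs to signed weights; the helper name `add_link` is hypothetical.

```python
# Existing FCM of Fig. 1, stored as (cause, effect) -> signed weight.
fcm = {
    ("Burglary", "Alarm"): 0.76,
    ("Earthquake", "Alarm"): 0.41,
    ("Alarm", "JohnCalls"): 1.0,
    ("Alarm", "MaryCalls"): 0.88,
}
before = dict(fcm)  # snapshot used to check that nothing else changes


def add_link(model, cause, effect, weight):
    """Add one signed causal link; existing links are left untouched."""
    if not -1.0 <= weight <= 1.0:
        raise ValueError("FCM weights are expected in [-1, 1]")
    model[(cause, effect)] = weight


# The expert specifies a single new value for the new relationship.
add_link(fcm, "PowerSupply", "Alarm", 0.4)

unchanged = all(fcm[link] == weight for link, weight in before.items())
causes_of_alarm = sorted(cause for (cause, effect) in fcm if effect == "Alarm")
print(unchanged, causes_of_alarm)
```

Only one number is supplied by the expert, and the four existing weights are preserved exactly as they were.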

Unlike FCM, multiple causal relationships in BBN for an effect variable are combined and represented as a conditional probability distribution, captured in the CPT attached to it. The combined effect of multiple causal relationships on an effect variable is given as a joint event for the effect variable. This means the individual causal relationships for an effect variable are not mutually independent; the probability distributions for an effect variable are affected by the addition or removal of a causal relationship. Hence, the conditional probability distributions of the effect variable need to be re-evaluated and re-specified by the domain expert whenever there is an addition or removal of a causal relationship to or from the existing causal model. The job of re-evaluation and re-specification of the CPT of an effect variable is tedious and time-consuming, especially when a large number of multiple causal effects are applied to the effect variable. For example, in Fig. 2, the CPT of Alarm needs to be re-evaluated and re-specified if a new causal relationship PowerSupply→Alarm is added into the BBN, and the domain expert then has to re-evaluate the combination effect of Burglary, Earthquake and PowerSupply on Alarm. The lack of scalability in BBN is due to the fact that it is less modular than FCM.

The addition of one or more causal relationships in FCM or BBN increases the complexity of the causal model, in terms of the number of values needed to specify a combined causal strength. In FCM, only one value is needed to specify a causal strength when a causal relationship is added. For example, in Fig. 1, the alarm may be triggered by pets moving around a house, and a new causal relationship Pets→Alarm is therefore added into the FCM. Only one additional value is needed to specify the causal strength of the newly added causal relationship, which is 0.3. The complexity of FCM grows linearly with the number of causal relationships added.

In BBN, the complexity of specifying the causal strength of a combination of multiple causal effects grows exponentially with the number of individual causal effects added to the effect variable. For example, in Fig. 2, the combined causal effect Burglary→Alarm←Earthquake is represented as a CPT with 2³ = 8 entries attached to Alarm. However, the CPT of Alarm expands to 2⁴ = 16 entries when a new causal relationship between two Boolean variables is added (Pets→Alarm). The domain expert must then re-specify the 16 probability values for the combined causal effect. This is a burdensome task for the domain expert, since the size of the CPT grows exponentially due to the lack of scalability in BBN.
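The contrast between linear and exponential growth can be made concrete with a short sketch. The function names `fcm_values` and `cpt_entries` are illustrative; the counting rule for the CPT (child states multiplied by the product of the parents' state counts) is the one stated in Section 3.1.

```python
from math import prod


def fcm_values(num_causes):
    """FCM: one signed weight per incoming causal link."""
    return num_causes


def cpt_entries(child_states, parent_states):
    """BBN: child states times the product of the parents' state counts."""
    return child_states * prod(parent_states)


# Boolean effect variable with 1..5 Boolean causes.
for n in range(1, 6):
    print(n, fcm_values(n), cpt_entries(2, [2] * n))
```

With two Boolean causes the CPT has 8 entries, and adding a third (such as Pets) raises this to 16, whereas the FCM needs 2 and then 3 values.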

3.3 Expressiveness in representation

The value of a framework in causal knowledge representation is mainly determined by its expressive power. The expressiveness of a representation framework is determined by how flexible it is in allowing the expert to specify the relevant knowledge he/she wishes to express, and in particular, how flexible it is in allowing some knowledge to be left unspecified [23].

In FCM, the domain expert has no control over the prior likelihood of a variable. The likelihood of each variable is predetermined based on the equal likelihood principle before there is any evidence of an increase or decrease of an event. However, in BBN, the domain expert is able to assign an initial probability to each state of a variable. By default, the variable states have equal prior probability, which means they are equally likely to happen. However, the prior probability of the states can be adjusted if they are later found to have a different likelihood. For example, in Fig. 2, the prior probabilities of the T and F states differ in both Burglary (0.001 and 0.999 respectively) and Earthquake (0.002 and 0.998 respectively).

On the other hand, the domain expert determines the individual causal effects of each causal relationship in FCM. The combined causal effect can be obtained later in the inference process by using a certain formula. Hence, the domain expert must identify the causal value for each individual causal effect, and ignorance or uncertainty in each individual causal effect is not permitted. For example, in Fig. 1, an expert must assign a causal strength to each causal link without exception. The domain expert must specify the influence of Burglary on Alarm by providing the sign and magnitude (+0.76). Conversely, the domain expert in BBN must estimate the total strength of a combination of multiple causal effects without a need to know and specify their individual causal strengths. This offers a great convenience to the expert, as very often the expert is uncertain or even ignorant of the causal weights of the individual effects or the formula for their combination. For example, in Fig. 2, an expert estimates the probability that the alarm will be triggered as 0.95 and that it will not be triggered as 0.05, when both burglary and earthquake have happened. However, the expert may not know whether a burglary or earthquake is likely to have a stronger influence on triggering the alarm; what their respective influences are; how these individual influences are combined; or what formula should be used to compute the combined causal effect.

In FCM, only continuous numerical variables are allowed. The range of these numerical values represents the levels/states of the variable. In BBN, the number of states of a discrete variable has to be greater than or equal to two; a single-state variable is not allowed, as it indicates a constant value rather than an uncertainty. A BBN with multiple states is shown in Fig. 3 below. It supports various types of variables such as discrete/continuous numerical variables (e.g. (1, 2, 3 ... 99, 100) / [1 ... 100]), Boolean (True/False), categorical (e.g. apple/orange/grape) and ordered variables (e.g. low/medium/high). In FCM, a variable is able to represent the Boolean and ordered types in a numerical format. For example, an ordered variable (e.g. low/high) can be transformed into a numerical sub-range in FCM (e.g. 1-100). It is also notable that the discrete states contained in the categorical format need to be separated into multiple continuous numerical variables, as shown in Fig. 4.

Fig.3

An example of a BBN with multiple discrete states.

Fig.4

An example of an FCM converted from the BBN in Fig. 3.

Note from Figs. 3 and 4 that a single causal relationship between two BBN variables (i.e., Activity→CalorieBurning) has been resolved into multiple causal relationships in FCM (i.e., Sleeping→CalorieBurning, Walking→CalorieBurning, and Running→CalorieBurning). Therefore, it can be concluded that a BBN causal relationship can represent more causal information between two variables than an FCM causal relationship can.
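The structural part of this conversion can be sketched as follows. The helper `split_categorical` is hypothetical, and the weights are deliberately left unspecified because the values used in Fig. 4 are not reproduced here; the expert would still need to supply one signed weight per new link.

```python
def split_categorical(states, effect):
    """Turn each state of a categorical cause into its own FCM concept.

    Returns a mapping (concept, effect) -> weight, with every weight set to
    None as a placeholder for the value the domain expert must provide.
    """
    return {(state, effect): None for state in states}


# Activity (Sleeping/Walking/Running) -> CalorieBurning in the BBN becomes
# three separate causal links in the FCM.
links = split_categorical(["Sleeping", "Walking", "Running"], "CalorieBurning")
for (cause, effect), weight in links.items():
    print(f"{cause} -> {effect}: weight = {weight}")
```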

The relative direction of change (positive or negative) between two connected variables is another important aspect for discussion. In the context of continuous numerical variables where their increase or decrease is the major concern, FCM does not allow these to be both positively and negatively related with different likelihoods. For example, in Fig. 1 there is a positive effect between Alarm and JohnCalls, which indicates that an increase in Alarm will cause an increase in JohnCalls, and a decrease in Alarm will cause a decrease in JohnCalls. However, it does not indicate how an increase in Alarm may also cause a decrease in JohnCalls, or how a decrease in Alarm may cause an increase in JohnCalls. In BBN, instead of specifying the relative direction of change between two variables, the probabilistic relationships between the possible states of these variables are specified. Moreover, it is possible for the two variables to be both positively and negatively related although with different probability values. This is because there can be a number of state-to-state causal relationships between two BBN variables, which have opposite directions of change. When there is an increase in the probability of a particular state of the cause variable, the direction of change in terms of probability varies with the state of the effect variable. The same is true when there is a decrease in probability of the said state. For example, in Fig. 2, although there is only one causal link between Alarm and JohnCalls, four possible state-to-state causal relationships are represented (Alarm = T → JohnCalls = T, Alarm = T → JohnCalls = F, Alarm = F → JohnCalls = T, Alarm = F → JohnCalls = F). These four state-level causal relationships can be positive or negative, yet they are represented by a single variable-to-variable causal relationship between Alarm and JohnCalls. 
Hence, the direction of influence from Alarm to JohnCalls can be a positive (promoting) effect and a negative (inhibitory) effect with different probabilities, and these can be determined by examining the probability distribution captured in the CPT of the effect node.

Another distinction between the two frameworks is how well they are able to capture the temporal and dynamic behaviour of a system. FCM allows feedback cycles to support the representation of the dynamic behaviour of a causal knowledge system. Changes made to a variable will give rise to an impact on the other connected variables, which in turn will propagate the impact back to the initial variable. FCM supports the representation of the temporal aspect of a causal knowledge system through the presence of feedback loops, and the evolution of a system triggered by the occurrence of an event. BBN is an acyclic graph that does not allow feedback cycles; feedback loops would disrupt its acyclic graphical structure. In other words, BBN only supports the representation of static system behaviour, and it prohibits the representation of the temporal aspect of a causal knowledge system. For example, in Fig. 5, the co-existence of both the causal links (Prey→Food) and (Food→Prey) is allowed in FCM to denote a pair of causal effects happening in opposite directions within different time frames. From the FCM in Fig. 5, Food→Prey has a positive influence, whilst the opposite causal effect, Prey→Food, has a negative influence. A decrease in the prey population will cause an initial drop in the predator population due to a food shortage. However, the fall in the predator population will raise the prey population after a certain period of time, and eventually, the population of prey and predator will increase when a convergence is reached [24]. In BBN, the two links Prey→Predator and Predator→Prey cannot coexist, as BBN is an acyclic graph and does not represent the temporal aspect. In this sense, FCM is more expressive than BBN.

Fig.5

Prey and predator fuzzy cognitive map.
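The way a feedback loop lets an FCM evolve over time can be illustrated with a small simulation. Everything numerical here is an assumption: the weights of ±0.6 are hypothetical (those of Fig. 5 are not quoted in the text), and the update rule, in which each concept keeps its previous value, adds its weighted inputs and is squashed by tanh into [−1, 1], is one commonly used FCM inference rule rather than necessarily the one adopted in this paper.

```python
import math

# Hypothetical weights; W[i][j] is the influence of concept i on concept j.
concepts = ["Prey", "Predator"]
W = [
    [0.0, 0.6],   # Prey -> Predator (promoting)
    [-0.6, 0.0],  # Predator -> Prey (inhibitory)
]


def step(state, weights):
    """One synchronous update: previous value plus weighted inputs, via tanh."""
    n = len(state)
    return [
        math.tanh(state[j] + sum(state[i] * weights[i][j] for i in range(n)))
        for j in range(n)
    ]


def simulate(state, weights, steps):
    """Return the list of states visited, starting with the initial one."""
    trajectory = [list(state)]
    for _ in range(steps):
        state = step(state, weights)
        trajectory.append(state)
    return trajectory


# Initial event: a decrease in the prey population, predator unchanged.
trajectory = simulate([-0.5, 0.0], W, steps=10)
for t, (prey, predator) in enumerate(trajectory):
    print(t, round(prey, 3), round(predator, 3))
```

Under these assumed values the predator concept falls first, after which the prey concept starts to recover. A BBN cannot express this loop because its graph must remain acyclic.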

In conclusion, FCM is assumed to be a closed world representation, whereas BBN is an open representation of a causal knowledge system. In FCM, complete knowledge of the causes and effects must be acquired and captured in the causal model, whereas in BBN, hidden or unrepresented causes are acceptable. The unavailability of certain knowledge may be due to the ignorance or uncertainty of the domain expert; it may also be caused by the complicated nature of the causal relationships. A higher or lower prior probability of a particular variable state indicates that there may be some hidden causes which are disguised in the causal system. The tolerance of BBN in allowing the domain expert to be ignorant or uncertain of the exact causes and the formula for combining the causal effects has made it a more expressive framework in causal knowledge representation. Therefore, BBN is, on the whole, more expressive than FCM, as it allows certain knowledge to be left unspecified.

3.4 Inferential capability

Causal reasoning is performed after a causal model has been constructed. The inferential capability of a causal framework is determined by the ability of the framework to identify the causes and effects, and how much implicit knowledge it can infer from the explicit causal representation [25].

BBN supports several types of reasoning mechanisms: forward prognostic reasoning, backward diagnostic reasoning and hybrid reasoning. Prognostic reasoning is used to predict the effect on certain variables when evidence is found for a cause variable, whereas diagnostic reasoning is used to identify the possible causes of an event. Hybrid reasoning is the co-existence of prognostic reasoning and diagnostic reasoning; this is because there is very often a need to trace the consequences as well as to find the possible causes of an event. For example, in Fig. 2, when there is concrete evidence that Burglary has taken place (i.e., Burglary = T), the effects can be predicted (for example, the impact on the probability of Alarm). When there is concrete evidence that Alarm has been triggered (i.e., Alarm = T), then in addition to predicting the effects of the change, the causes of the change can also be diagnosed (for example, the change in the probability of Burglary).

An example of prognostic reasoning is given as follows:

Applying the chain rule, and noting that Earthquake (E) and Burglary (B) are d-separated when Alarm (A) is unobserved, so that P (E | B = T) = P (E), and that the evidence fixes P (B = T) = 1: $\begin{matrix} P (A = T | B = T) \\ = \sum_{E} P (A = T | E, B = T) P (E | B = T) P (B = T) \\ = [P (A = T | E = T, B = T) P (E = T | B = T) P (B = T)] \\ + [P (A = T | E = F, B = T) P (E = F | B = T) P (B = T)] \\ = (0.95 \times 0.002 \times 1) + (0.94 \times 0.998 \times 1) \\ = 0.94 \end{matrix}$

An example of diagnostic reasoning is as follows:

Applying Bayes’ rule to obtain the probability that Burglary is true, given that Alarm is true: $P (B = T | A = T) = P (A = T | B = T) P (B = T) / P (A = T)$

However, P (A = T) cannot be obtained immediately from the CPTs. By using the conditioning rule, P (A = T) can be calculated as follows: $\begin{matrix} P (A = T) \\ = P (A = T | B = T, E = T) P (B = T) P (E = T) + P (A = T | B = T, E = F) P (B = T) P (E = F) \\ + P (A = T | B = F, E = T) P (B = F) P (E = T) + P (A = T | B = F, E = F) P (B = F) P (E = F) \\ = (0.95 \times 0.001 \times 0.002) + (0.94 \times 0.001 \times 0.998) \\ + (0.29 \times 0.999 \times 0.002) + (0.001 \times 0.999 \times 0.998) \\ = 0.0025 \end{matrix}$

When all the probability values are ready, they are combined to obtain the result: $\begin{matrix} P (B = T | A = T) = 0.94 \times 0.001 / 0.0025 \\ = 0.376 \end{matrix}$
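The two hand calculations above can be reproduced with a short sketch, using only the priors and Alarm CPT entries that appear in the worked equations. The variable and helper names are illustrative. Carrying the unrounded intermediate values through Bayes' rule gives approximately 0.3736 rather than 0.376, which suggests that the small difference comes from rounding P (A = T | B = T) and P (A = T) before dividing.

```python
# Priors and the Alarm CPT, taken from the worked calculations in the text.
p_b = 0.001  # P(Burglary = T)
p_e = 0.002  # P(Earthquake = T)
p_a_given = {  # P(Alarm = T | Burglary, Earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}

def prior(p_true, state):
    """Probability of a Boolean root variable being in the given state."""
    return p_true if state else 1.0 - p_true

# Prognostic reasoning: P(A = T | B = T), summing over Earthquake.
p_a_given_b = sum(p_a_given[(True, e)] * prior(p_e, e) for e in (True, False))

# Conditioning rule: P(A = T), summing over Burglary and Earthquake.
p_a = sum(p_a_given[(b, e)] * prior(p_b, b) * prior(p_e, e)
          for b in (True, False) for e in (True, False))

# Diagnostic reasoning: Bayes' rule for P(B = T | A = T).
p_b_given_a = p_a_given_b * p_b / p_a

print(round(p_a_given_b, 4))  # 0.94
print(round(p_a, 4))          # 0.0025
print(round(p_b_given_a, 4))  # 0.3736
```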

Table 1 summarises the probability of each variable state after the evidence is set. When there is evidence that Burglary has taken place, the probability that Alarm will be triggered is 0.94, far higher than the prior probability of 0.0025 obtained when no evidence is found. The probability that JohnCalls will occur is 0.849, which is also higher than the prior probability, 0.0521. However, the variable Earthquake is not affected by the evidence for Burglary; this is because Burglary and Earthquake are d-separated from each other as long as no evidence is available for Alarm or its descendants. Now suppose there is evidence that Alarm has been triggered; the probability that JohnCalls will occur is 0.9, higher than the prior probability, 0.0521. It is also noticeable that the probabilities that Burglary and Earthquake have occurred are 0.3736 and 0.2310 respectively, much higher than their respective prior probabilities, 0.001 and 0.002. This indicates that the triggering of Alarm is most likely caused by Burglary or Earthquake, which is an instance of diagnostic reasoning.

Table 1
Simulation results of BBN reasoning

	Burglary	Earthquake	Alarm	JohnCalls	MaryCalls
No Evidence	(T)0.001	(T)0.002	(T)0.0025	(T)0.0521	(T)0.0117
	(F)0.999	(F)0.998	(F)0.9975	(F)0.948	(F)0.9883
Evidence: Burglary = T	(T)1.000	(T)0.002	(T)0.94	(T)0.849	(T)0.6586
	(F)0.000	(F)0.998	(F)0.06	(F)0.151	(F)0.3414
Evidence: Alarm = T	(T)0.3736	(T)0.2310	(T)1.000	(T)0.900	(T)0.700
	(F)0.6264	(F)0.7690	(F)0.000	(F)0.100	(F)0.300

FCM, on the other hand, only supports forward prognostic reasoning. The reasoning mechanism used in FCM, which is iterative vector-matrix multiplication followed by a threshold, does not offer a backward diagnostic reasoning capability [26]. Hence, FCM can only answer ‘what-if’ questions rather than ‘why’ questions. For example, in Fig. 1, when there is evidence that Burglary has increased, the information can be represented as an input vector [1, 0.5, 0.5, 0.5, 0.5]. Multiplying this with the adjacency matrix in the figure, called M, an output vector [0.00, 0.00, 0.97, 0.50, 0.44] is obtained. To ensure that these values are within the range [0, 1], the output vector will be confined using a sigmoid function, as in Equation (1) below. The most commonly used non-linear functions are the step function for bivalent 0, 1 or trivalent -1, 0, 1 concepts and the sigmoid function. However, the sigmoid function is chosen here since it is continuous rather than discrete. $f (x) = \frac{1}{1 + e^{-kx}}$ (1) where k controls the saturation rate of the dynamic process. In our example, k is set to 3.

After the sigmoid function is applied, the output vector is adjusted by holding Burglary = 1, which yields [1.00, 0.50, 0.95, 0.82, 0.79]. The vector is then taken as an input vector for the next step. The process is repeated, as shown below, until the output vector no longer changes. $\begin{matrix} [1.00, 0.50, 0.50, 0.50, 0.50] \times M = [0.00, 0.00, 0.97, 0.50, 0.44] \\ \text{Sigmoid} \to [0.50, 0.50, 0.95, 0.82, 0.79] \to [1.00, 0.50, 0.95, 0.82, 0.79] \end{matrix}$ $\begin{matrix} [1.00, 0.50, 0.95, 0.82, 0.79] \times M = [0.00, 0.00, 0.97, 0.95, 0.83] \\ \text{Sigmoid} \to [0.50, 0.50, 0.95, 0.94, 0.92] \to [1.00, 0.50, 0.95, 0.94, 0.92] \end{matrix}$ $\begin{matrix} [1.00, 0.50, 0.95, 0.94, 0.92] \times M = [0.00, 0.00, 0.97, 0.95, 0.83] \\ \text{Sigmoid} \to [0.50, 0.50, 0.95, 0.94, 0.92] \to [1.00, 0.50, 0.95, 0.94, 0.92] \end{matrix}$

The final result [1.00, 0.50, 0.95, 0.94, 0.92] can be interpreted as: when the Burglary increases by 1, the triggering of Alarm will increase by 0.95, JohnCalls will increase by 0.94, MaryCalls will increase by 0.92 and Earthquake will remain at 0.5. When there is evidence that the Alarm has been triggered, the information can be represented as an input vector [0.50, 0.50, 1.00, 0.50, 0.50]. Multiplying this with the adjacency matrix M, we obtain an output vector [0.00, 0.00, 0.59, 1.00, 0.88]. The output vector is then confined using a sigmoid function and adjusted by holding Alarm = 1, which yields [0.50, 0.50, 1.00, 0.95, 0.93]. The adjusted vector is then taken as an input vector for the next step. The process is repeated, as shown below. $\begin{matrix} [0.50, 0.50, 1.00, 0.50, 0.50] \times M = [0.00, 0.00, 0.59, 1.00, 0.88] \\ \text{Sigmoid} \to [0.50, 0.50, 0.85, 0.95, 0.93] \to [0.50, 0.50, 1.00, 0.95, 0.93] \end{matrix}$ $\begin{matrix} [0.50, 0.50, 1.00, 0.95, 0.93] \times M = [0.00, 0.00, 0.59, 1.00, 0.88] \\ \text{Sigmoid} \to [0.50, 0.50, 0.85, 0.95, 0.93] \to [0.50, 0.50, 1.00, 0.95, 0.93] \end{matrix}$

The final result [0.50, 0.50, 1.00, 0.95, 0.93] can be interpreted as: when the triggering of Alarm increases by 1, JohnCalls and MaryCalls will increase by 0.95 and 0.93 respectively. The increase in JohnCalls is due to the impact of the increase in the triggering of Alarm, in the forward direction. However, the increase in Alarm does not affect Burglary and Earthquake, even though they are causes of Alarm. This is because the reasoning mechanism using the iterative vector-matrix multiplication is unable to diagnose the possible causes for the increase in Alarm. The only reasonable explanation for the change in Alarm without a corresponding change in its causes, Burglary and Earthquake, is that the change arises from external variable(s). In general, BBN supports more types of reasoning mechanisms, and has a stronger inferential capability, than FCM.
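The iteration described above can be sketched as follows. The adjacency matrix is an assumption: Fig. 1 is not reproduced here, so the weights 0.76 (Burglary to Alarm), 0.41 (Earthquake to Alarm), 1 (Alarm to JohnCalls) and 0.88 (Alarm to MaryCalls) are inferred from the values quoted in the text and from the vector products shown. The concept ordering, the function names, the tolerance `tol` and the iteration cap `max_iter` are likewise illustrative choices.

```python
import math

# Assumed concept order: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls.
# M[i][j] is the causal weight from concept i to concept j (inferred values).
M = [
    [0.0, 0.0, 0.76, 0.0, 0.0],  # Burglary   -> Alarm
    [0.0, 0.0, 0.41, 0.0, 0.0],  # Earthquake -> Alarm
    [0.0, 0.0, 0.0, 1.0, 0.88],  # Alarm      -> JohnCalls, MaryCalls
    [0.0, 0.0, 0.0, 0.0, 0.0],   # JohnCalls has no outgoing links
    [0.0, 0.0, 0.0, 0.0, 0.0],   # MaryCalls has no outgoing links
]

def sigmoid(x, k=3.0):
    """Transfer function of Equation (1)."""
    return 1.0 / (1.0 + math.exp(-k * x))

def step(state, clamp, k=3.0):
    """One cycle: vector-matrix product, sigmoid, then hold the evidence."""
    n = len(state)
    raw = [sum(state[i] * M[i][j] for i in range(n)) for j in range(n)]
    nxt = [sigmoid(x, k) for x in raw]
    for idx, value in clamp.items():
        nxt[idx] = value
    return nxt

def run(state, clamp, k=3.0, tol=1e-4, max_iter=100):
    """Iterate until the state vector stops changing (or max_iter is hit)."""
    for _ in range(max_iter):
        nxt = step(state, clamp, k)
        if max(abs(a - b) for a, b in zip(nxt, state)) < tol:
            return nxt
        state = nxt
    return state

burglary = run([1.0, 0.5, 0.5, 0.5, 0.5], clamp={0: 1.0})
alarm = run([0.5, 0.5, 1.0, 0.5, 0.5], clamp={2: 1.0})
print([round(v, 2) for v in burglary])  # [1.0, 0.5, 0.95, 0.94, 0.92]
print([round(v, 2) for v in alarm])     # [0.5, 0.5, 1.0, 0.95, 0.93]
```

In the second run, Burglary and Earthquake stay at 0.5 because no link points back to them, which is the absence of diagnostic reasoning discussed above.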

3.5 Rigour, formality and preciseness

Software/knowledge engineering involves many contributors with different skill-sets, goals, and interests. Without standardisation, each participant imposes his/her own interests on a project. When problems occur, they become difficult to resolve and often result in unproductive conflicts [27]. Rigour is the level of discipline imposed through the application of standard rules, which are used as a guide for the execution of a process. It facilitates the process and helps to produce products with higher reliability and greater quality. Meanwhile, formality is the highest level of rigour, whereby software/knowledge systems can be verified by mathematical laws. It increases the maintainability, reusability, portability, understandability and interoperability of the final product [28]. The formality of a knowledge representation system prevents vagueness in semantics and helps to enhance the preciseness and robustness of the causal knowledge representation system. Unfortunately, the advantages of rigour and formality cannot be applied universally [29]. For example, the advantages to the back-end representation and automated reasoning may become disadvantages to the front-end modelling. Soundness in inference is ensured by unambiguous semantics in a knowledge representation system. A sound inference mechanism will produce valid results given a true hypothesis. In the context of causal reasoning using FCM or BBN, soundness in inference is determined by the correctness of the results inferred from a given causal model in response to the stimulus to certain variables. Stimuli in FCM and BBN are slightly different; a stimulus in FCM is represented by a change (increase or decrease) in some variable, whereas in BBN it is represented by an assignment of a probability value to some variable.

Standardisation in representation ensures rigour, formality and preciseness in semantics by reducing ambiguity and inconsistency. In BBN, the level of a variable state is represented universally as a probability value in the range [0, 1] or in terms of a percentage [0, 100]. In FCM, the level of a variable is represented as a fuzzy value. Different researchers may use different ranges of values to represent the levels of variables; typical ranges adopted by researchers are [0, 1] [30, 31], [0, 100] [32] and [-1, 1] [33, 34], and the initial values are normally set at 0.5, 50, and 0 respectively (i.e., the middle points). Each initial value is used as a reference point, where a variation from this determines how much the level of a variable has increased or decreased after each cycle is completed. The lack of a standard format for representing variable levels has made FCM less rigorous, less formal and less precise than BBN.

Apart from representation in a causal model, the rigour, formality and preciseness of inference method form another important issue, which ensures the soundness of inference. In BBN, causal values represented in the CPTs, and all the numeric values inferred through Bayesian reasoning, are interpreted as probabilities. Each of these values indicates the probability that an event will occur given some evidence that some other event has occurred. For example, in the CPT for Alarm, the values 0.95 and 0.05 in Fig. 2 are the probabilities that Alarm will be triggered and not triggered, respectively, given evidence that Burglary and Earthquake have taken place. In Table 1, the values 0.94 and 0.06 are the probabilities that Alarm is true and false, respectively, given evidence that Burglary has occurred. Bayesian reasoning is based on strong mathematical theorems derivable from well-defined basic axioms. The evidence propagation mechanism based on probability theory is adopted by all researchers as the standard inference method. It has been improved for efficiency purposes based on conditional independence theory, which allows d-separation, hence reducing the search space and inference time. The basic foundation has contributed to the soundness of Bayesian reasoning, since the outcomes inferred through BBN can be proven and the soundness ensured.

FCM has no well-founded underlying theory for the standard semantic interpretation of the numeric values represented by the adjacency matrix and those inferred from the vector-matrix multiplication. The numeric values do not represent the measurement of any physical quantity; they are merely linear scale factors for the grading of some abstract quantities, such as the impact of Earthquake occurring. As shown in Fig. 1, the values 0.76, 0.41, 1, and 0.88 are not associated with any specific physical quantity; instead these are linear scale factors for grading the strength of a causal effect between two variables. Furthermore, the inference mechanism in FCM usually involves two significant steps: a matrix multiplication based on linear proportionality, and a conversion based on the transfer function. Unfortunately, an inference mechanism based on vector-matrix multiplication is rather ad hoc in some aspects, and the selection of linear proportionality is not formally and rigorously justified. Moreover, various transfer functions are adopted by researchers for converting the output from the matrix multiplication into a specific range of values. The selection of a specific transfer function is rather subjective without a rigorous/formal justification. Some commonly used transfer functions are step functions for bivalent 0, 1 or trivalent -1, 0, 1 concepts, a sigmoid function and a linear threshold function [35–37]. A step function for bivalent concepts allows the value to be either 0 or 1; a value of more than 0 is represented as 1, and a value less than or equal to 0 is represented as 0. A trivalent function, on the other hand, allows the value to be -1 (for values less than 0), 0 (for a value equal to 0), or 1 (for values more than 0). A sigmoid function, as in Equation (1), is used for a continuous range of values between 0 and 1. The selection of the sigmoid function to limit the range of possible values and the choice of the k factor are also rather ad hoc in FCM.
Given a value x for the sigmoid function, the value f (x) will be different for different k factors. For example, given the value x = 0.5, and calculating the value f (x) using Equation (1), the value of 0.62 is obtained by setting k = 1, and 0.82 by setting k = 3. The selection of the k factor to determine the value f (x) is rather subjective. The step function for bivalent and trivalent cases, and the sigmoid function, are the most common non-linear functions used by researchers. However, non-linear functions are not used in some FCM research works [38]. Instead, a linear threshold function is used to truncate to 1 any values greater than 1 [39]. Any matrix multiplication result that is above the threshold value (i.e., 1) is arbitrarily truncated to the threshold value. For instance, in Fig. 1, if Burglary and Earthquake have increased by 1, the vector-matrix multiplication yields [1, 1, 0.5, 0.5, 0.5] × M = [0.00, 0.00, 1.17, 0.50, 0.44]. Without using the sigmoid transfer function, the value for Alarm, 1.17, is truncated to 1. Ambiguous and inconsistent outcomes will be produced by simply truncating the exceeding values to the threshold, because this violates the linear relationship between the input and output values, and is done arbitrarily without a proper justification. A sigmoid function, as a non-linear function, is usually selected by researchers rather than linear functions, because a non-linear function is a more natural representation of events taking place in real life. Nonetheless, the outcomes of the inference are different when different transfer functions are selected, even though the same FCM causal model is used. The lack of a standard inference method causes inconsistency and ambiguity in the outcomes.
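The sensitivity of the outcome to the choice of transfer function can be illustrated with a minimal sketch. The function names are illustrative, and the definitions follow the verbal descriptions given above: the sigmoid of Equation (1), the bivalent and trivalent step functions, and a linear threshold that truncates values above 1. The raw value 1.17 is the Alarm result from the example in the text.

```python
import math

def sigmoid(x, k):
    """Equation (1): maps any real x into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-k * x))

def bivalent(x):
    """Step function for bivalent concepts: 1 if x > 0, otherwise 0."""
    return 1 if x > 0 else 0

def trivalent(x):
    """Step function for trivalent concepts: -1, 0 or 1 by the sign of x."""
    return (x > 0) - (x < 0)

def linear_threshold(x, upper=1.0):
    """Truncate any value above the threshold to the threshold itself."""
    return min(x, upper)

# Same input, different saturation rates.
print(round(sigmoid(0.5, k=1), 2))  # 0.62
print(round(sigmoid(0.5, k=3), 2))  # 0.82

# Same raw Alarm value, different transfer functions.
raw_alarm = 1.17
results = {
    "sigmoid (k=1)": sigmoid(raw_alarm, k=1),
    "sigmoid (k=3)": sigmoid(raw_alarm, k=3),
    "linear threshold": linear_threshold(raw_alarm),
    "bivalent": bivalent(raw_alarm),
    "trivalent": trivalent(raw_alarm),
}
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

The same raw value therefore maps to noticeably different concept levels depending on the function and the k factor selected.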

3.6 Summary of the evaluation results

The evaluation results from the previous sections are summarised in Table 2 and are discussed below.

Table 2
Summary of the evaluation results

Comparison Criterion	BBN	FCM
Understandability and Usability in Modelling	A causal effect is implicitly represented as multiple probability values in the conditional probability table.	A causal effect is explicitly represented as a single fuzzy value attached to the causal link in graphical model.
	A variable is made up of two or more variable states.	A variable is simply understood as a single state by itself.
	A causal relationship is represented as the probabilistic dependencies between variable states.	A causal relationship is represented as a causal link between two variables.
	It is difficult to identify whether a causal effect between two variables is promoting or inhibitory by simply looking at the multiple probability values in the CPT.	It is easy to identify whether a causal effect between two variables is promoting or inhibitory by simply looking at the sign of the causal weight attached to the causal link.
	It is less intuitive because the causality is hidden in the CPTs.	It is more intuitive because the causality is visible in the graphical model.
	The essential component to be constructed is a set of CPTs.	The essential component to be constructed is a signed fuzzy digraph.
	Domain experts need to determine more domain elements such as variables, variable states and causal effects between variable states.	Domain experts need to determine fewer domain elements such as variables and the causal effects between variables.
	Domain experts need to estimate and specify the combined causal effect, and the number of probability values to be determined is equal to the product of the number of possible states of the individual cause and effect variables.	Domain experts need to specify individual causal effects and the number of causal values needed to be determined is simply the number of causal relationships itself.
	The construction of BBN is a more laborious and time-consuming task because domain experts need to specify causality at a more detailed level.	The construction of FCM is simpler and requires less effort and time because domain experts work at a higher level of abstraction.
	There are many user-friendly and commercially available integrated development environments to support the construction of causal models, such as Netica, Hugin, BayesiaLab, etc.	Only research-based tools without friendly user interfaces are available for the construction of the causal model, such as FCMapper, FCM Tool, etc.
Modularity and Scalability in Model Construction	The existing causal model will be affected when a causal effect is added/removed to/from a particular node.	The existing causal model remains intact when a causal effect is added/removed to/from a particular node.
	The causal strength of the combination effect needs to be re-specified when two existing causal relationships are joined by their common target node.	The causal strengths of the causal relationships do not need to be re-specified when two existing causal relationships are joined.
	Composition or decomposition of causal models is difficult.	Composition or decomposition of causal models is easy.
	The number of values needed to specify a causal strength increases exponentially with the number of causal relationships that are added to a common target node.	The number of values needed to specify a causal strength increases linearly with the number of causal relationships that are added to a common target node.
	The lack of modularity and scalability has hindered model reuse.	The high modularity and scalability have promoted model reuse.
Expressiveness in Representation	The user is allowed to determine the prior probability of every state in a variable before any evidence is set.	The user has no control over the initial likelihood of a variable before any evidence is set.
	The prior probability of every state is allowed to be unequal.	The initial likelihood of every variable is assumed to be equal.
	It allows hidden causes that are not represented.	It does not allow hidden causes and all possible causes are assumed to be represented.
	The combination of the causal effects needs to be specified and ignorance/uncertainty in the individual effects is allowed.	The combination of the causal effects is computed in the inference process and ignorance/uncertainty in the individual effects is not allowed.
	The formula used to combine the individual causal effects can be ignored. Instead, an estimation of the total effect can be obtained in the absence of a precise formula.	The formula used to combine the individual causal effects needs to be recognised. The total effect is computed as the algebraic sum of the individual causal strengths based on a certain formula.
	It supports different variable types, such as discrete numerical, continuous numerical, Boolean, categorical, and ordered variables.	It supports only continuous numerical variables.
	A variable in BBN can represent several categories, which are treated as the different states in a variable.	A variable in FCM can only represent one category (state) in a BBN’s categorical variable.
	It allows two variables to be both positively and negatively related within a single causal relationship.	It does not allow two variables to be both positively and negatively related within a single causal relationship. A causal relationship can be either positive (a promoting effect) or negative (an inhibitory effect) but not both at the same time.
	More causal information can be obtained from a variable-to-variable causal relationship because a variable represents several states.	Limited causal information can be obtained from a variable-to-variable causal relationship because a variable represents a single state by itself.
	It is an acyclic graph, which does not allow feedback and causal loops.	It is a cyclic graph with feedback.
	It only supports the modelling of static systems, and temporal representation is not allowed.	It is meant to support the modelling of dynamic systems.
Inferential Capability	Various types of reasoning mechanisms such as diagnostic, prognostic and hybrid can be used since BBN supports both forward chaining and backward chaining.	Only prognostic reasoning can be used since FCM does not support backward chaining.
	More implicit causal knowledge can be inferred from the causal model because it supports more inference mechanisms.	Less implicit causal knowledge can be inferred from the causal model due to its limited inference mechanisms.
	Pearl’s conditional independence rule has been proposed to reduce the complexity of the causal propagation of events.	There is no proposed mechanism to reduce the complexity of iterative vector-matrix multiplication until convergence.
Rigour, Formality and Preciseness	Domain experts use a standard format and range for representing the level of a variable (probability value).	Domain experts use different formats and ranges for representing the level of a variable based on their own preference (0 to 100; -1 to 1; 0 to 1; etc.).
	It is founded on sound mathematical theorems derivable from well-defined basic axioms in probability theory.	There is no well-founded underlying theory for the standard semantic interpretation of the numeric values represented in the adjacency matrix and those inferred from the vector-matrix multiplication.
	It has a generally adopted inference mechanism based on conditional probability and the Bayes rule.	The inference mechanism varies depending on its application.
	The inference steps and the methods used in the inference process are rigorously and formally justified.	The inference steps, methods used in the inference process, as well as the selection of the transfer function are rather subjective without a rigorous justification.
	The correctness of the inference process is provable.	The inference process is rather ad hoc. The correctness cannot be proven.
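
The FCM inference procedure referred to in the table (iterative vector-matrix multiplication until convergence, followed by a transfer function) can be sketched as follows. This is only a minimal illustration under stated assumptions: the three-concept map, its weights, the tolerance and the update rule without self-memory are all hypothetical choices, and the names `fcm_infer` and `sigmoid` are introduced here for illustration rather than taken from the study.

```python
import math


def sigmoid(x, lam=1.0):
    """Logistic transfer function; squashes any activation into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lam * x))


def fcm_infer(weights, state, transfer=sigmoid, tol=1e-6, max_iter=100):
    """Iterate state <- transfer(state . W) until the state stops changing.

    weights[i][j] is the causal weight from concept i to concept j.
    Returns (final_state, iterations_used, converged).
    """
    state = list(state)
    n = len(state)
    for step in range(1, max_iter + 1):
        # One vector-matrix multiplication followed by the transfer function.
        new_state = [
            transfer(sum(state[i] * weights[i][j] for i in range(n)))
            for j in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_state, state)) < tol:
            return new_state, step, True
        state = new_state
    return state, max_iter, False


# Hypothetical map with a feedback loop: C0 raises C1, C1 raises C2,
# and C2 lowers C0. The weights are assumed values, not expert-elicited.
W = [
    [0.0, 0.8, 0.0],
    [0.0, 0.0, 0.6],
    [-0.5, 0.0, 0.0],
]
initial = [1.0, 0.0, 0.0]

sig_state, sig_steps, sig_ok = fcm_infer(W, initial)
tanh_state, tanh_steps, tanh_ok = fcm_infer(W, initial, transfer=math.tanh)
print("sigmoid:", [round(v, 3) for v in sig_state], sig_steps, sig_ok)
print("tanh:   ", [round(v, 3) for v in tanh_state], tanh_steps, tanh_ok)
```

With these assumed weights both runs settle, but on different fixed points: the tanh run decays towards zero, whereas the sigmoid run settles at interior values. The same map therefore yields different conclusions depending on the transfer function chosen, which is the kind of subjectivity noted in the table.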

In terms of understandability and usability, FCM is in general far superior to BBN. The user-friendliness and intuitiveness of the visual graphical interface in FCM allow domain experts to work at a higher level of abstraction, hiding lower-level details so that they can focus on the more vital aspects of the model. In terms of modularity and scalability, FCM is also better than BBN, in the sense that it is less laborious and complicated for domain experts to combine or decompose causal models by adding or removing causal relationships. In BBN, by contrast, specifying causal knowledge in CPTs is an unnatural and tedious task, especially when the CPT is large.
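
The elicitation burden described above can be made concrete with a rough count. The sketch below compares how many numbers an expert must supply for a single node under each framework. It assumes discrete variables and a fully specified CPT (no compact forms such as noisy-OR), and the function names are hypothetical helpers introduced only for this illustration.

```python
def cpt_rows(parent_cardinalities):
    """Number of parent-state combinations, i.e. rows in the node's CPT."""
    rows = 1
    for k in parent_cardinalities:
        rows *= k
    return rows


def cpt_free_parameters(node_cardinality, parent_cardinalities):
    """Independent probabilities needed for one BBN node.

    Each CPT row sums to 1, so the last entry of every row is determined
    by the others.
    """
    return cpt_rows(parent_cardinalities) * (node_cardinality - 1)


def fcm_weights_needed(num_parents):
    """An FCM concept needs one causal weight per incoming edge."""
    return num_parents


# Binary variables throughout; the parent counts are arbitrary examples.
for n in (1, 3, 5, 10):
    bbn = cpt_free_parameters(2, [2] * n)
    fcm = fcm_weights_needed(n)
    print(f"{n:2d} parents: BBN needs {bbn} probabilities, FCM needs {fcm} weights")
```

Under these assumptions the CPT grows exponentially with the number of parents, whereas the number of FCM weights grows only linearly, which is one way to read the modularity and scalability comparison.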

In terms of expressiveness, BBN is in general far superior to FCM. Its flexibility in permitting causal knowledge to be specified in detail, and its tolerance for ignorance or uncertainty in that knowledge, make it a more expressive framework for causal knowledge representation. In terms of inferential capability, BBN is also far superior to FCM because it supports a variety of reasoning mechanisms, and thus more implicit knowledge can be inferred. The efficient evidence propagation mechanism based on conditional independence further strengthens its inferential capability. In terms of rigour, formality and preciseness, BBN is more rigorous in representation, and more formal and sound in reasoning, owing to its solid foundation in probability theory. Its inferential capability, together with its soundness in inference, has made it a mature framework which is widely adopted in various domains, with a proven track record in industry-scale applications.
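
The two directions of reasoning mentioned above can be illustrated on the smallest possible network, a single link from a cause to an effect. The sketch below is not the inference engine of any particular BBN tool. It simply applies the law of total probability (prognostic, cause to effect) and the Bayes rule (diagnostic, effect to cause), and all probability values and function names are hypothetical.

```python
def predict_effect(p_cause, p_effect_given_cause, p_effect_given_not_cause):
    """Prognostic reasoning: P(effect) by the law of total probability."""
    return (p_effect_given_cause * p_cause
            + p_effect_given_not_cause * (1.0 - p_cause))


def diagnose_cause(p_cause, p_effect_given_cause, p_effect_given_not_cause):
    """Diagnostic reasoning: P(cause | effect observed) by the Bayes rule."""
    p_effect = predict_effect(
        p_cause, p_effect_given_cause, p_effect_given_not_cause
    )
    return p_effect_given_cause * p_cause / p_effect


# Assumed numbers for illustration only.
P_CAUSE = 0.2                  # prior P(cause)
P_EFFECT_GIVEN_CAUSE = 0.9     # P(effect | cause)
P_EFFECT_GIVEN_NOT_CAUSE = 0.1 # P(effect | no cause)

p_effect = predict_effect(
    P_CAUSE, P_EFFECT_GIVEN_CAUSE, P_EFFECT_GIVEN_NOT_CAUSE
)
p_cause_given_effect = diagnose_cause(
    P_CAUSE, P_EFFECT_GIVEN_CAUSE, P_EFFECT_GIVEN_NOT_CAUSE
)
print("P(effect) =", round(p_effect, 4))
print("P(cause | effect) =", round(p_cause_given_effect, 4))
```

Both results follow from the same prior and the same CPT, so each step can be checked against the axioms of probability. The forward-only FCM update has no counterpart to the second function.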

4 Conclusions and future work

This study has evaluated the characteristics and performance of FCM and BBN in the development of causal knowledge systems. The two causal knowledge frameworks have been differentiated using a set of commonly adopted knowledge engineering criteria: understandability, usability, modularity, scalability, expressiveness, inferential capability, rigour, formality and preciseness. The research findings show that FCM is more suitable for use as a front-end modelling tool to elicit expert knowledge, since the causal model is simpler, more intuitive and easier to compose and decompose. BBN, on the other hand, is more suitable for use in back-end representation and automated reasoning, since the causal model is more expressive, formal and sound. The evaluation results suggest the possibility of integrating both frameworks so that the shortcomings of one can be effectively addressed by the other.

Future work includes a quantitative evaluation study. A group of knowledge engineers and domain experts will work together on a number of real applications spanning several domains, using FCM and BBN separately. The differences between the two frameworks can then be observed through a statistical analysis of the subjective opinions of the knowledge engineers and domain experts, recorded using a questionnaire. Combining the results of the qualitative and quantitative approaches will give a more comprehensive evaluation of FCM and BBN.

Acknowledgments

This research work was supported by the Fundamental Research Grant Scheme (FRGS) from the Ministry of Education and Multimedia University, Malaysia (Project ID: FRGS/1/2018/ICT02/MMU/02/1).
