Abstract
Ethereum is one of the most popular Blockchain platforms, and its key component is the smart contract. Smart contracts (SCs) are ordinary computer programs, written mostly in Solidity, a high-level object-oriented programming language. They allow two parties in the network to complete transactions directly, without a middleman or mediator. A smart contract cannot be modified once it has been deployed to the Blockchain, so it must be free of vulnerabilities before deployment. In this paper, a Bayesian Network model was designed and constructed, based on the Bayesian learning concept, to detect three smart contract security vulnerabilities: Reentrancy, Tx.origin, and DOS. The results show that the proposed BNMC (Bayesian Network Model Construction) design can estimate the severity of each vulnerability and also suggest the reasons for it. Compared with the traditional LSTM method, the accuracy of the proposed BNMC improved by 8% for both Reentrancy and Tx.origin and by 6% for DOS. The proposed BNMC design and implementation is the first attempt to detect smart contract vulnerabilities using Bayesian Networks.
Introduction
Smart Contracts (SCs) are programs of predefined rules that are deployed to the Blockchain and execute automatically, ensuring that every transaction satisfies the predefined conditions before it completes. In a Blockchain, transactions between two parties are recorded in an efficient, verifiable, and unchangeable way [1–3]. Blockchain can offer an innovative solution to long-standing problems of security and data storage in centralized systems [4]. Smart contracts work on the basis of conditional reasoning statements. They are now widely used in business among groups of mutually untrusted parties, where every transaction is completed according to rules agreed upon by all stakeholders, without third-party verification [5]. Designers first write SC code in a high-level language such as Solidity. The Solidity compiler converts the high-level source code into byte code (EVM code) in hexadecimal form. This byte code can in turn be decoded into EVM instructions, called opcodes [6].
Smart contracts attract attacks for three reasons: first, Ethereum smart contracts mainly handle monetary transactions; second, a vulnerable smart contract cannot be modified once deployed; and finally, smart contracts have no defined quality measures [7]. Several smart contract attacks in 2016 exploited vulnerabilities in SCs and led to multi-million dollar losses. For Ethereum smart contracts, it is therefore worth focusing on innovative machine learning models that detect SC vulnerabilities effectively [8], especially in contracts that handle financial matters such as money transfers and in more complicated code.
Smart contract vulnerabilities are divided into four categories: security, functional, developmental, and operational [9]. Security vulnerabilities include re-entrancy, external contract, DOS (Denial of Service), the use of tx.origin, and unchecked external calls. Functional vulnerabilities include locked money, integer division, integer underflow, integer overflow, unsafe interface type, and reliance on timestamps. Developmental vulnerabilities include violation of the token API, private modifier, unfixed compiler version, violation of the style guide, duplicated fallback function, and implicit visibility level. Finally, operational vulnerabilities include byte array and expensive loop vulnerabilities. The proposed work focuses on the security category, specifically the re-entrancy, DOS, and tx.origin vulnerabilities.

Smart contracts having reentrancy vulnerability.
Bayesian Networks are significant as a unified probabilistic framework for classification. Over the past few years, Bayesian Networks have found successful applications in areas such as medicine, document classification, information retrieval, semantic search, image processing, spam filtering, and systems biology [12].
The major contribution of the proposed work is identifying the presence of vulnerabilities in a given smart contract, concentrating on the Reentrancy, Tx.origin, and DOS vulnerabilities. A Bayesian network (BN) model is proposed for vulnerability detection for the following reasons. BNs are suitable for problems with limited data sets. A BN can predict the probability (severity) of each vulnerability rather than only a YES/NO answer. A BN can also suggest reasons for a vulnerability.
The remainder of the paper is organized as follows. Section 2 reviews the literature on smart contract vulnerability detection using machine learning and on Bayesian Networks. Section 3 explains the proposed work, the BNMC design for vulnerability detection. Section 4 presents the experimental setup, comparison tables, and outcomes. Section 5 gives the conclusion and future scope.
In the literature, papers [2, 13–17] address smart contract vulnerability detection using different machine learning models. Liao JW et al. [2] proposed smart contract vulnerability detection using machine learning and fuzz testing techniques. The authors used traditional static SC vulnerability detection tools (Oyente, Mythril, SWC, and Remix) to label the training data sets. Static detection tools require more time to detect vulnerabilities [13]. To prepare the dataset, the authors extracted features from SC opcodes, which are difficult to interpret. Wesley Joon et al. [14] worked towards safer smart contracts using a sequential learning approach, LSTM. These authors also extracted features from SC opcodes for data set preparation, and opcode sequences are difficult to interpret when analyzing results. Pouyan Momeni et al. [15] presented smart contract security analysis with machine learning techniques and used the existing tools Mythril and Slither to label the SCs in the dataset. These SCs are given as input to a Solidity parser, which generates an AST (Abstract Syntax Tree). An AST is straightforward to process and easy to interpret, and features can be extracted from it easily. However, the traditional tools used in that paper take more time to predict vulnerabilities.
Zhenguang Liu et al. [16] presented an automated re-entrancy detection method for SCs that uses BLSTM for the classification task. The authors proposed using contract snippets (keywords) to capture semantic information from a given SC. The limitation of this approach is that it considers every word of each line in an SC when preparing the feature set, which increases the number of features and may reduce classification accuracy. Raju Bhaskar et al. [17] described the importance of Blockchain for secure and efficient energy management in real-life applications. Zhang L et al. [18] described an SC vulnerability detection model based on an information graph and an ensemble technique; the model takes SC opcodes as input to find the critical opcode sequence for each vulnerability. Yiping Liu et al. [19] presented an SC vulnerability detection model based on symbolic execution that takes SC assembly code (opcodes) as input to generate a control flow graph. Huang J et al. [20] presented an SC vulnerability detection model based on multi-task learning that uses SC byte code as the data set. In contrast, the proposed work uses high-level source code directly as input, since vulnerabilities are easier to trace in SC source code than in SC byte code or opcodes.
Peng Qian et al. [21] proposed graph neural networks for smart contract vulnerability detection with the help of expert knowledge. The authors constructed a graph from the patterns extracted from a given SC and developed an open-source tool to extract those patterns. In the proposed work, this open-source pattern extraction tool has been used and tailored to the requirements of the proposed work. Feng Mi et al. [22] presented automatic detection of SC vulnerabilities using deep learning. The authors prepared the data set from features extracted from SC byte code, which is difficult to interpret when analyzing results. Deep learning techniques have further drawbacks: their internal behavior is hard to interpret (for example, the reasons for low prediction accuracy), the split ratio between training and test sets must be tuned to improve results, and the prediction has only two outcomes (Yes/No), so the severity of each vulnerability cannot be obtained.
Bayesian Networks are well suited to prediction and classification problems when prior probabilities of the relevant events are available. Eunjeong Park et al. [23] used Bayesian Networks to predict post-stroke outcomes from the probabilities of available risk factors. Daniel Kottke et al. [24] proposed a Bayesian approach that deals with uncertainty by determining posterior probabilities with the help of a prior distribution. Benjamin Lucas et al. [25] presented a Bayesian-inspired, deep learning based method for producing land cover maps from time series of satellite images. Zhao et al. [26] proposed a Bayesian network to mine knowledge and data from web text and present it in a way that users can easily understand. Meng et al. [27] proposed a Bayesian network to evaluate risk by identifying the relationship between the supply chain and risk indicators. Chen et al. [28] showed that Bayesian networks are a good choice for preventing failures in complex engineering systems with limited data. Lakho et al. [29] used a Bayesian network classifier to predict the performance of students in a blended learning model adopted for higher education.
Proposed system
The architecture of the proposed model consists of two parts, the BNMC design phase and the BNMC validation phase, as shown in Fig. 2. The first phase (BNMC design) consists of five modules: data set preparation, pattern extraction, preparation of the pattern frequency table, construction of the Bayesian Network model, and filtering of the most important patterns for each vulnerability. The second phase (BNMC validation) consists of four modules: pattern extraction for a given new SC, preparation of the BN information table, finding the severity of each vulnerability, and providing the final results. The BNMC design phase is described in this section; the BNMC validation phase is discussed in detail in Section 4.

Architecture of BNMC for smart contract vulnerability detection.
In the BNMC design phase, a dataset of smart contracts and their vulnerabilities was first created from online resources [9, 30], as shown in the first module of Fig. 2. All smart contracts related to a particular vulnerability are maintained in a single document (DOCj).
Key patterns are extracted from the smart contract data set created in the previous step; Fig. 3 shows examples of key patterns for a given smart contract. Different patterns indicate different possible vulnerabilities, and sometimes the sequence of patterns also matters for identifying a particular vulnerability. To detect the re-entrancy vulnerability, the important patterns are the msg.sender.call.value() invocation (pattern P1) and balance[msg.sender] = 0 (pattern P2). The order P1 followed by P2 leads to a re-entrancy vulnerability, whereas the order P2 followed by P1 does not. “Tx.origin” is the pattern required to detect the “transaction origin” vulnerability. The patterns required to detect the DOS vulnerability include if(function_call()), .gas(value), and send(digits), as shown in Table 1. Table 1 was prepared after a careful analysis of the relationship between vulnerabilities and patterns, based on the information in [11].
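The order sensitivity of the re-entrancy patterns can be sketched in Python as follows. This is a minimal illustration and not the extraction tool used in the paper: the regular expressions, the function name `has_reentrancy_order`, and the two sample snippets are assumptions made for this example.

```python
import re

# Hypothetical regexes for the two re-entrancy patterns named in the text.
P1_CALL_VALUE = re.compile(r"msg\.sender\.call\.value\s*\(")
P2_BALANCE_RESET = re.compile(r"balance\[msg\.sender\]\s*=\s*0")


def has_reentrancy_order(source: str) -> bool:
    """Return True when a P2 match appears after the first P1 match."""
    first_call = P1_CALL_VALUE.search(source)
    if first_call is None:
        return False
    # Only look for the balance reset after the external call.
    return P2_BALANCE_RESET.search(source, first_call.end()) is not None


# Illustrative one-line snippets, not taken from the paper's data set.
risky = "msg.sender.call.value(amount)(); balance[msg.sender] = 0;"
safe = "balance[msg.sender] = 0; msg.sender.call.value(amount)();"
print(has_reentrancy_order(risky), has_reentrancy_order(safe))
```

A purely textual check like this ignores control flow and comments, so it should be read as an illustration of why pattern order matters rather than as a detector.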

Pattern extraction from smart contract.
Vulnerability Vs patterns list
In Table 1, patterns that are common to more than one vulnerability (for example, P1 and P4) are treated as independent patterns (IP) and given low priority, because the vulnerability of a given SC cannot be determined from these common patterns alone. Patterns that are unique to a particular vulnerability are treated as dependent patterns (DP) and given high priority (for example, P11, P12, P13, and P14 are unique to V2). For the V3 vulnerability, P15 is unique.
From each vulnerability document (DOCj), the frequency of each pattern was recorded in the Pattern Frequency Table (PFT), which is used to calculate pattern probabilities and to prepare the CPT values. The patterns were then arranged by importance, from highest to lowest [11]. To construct the Bayesian Network, the top patterns for each vulnerability were selected from the PFT.
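The split into independent and dependent patterns can be sketched as a simple set computation. The mapping below is a small illustrative subset that is consistent with the examples in the text, not the full Table 1, and the function name `split_patterns` is an assumption.

```python
from collections import defaultdict


def split_patterns(vuln_patterns):
    """Split patterns into shared (independent) and unique (dependent) ones."""
    owners = defaultdict(set)
    for vuln, patterns in vuln_patterns.items():
        for pattern in patterns:
            owners[pattern].add(vuln)
    # Shared by several vulnerabilities: independent, low priority.
    independent = {p for p, vs in owners.items() if len(vs) > 1}
    # Unique to one vulnerability: dependent, mapped to that vulnerability.
    dependent = {p: next(iter(vs)) for p, vs in owners.items() if len(vs) == 1}
    return independent, dependent


# Illustrative subset only; the real lists are given in Table 1.
example = {
    "V1": {"P1", "P4", "P3"},
    "V2": {"P1", "P4", "P11", "P12"},
    "V3": {"P15"},
}
independent, dependent = split_patterns(example)
```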
Bayesian network construction
A Bayesian network is a probabilistic graphical model that consists of nodes and directed edges between nodes. Variables (attributes) are represented by nodes, and conditional dependencies between them are represented by directed edges. A missing connection between two nodes indicates that they are conditionally independent. BN models can be prepared by experts after careful analysis of the data, and the constructed model can then be used to predict test events. BN models can be challenging to design, because the domain information needed to specify every conditional dependence between variables is often incomplete. Even when it is available, specifying the full conditional probabilities for an event requires many calculations. An alternative is therefore to specify the dependencies between variables that the available data support and to treat all remaining variables as conditionally independent. In the proposed BNMC design, all patterns (for example, P1, P2, ...) are nodes, and the sequence of edges between nodes represents the conditional dependencies between the patterns that influence a particular vulnerability. All vulnerability types (for example, V1, V2, V3) are represented as leaf nodes in the network.
The Bayesian network shown in Fig. 4 is constructed after analyzing the functional dependencies and sequences between patterns for each vulnerability. In Fig. 4, the patterns P1, P4, and P6 influence more than one vulnerability and are considered independent patterns. P1 and P4 influence both the V1 and V2 vulnerabilities with different probabilities; P1 and P6 influence the V1 and V3 vulnerabilities. In the Bayesian Network, all independent patterns are placed in the first row. The sequences of patterns that most strongly influence a vulnerability are represented by arrows between nodes in the network [11]. Each node in the Bayesian Network maintains a CPT (Conditional Probability Table), which gives the probability of the pattern given the presence or absence of its parent patterns, as shown in Table 3.1 and Table 3.2. The CPT is discussed in detail in Section 4. The next section describes the experiment details and comparison results.

Bayesian Network Construction.
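The layered structure of the network can be sketched with a parent map. The fragment below encodes only the chain P1 -> P7 -> P9 -> V2 that appears in the DOS example later in the paper; the dictionary layout and the function name `node_levels` are assumptions, and the full network of Fig. 4 contains more nodes.

```python
# Child -> parents. Only a fragment of the network in Fig. 4.
PARENTS = {"P1": [], "P7": ["P1"], "P9": ["P7"], "V2": ["P9"]}


def node_levels(parents):
    """Assign level 1 to parentless nodes and 1 + max(parent level) otherwise."""
    levels = {}

    def visit(node, path):
        if node in path:
            raise ValueError(f"cycle through {node}: not a DAG")
        if node not in levels:
            parent_levels = [visit(p, path | {node}) for p in parents[node]]
            levels[node] = 1 + max(parent_levels, default=0)
        return levels[node]

    for node in parents:
        visit(node, frozenset())
    return levels


levels = node_levels(PARENTS)
total_levels = max(levels.values())
print(levels, total_levels)
```

With this fragment, P9 lands at level 3 of a four-level network, which agrees with the level assignment used in the DOS example.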
CPT preparation phase
A data set is prepared with three separate documents for the three vulnerabilities: Re-entrancy, DOS, and Tx.origin. All SCs with the same vulnerability are maintained in the same document. Patterns were extracted (as shown in Table 1) from each vulnerability document to prepare the Pattern Frequency Table. The patterns were extracted using string pattern matching in Python. The frequency of each pattern is shown in Table 2.
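A Pattern Frequency Table of this kind could be built with Python's `re` module roughly as follows. The pattern keys, the regular expressions, and the sample document are hypothetical; the actual pattern list is the one in Table 1.

```python
import re
from collections import Counter

# Hypothetical pattern keys and regexes, for illustration only.
PATTERNS = {
    "P1": r"msg\.sender\.call\.value\s*\(",
    "P2": r"balance\[msg\.sender\]\s*=\s*0",
    "TX_ORIGIN": r"tx\.origin",
}


def pattern_frequencies(document: str) -> Counter:
    """Count non-overlapping matches of each pattern in one vulnerability document."""
    return Counter({key: len(re.findall(rx, document)) for key, rx in PATTERNS.items()})


def pattern_probabilities(freq):
    """Turn frequencies into relative frequencies over all counted patterns."""
    total = sum(freq.values())
    return {key: (n / total if total else 0.0) for key, n in freq.items()}


# Illustrative document text, not taken from the paper's data set.
doc = (
    "msg.sender.call.value(1)(); balance[msg.sender] = 0; "
    "require(tx.origin == owner); msg.sender.call.value(2)();"
)
freq = pattern_frequencies(doc)
probs = pattern_probabilities(freq)
```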
Frequency Table for input data set
CPT for Re-entrancy pattern P3
CPT for Re-entrancy pattern P5
These patterns were rearranged by their importance (high to low) for detecting a particular vulnerability [11], and the most important patterns for each vulnerability were selected.
The Bayesian Network, which is a directed acyclic graph, was constructed for the selected patterns by considering the relationship between pattern sequences and vulnerabilities, as shown in Fig. 4. The network starts with the independent patterns in the first level (Level 1) and then grows downwards according to the sequences of patterns that influence a vulnerability. The last row in the network is the vulnerability-deciding level, which consists of V1, V2, and V3. The patterns that most strongly influence a vulnerability are placed in the level just before the last (Level 3). Each node in the Bayesian Network maintains a Conditional Probability Table (CPT). For an independent pattern, the CPT consists of a single entry, the probability of the pattern itself, because it does not depend on any other pattern. For the remaining patterns, the number of entries in the CPT is 2^n, where n is the number of parent nodes on which the pattern depends. The weights assigned to the patterns are calculated using the following equations.
$$W_{iL} = \frac{TL}{TL - L} \qquad (1)$$
where W_{iL} is the weight of pattern i in level L, and TL is the total number of levels in the BN.
For all the nodes in Level 1, which are independent patterns (IP), the weight is assumed to be 0.5 (low priority), since IPs have less influence on the vulnerability. The probability of an independent pattern is thus halved when multiplied by its weight. For the remaining patterns in the lower levels, the weight increases with the level, as per Equation (1). To increase the vulnerability prediction accuracy, a dependent pattern whose actual probability is low must receive a high weight; that is, the weight of a dependent pattern is inversely proportional to its probability, as shown in Equation (2).
where Wid = weight of dependent pattern i.
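The weighting scheme can be sketched in Python under two stated assumptions. First, the level weight uses the form TL / (TL - L), which is inferred from the worked value W_P9 = 4/(4 - 3) = 4 given later in the text. Second, the inverse-proportional weight for dependent patterns is written as `scale / probability`, where `scale` is a hypothetical constant, because the text states only the proportionality.

```python
def level_weight(level: int, total_levels: int) -> float:
    """Weight of a pattern at `level`; Level 1 (independent patterns) is fixed at 0.5."""
    if level == 1:
        return 0.5
    if not 1 < level < total_levels:
        # The last level holds the vulnerability nodes, not patterns.
        raise ValueError("pattern level must be between 1 and total_levels - 1")
    return total_levels / (total_levels - level)


def dependent_weight(probability: float, scale: float = 1.0) -> float:
    """Assumed form of an inverse-proportional weight: lower probability, higher weight."""
    if probability <= 0:
        raise ValueError("probability must be positive")
    return scale / probability


print(level_weight(1, 4), level_weight(2, 4), level_weight(3, 4))
```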
The Bayesian Network and CPTs for the re-entrancy vulnerability are shown in Fig. 5 and Tables 3.1 and 3.2, respectively. The probability of P3 depends on four parent patterns, P1, P2, P4, and P6; hence the number of entries in the CPT of P3 is 2^4 = 16. Only three entries are shown in Table 3.1 because of space restrictions. The second row in Table 3.1 is Prob(P3 | P1 = T, P4 = F, P6 = T, P2 = F) = 0.22. This value is calculated as 10/(100 - (25 + 30)), where 100 is the total number of patterns in the V1 smart contracts (DOC1), and 10, 25, and 30 are the frequency counts of patterns P3, P1, and P6, respectively. The remaining CPT entries for the other patterns are calculated in the same way. The Bayesian Networks and CPTs for the DOS and tx.origin vulnerabilities are shown in Fig. 6 with Tables 4.1 and 4.2, and in Fig. 7 with Table 6, respectively.
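The worked CPT entry can be reproduced with a short Python sketch. The function name `cpt_entry` and its argument layout are assumptions; the rule it implements (pattern frequency divided by the total pattern count minus the frequencies of the parents that are present) is read off the example 10/(100 - (25 + 30)).

```python
from itertools import product


def cpt_entry(pattern_freq, total_patterns, parent_freqs, parent_states):
    """Pattern frequency over the count that remains after removing present parents."""
    spf = sum(parent_freqs[p] for p, present in parent_states.items() if present)
    remaining = total_patterns - spf
    if remaining <= 0:
        raise ValueError("parent frequencies exhaust the pattern count")
    return pattern_freq / remaining


parents = ["P1", "P2", "P4", "P6"]
# Every True/False assignment of the four parents: one CPT row each.
rows = list(product([True, False], repeat=len(parents)))

# Second row of the CPT of P3: P1 and P6 present, P2 and P4 absent.
states = {"P1": True, "P4": False, "P6": True, "P2": False}
value = cpt_entry(10, 100, {"P1": 25, "P6": 30}, states)
print(len(rows), round(value, 2))
```

Only the frequencies of the parents that are present are needed, which is why the frequencies of P2 and P4 do not appear in this row.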

BN for Re-entrancy Vulnerability.

BN for DOS Vulnerability.
CPT for DOS pattern P9
CPT for DOS pattern P11

BN for Tx.origin vulnerability.
To find the severity of each vulnerability for a given new smart contract, patterns are first extracted from it. For the extracted patterns, the severity of each vulnerability is calculated using the CPT values of the Bayesian Network and the pattern weights. The Bayesian Network information is maintained in a table, stored as a 2D array, so that it can be accessed efficiently during vulnerability prediction, as shown in Table 5.
Bayesian network information for V1
CPT for Tx.origin vulnerability
Table 5 is prepared from Fig. 5. In Table 5, the first column lists the patterns that influence the V1 (Re-entrancy) vulnerability, the second column is the frequency of each pattern in the V1 smart contract dataset (DOC1), the third column indicates whether the pattern is dependent (1) or independent (0), the fourth column is the weight assigned to the pattern, and the last column is the list of parent nodes for dependent patterns. Bayesian Network information tables corresponding to Fig. 6 and Fig. 7 are created in the same way as Table 5 for the DOS and Tx.origin vulnerabilities, respectively. The algorithm for the BNMC validation phase is shown in Fig. 8.
SPF = Sum of Parent Frequencies
TPCv = Total Patterns Count in DOCv
PP = Pattern Parent
DPL = Dependent Pattern List
PL = Pattern List
IPL = Independent Pattern List
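The Bayesian Network information table and the split of an extracted pattern list (PL) into IPL and DPL can be sketched as follows. The column layout follows the description of Table 5, and the frequencies of P1, P3, and P6 are the ones used in the CPT example; the frequency of P4, the weight of P3, and the function name `split_pattern_list` are placeholders chosen for illustration.

```python
# Columns: pattern, frequency in DOC1, dependent flag (1/0), weight, parents.
BN_INFO_V1 = [
    ["P1", 25, 0, 0.5, []],
    ["P4", 20, 0, 0.5, []],                        # placeholder frequency
    ["P6", 30, 0, 0.5, []],
    ["P3", 10, 1, 2.0, ["P1", "P2", "P4", "P6"]],  # placeholder weight
]


def split_pattern_list(bn_info, pattern_list):
    """Keep patterns listed for this vulnerability and split them into IPL and DPL."""
    known = {row[0]: row for row in bn_info}
    ipl = [p for p in pattern_list if p in known and known[p][2] == 0]
    dpl = [p for p in pattern_list if p in known and known[p][2] == 1]
    return ipl, dpl


PL = ["P1", "P4", "P6", "P7", "P9"]
ipl, dpl = split_pattern_list(BN_INFO_V1, PL)
print(ipl, dpl)
```

Patterns that are absent from the table, here P7 and P9, are simply dropped when the table for that vulnerability is consulted.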
The updated probability value after multiplication by the weight is calculated using Equations (3.1) and (3.2).

Algorithm for BNMC Validation.
The severity of each vulnerability is calculated using Equations (4) and (5).
In the Equation (5), DP = Dependent Pattern
IP = Independent Pattern
PFi = Pattern Frequency of Pi
TPj = Total Patterns in DOCj
SPFi = Sum of Parent Frequencies for patterni
Prob(Vj) = Probability of vulnerability Vj, where j ranges from 1 to 3
Prob(Pij) = Probability of pattern Pi in DOCj
Vulnerability testing: as an example, consider a new smart contract that actually has the DOS vulnerability. The extracted patterns are stored in the pattern list PL = {P1, P4, P6, P7, P9}.
To calculate the severity of V1 (re-entrancy vulnerability), only the patterns P1, P4, and P6 are considered, because the other patterns, P7 and P9, are not in the V1 list; that is, these two patterns do not influence V1. The probability of the re-entrancy vulnerability (V1) is calculated using Equation (4) as follows.
To calculate the severity of V2 (DOS vulnerability), the CPT value of P9 is used directly for the chain of patterns P1 -> P7 -> P9 shown in the Bayesian Network. P6 is an independent event, and P4 is not in the V2 list.
P(V2)updated=0.99 as per Equation (3.2)
The value of P(P9 | P7, P1) is read directly from the CPT of P9, which depends on P7, which in turn depends on P1. As per Equation (1), WP9 = 4/(4 - 3) = 4, where the total number of levels (TL) in the BN is 4 and pattern P9 is at Level 3.
The probability of the tx.origin vulnerability (V3) is also calculated using Equation (4), considering only the patterns P1, P4, and P6; the other two patterns, P7 and P9, are not in the V3 list.
Comparing the above probabilities, P(V2) is greater than the vulnerability threshold (Vthreshold); hence the given smart contract is classified as vulnerable to DOS. The other two vulnerabilities have no influence, because the probabilities of both V1 and V3 are less than Vthreshold, as shown in the output in Table 7 and as per Equation (6). In addition to the severity of each vulnerability, the output of the validation phase describes the reasons for the vulnerabilities and gives suggestions to avoid them (as per Table 8), so that the given smart contract can be corrected and made free of vulnerabilities before it is deployed to the Blockchain.
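The threshold comparison can be sketched in a few lines of Python. Only the value 0.99 for V2 comes from the example above; the severities for V1 and V3, the threshold of 0.5, and the function name `vulnerabilities_above` are placeholders, since the actual values are given in Table 7.

```python
def vulnerabilities_above(severities, threshold):
    """Return the vulnerabilities whose severity exceeds the threshold, highest first."""
    flagged = [(v, p) for v, p in severities.items() if p > threshold]
    return [v for v, _ in sorted(flagged, key=lambda item: item[1], reverse=True)]


# V2 = 0.99 is the updated probability from the text; V1 and V3 are placeholders.
severities = {"V1": 0.20, "V2": 0.99, "V3": 0.10}
V_THRESHOLD = 0.5  # placeholder threshold

detected = vulnerabilities_above(severities, V_THRESHOLD)
print(detected)
```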
Output from testing phase
List of suggestions to avoid SC vulnerabilities [11]
The proposed BNMC design is evaluated on a test data set (new smart contracts) to measure the classification accuracy for each vulnerability. The quality of the proposed model is measured with the classification metrics confusion matrix, precision, recall, and accuracy. The results of the proposed model are compared with traditional vulnerability detection methods such as Smartcheck [31], Mythril [32], Oyente [33], and an LSTM model [8]. Initially, Bayesian learning (BL) [13] was applied to detect SC vulnerabilities without Bayesian networks; it was then combined with Bayesian Networks to improve detection accuracy. At the beginning of the experiment, the proposed model achieved lower accuracy for the DOS vulnerability, as shown in Table 9 and Fig. 9, because the dependent patterns of the DOS vulnerability have low probabilities.
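For reference, the three scalar metrics can be derived from the cells of a binary confusion matrix as sketched below. The formulas are the standard definitions; the counts in the usage example are placeholders and are not results from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, and accuracy from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Placeholder counts for one vulnerability class.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(metrics)
```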
Accuracy comparison between proposed and traditional models

SC Vulnerability detection accuracy comparison with traditional methods.
The BNMC results were later improved by increasing the weights of the dependent patterns for the DOS vulnerability as per Equation (2), which specifies that the lower the probability of a dependent pattern, the higher its weight, as shown in Table 10. Compared with the existing methods, the proposed BNMC design achieved better results in detecting the Reentrancy, DOS, and Tx.origin vulnerabilities, as shown in Table 11 and Fig. 10. The novelty of the proposed BNMC model can be observed from Table 12: in addition to detecting security vulnerabilities, it specifies the causes of each vulnerability and suggests how to avoid them using Bayesian networks.
Updated weight to increase the accuracy of DOS vulnerability
SC vulnerabilities detection metrics for proposed and traditional Models

SC Vulnerability detection accuracy comparison with traditional methods.
Proposed BNMC model comparison with existing models
In this work, the BNMC design was proposed and implemented to detect smart contract vulnerabilities. In the Ethereum Blockchain, all transactions are completed by following the rules defined in smart contracts. Vulnerable smart contracts allow attackers to cause monetary losses for users, so identifying vulnerabilities in smart contracts before deployment is essential to avoid attacks. The proposed BNMC design was implemented and tested on new smart contracts, and its results show improved vulnerability detection accuracy compared with traditional techniques, because the proposed model considers the key patterns that cause vulnerabilities, pattern sequences, their probabilities, and expert knowledge. Unlike the other models, the proposed model specifies the reasons for each vulnerability and suggests how to avoid it. The proposed model can currently detect only the reentrancy, DOS, and tx.origin security vulnerabilities. Detecting other smart contract vulnerabilities using Bayesian Networks and automating the Bayesian Network construction are left as future work.
