Bayesian network based vulnerability detection of blockchain smart contracts

Abstract

Ethereum is one of the most popular Blockchain platforms. The key component of the Ethereum Blockchain is the smart contract. Smart contracts (SCs) are like ordinary computer programs and are mostly written in Solidity, a high-level object-oriented programming language. Smart contracts allow transactions to be completed directly between two parties in the network without any middleman or mediator. A smart contract cannot be modified once it is deployed to the Blockchain, so it has to be free of vulnerabilities before deployment. In this paper, a Bayesian Network model was designed and constructed, based on the Bayesian learning concept, to detect three smart contract security vulnerabilities: Reentrancy, Tx.origin and DOS. The results show that the proposed BNMC (Bayesian Network Model Construction) design is able to estimate the severity of each vulnerability and also suggest the reasons for it. Compared with the traditional LSTM method, the accuracy of the proposed BNMC improves by about 8 percentage points for both Reentrancy and Tx.origin and by about 6 percentage points for DOS. The proposed BNMC design and implementation is the first attempt to detect smart contract vulnerabilities using Bayesian Networks.

Keywords

Blockchain, smart contracts, vulnerabilities, Ethereum, Bayesian network, expert knowledge

1 Introduction

Smart contracts (SCs) are programs of predefined rules that are deployed to the Blockchain and execute automatically, so that a transaction completes only if it satisfies the predefined conditions. In a Blockchain, transactions between two parties are recorded in an efficient, verifiable, and unchangeable manner [1–3]. Blockchain can offer an innovative solution to long-standing problems of security and data storage in centralized systems [4]. Smart contracts work on the basis of conditional reasoning statements. They are now widely used in business among groups of untrusted parties, where every transaction is completed according to rules agreed upon by all stakeholders without third-party verification [5]. SC source code is first written by designers in a high-level language such as Solidity. The Solidity compiler converts the high-level source code into bytecode (EVM code) in hexadecimal form. This bytecode can be converted into EVM instructions, which are called opcodes [6].

The reasons for attacks on smart contracts fall into three categories: first, Ethereum smart contracts mainly handle money-oriented transactions; second, deployed vulnerable smart contracts cannot be modified; and finally, smart contracts have no defined quality measures [7]. Many smart contract attacks in 2016 led to multi-million dollar losses as a result of vulnerabilities in SCs. For Ethereum smart contracts, it is therefore worth focusing on innovative machine learning models that detect SC vulnerabilities effectively [8], especially in financial matters such as money transfers and in more complicated code.

Smart contract vulnerabilities are divided into four categories: 1. security, 2. functional, 3. developmental, 4. operational [9]. Security concerns include re-entrancy, external contract, DOS (Denial of Service), the use of tx.origin, and unchecked external calls. Functional vulnerabilities are locked money, integer division, integer underflow, integer overflow, unsafe interface type, and reliance on timestamps. Developmental concerns involve token API violation, private modifier, unfixed compiler version, style guide violation, duplicated fallback function, and implicit visibility level. Finally, operational problems include byte array and expensive loop vulnerabilities. The proposed work focuses on three security concerns: re-entrancy, DOS, and tx.origin vulnerabilities.

Reentrancy Vulnerability: Reentrancy is a security vulnerability in which an external callee takes over the control flow and modifies data in a way the calling function does not expect [10]. A fallback function in a Solidity smart contract has no name, receives no arguments and returns nothing. A fallback function is forced to execute automatically in two cases: 1) when the smart contract receives Ether (money), and 2) when the contract is called with a function call that matches none of its functions. Figure 1 shows an example of two SCs named Hacker and Sufferer in which a reentrancy attack is possible. The money() function of the Hacker contract calls withdraw() of the Sufferer contract, which transfers Ether to the Hacker contract using msg.sender.call.value(). The fallback function of the Hacker contract is forced to execute because the contract received money. The body of fallback() calls withdraw() again, which withdraws money from the Sufferer contract once more and triggers fallback() in the Hacker contract a second time. This process repeats until the balance of the Sufferer contract reaches zero Ether. This is called a reentrancy attack. In this example, the attack can be avoided by interchanging the 5th and 6th lines of the Sufferer contract, that is, by setting the balance variable to zero before sending the amount to the Hacker contract. The withdraw() call inside fallback() then fails because the balance is already zero.

Fig. 1

Smart contracts having reentrancy vulnerability.
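The control flow of this attack can be imitated outside the EVM. The following Python sketch is only a toy model under stated assumptions: the classes Sufferer and Hacker, the method receive() (standing in for the fallback function), the flag zero_before_send and the deposit amounts are all illustrative and are not taken from Fig. 1.

```python
class Sufferer:
    """Toy stand-in for the Sufferer contract (illustrative names only)."""

    def __init__(self, deposits, zero_before_send):
        self.balance = dict(deposits)        # credited balance per account
        self.pool = sum(deposits.values())   # total Ether held by the contract
        self.zero_before_send = zero_before_send

    def withdraw(self, caller):
        amount = self.balance.get(caller.name, 0)
        if amount == 0 or self.pool < amount:
            return                           # nothing left to pay out
        if self.zero_before_send:
            self.balance[caller.name] = 0    # state update first (safe order)
        self.pool -= amount
        caller.receive(self, amount)         # external call; callee may re-enter
        if not self.zero_before_send:
            self.balance[caller.name] = 0    # state update last (vulnerable order)


class Hacker:
    name = "hacker"

    def __init__(self):
        self.received = 0

    def receive(self, sufferer, amount):
        """Plays the role of the fallback function: re-enters withdraw()."""
        self.received += amount
        sufferer.withdraw(self)


def run_attack(zero_before_send):
    """Return (Ether received by the hacker, Ether left in the contract)."""
    sufferer = Sufferer({"hacker": 1, "alice": 4}, zero_before_send)
    hacker = Hacker()
    sufferer.withdraw(hacker)
    return hacker.received, sufferer.pool


print(run_attack(zero_before_send=False))
print(run_attack(zero_before_send=True))
```

In the vulnerable ordering the hacker keeps re-entering until the pool is empty; with the balance zeroed before the transfer, the second withdraw() finds a zero balance and returns immediately.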

DOS vulnerability: A specific quantity of gas (transaction fee) is required to execute smart contract functions. The Ethereum network sets a gas limit for every block, and the gas required for all transactions in a block should not exceed that limit. If the total gas required to execute all statements in a smart contract exceeds the gas limit, a DOS vulnerability arises, and the affected transactions cannot complete. DOS vulnerability may be caused by scenarios [11] such as 1) iterating a loop (for or while) more than about 382 times, depending on the network gas limit, 2) working with arrays of unknown size, 3) specifying the gas value directly in the SC, and 4) using send() or transfer() instead of call.value(. . .).

Vulnerability of transaction origin [11]: The keyword “tx.origin” in the Solidity language gives the address of the account that originated a transaction. For example, in a call sequence X->Y and Y->Z, from the viewpoint of Z, msg.sender is Y and tx.origin is X. Using “tx.origin” can therefore be ambiguous about who the immediate caller is; msg.sender should be used instead.

Bayesian Networks are significant as a unified probabilistic framework for classification. Over the past few years, Bayesian Networks have found successful applications in various areas such as medicine, document classification, information retrieval, semantic search, image processing, spam filtering, and systems biology [12].
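The X, Y, Z example can be written down as a minimal Python sketch. The function call_context and the owner check are hypothetical illustrations, not part of any Solidity API; the chain is simply a list of account names from the originating account to the contract being executed.

```python
def call_context(chain):
    """Return (msg_sender, tx_origin) as seen by the last account in `chain`."""
    if len(chain) < 2:
        raise ValueError("a call needs at least a caller and a callee")
    # msg.sender is the immediate caller; tx.origin is the first account.
    return chain[-2], chain[0]


msg_sender, tx_origin = call_context(["X", "Y", "Z"])

# Assumption for illustration: Z authorises its owner X.
owner = "X"
passes_with_tx_origin = (tx_origin == owner)    # intermediate Y is invisible
passes_with_msg_sender = (msg_sender == owner)  # Y is recognised as the caller
print(msg_sender, tx_origin, passes_with_tx_origin, passes_with_msg_sender)
```

An authorisation check based on tx.origin passes even though the immediate caller is Y, whereas the same check on msg.sender does not.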

The major contributions of the proposed work are as follows:

Identifying the presence of vulnerabilities for the given smart contract, especially concentrating on Reentrancy, Tx.origin, DOS vulnerabilities.

In this research, a Bayesian network (BN) model is proposed for vulnerability detection, for the following reasons.

BNs are suitable for problems with limited data sets.

BNs can predict the probability (severity) of each vulnerability, rather than only a YES/NO answer.

BNs can also suggest reasons for a vulnerability.

The remainder of the paper is organized as follows: Section 2 reviews the literature on smart contract vulnerability detection using machine learning and Bayesian Networks, Section 3 explains the proposed work and the BNMC design for vulnerability detection, Section 4 presents the experimental setup, comparison tables and outcomes, and Section 5 gives the conclusion and future scope.

2 Related work

In the literature, papers [2, 13–17] address smart contract vulnerability detection using different machine learning models. Liao JW et al. [2] proposed smart contract vulnerability detection using machine learning and fuzz testing techniques. The authors used traditional static SC vulnerability detection tools (Oyente, Mythril, SWC and Remix) to label the training data sets. Static detection tools require more time to detect a vulnerability [13]. To prepare the dataset, the authors extracted features from SC opcodes, which are difficult to interpret. Wesley Joon et al. [14] worked towards safer smart contracts using a sequential learning approach, LSTM. The authors also extracted features from SC opcodes for data set preparation, and opcode sequences are difficult to interpret when analyzing results. Pouyan Momeni et al. [15] presented smart contract security analysis with machine learning techniques and used the existing traditional tools Mythril and Slither to label the SCs in the dataset. These SCs are given as input to a Solidity parser, which generates an AST (Abstract Syntax Tree). An AST is straightforward to process and easy to interpret, so features can be extracted from it conveniently. However, the traditional tools used in that paper take more time to predict vulnerabilities.

Zhenguang Liu et al. [16] presented an automated reentrancy detection method for SCs that uses BLSTM for the classification task. The authors propose using contract snippets (keywords) to capture semantic information from a given SC. The limitation of this paper is that it considers every word of each line in an SC when preparing the feature set, which increases the number of features and may reduce classification accuracy. Raju Bhaskar et al. [17] described the importance of Blockchain for secure and efficient energy management in real-life applications. Zhang L et al. [18] described an SC vulnerability detection model based on an information graph and an ensemble technique. The model takes SC opcodes as input and finds the critical opcode sequence for each vulnerability. Yiping Liu et al. [19] presented an SC vulnerability detection model based on symbolic execution that takes SC assembly code (opcodes) as input to generate a control flow graph. Huang J et al. [20] presented an SC vulnerability detection model based on multi-task learning that uses SC bytecode as the data set. In contrast, the proposed work uses high-level source code directly as input, since vulnerabilities are easier to trace in SC source code than in SC bytecode or opcodes.

Peng Qian et al. [21] proposed graph neural networks for smart contract vulnerability detection with the help of expert knowledge. The authors constructed a graph from the patterns extracted from a given SC and developed an open-source tool to extract the patterns. In the proposed work, this open-source pattern extraction tool has been used and tailored to our requirements. Feng Mi et al. [22] presented a paper on the automatic detection of SC vulnerabilities using deep learning. The authors prepared the data set from features extracted from SC bytecode, which is difficult to interpret when analyzing results. Deep learning techniques have further problems: their internal details are hard to interpret (for example, the reasons for low prediction accuracy), the training/test split ratio has to be tuned to improve results, and predictions have only two outcomes (Yes/No), so the severity of each vulnerability cannot be obtained.

Bayesian Networks are well suited to prediction or classification problems when prior probabilities of the relevant events are available. Eunjeong Park et al. [23] used Bayesian Networks to predict post-stroke outcomes from the probabilities of available risk factors. Daniel Kottke et al. [24] proposed a Bayesian approach that deals with uncertainty by determining posterior probabilities with the help of a prior distribution. Benjamin Lucas et al. [25] presented a Bayesian-inspired, deep learning based method for producing land cover maps from time series of satellite images. Zhao et al. [26] proposed a Bayesian network to mine knowledge and data from web text and present it in a way that users can easily understand. Meng et al. [27] proposed a Bayesian network to evaluate risk by identifying the relationship between the supply chain and risk indicators. Chen et al. [28] showed that Bayesian networks are a good choice for preventing failures in complex engineering systems with limited data. Lakho et al. [29] used a Bayesian network classifier to predict the performance of students in a blended learning model adopted for higher education.

Gap in the Literature: In the literature, authors used static detection tools (Oyente, Mythril) and machine/deep learning (ML/DL) techniques (LSTM) to detect vulnerabilities from smart contract datasets in the form of bytecode or opcodes. These approaches have limitations: static detection tools require more time to predict, and ML/DL techniques can predict a vulnerability but their internal details are difficult to interpret, so the reason for poor results is hard to determine. A dataset in the form of bytecode or opcodes is also more difficult to interpret than source code.

Reason to choose proposed methodology: Bayesian Networks are preferable for applications with limited datasets [17]. Bayesian networks describe the severity (probability) of each outcome, and they can also describe the influence of each outcome. Given the limitations of traditional techniques and the benefits of Bayesian Networks, the proposed work uses Bayesian Networks to detect smart contract vulnerabilities from prior knowledge of patterns and their impact on vulnerabilities. To the best of our knowledge, the proposed BNMC design is the first attempt to detect smart contract vulnerabilities in this way. The next section describes the architecture of the BNMC for SC vulnerability detection.

3 Proposed system

The architecture of the proposed model consists of two parts, the BNMC design phase and the BNMC validation phase, as shown in Fig. 2. The first phase (BNMC design) consists of five modules: data set preparation, pattern extraction, preparation of the pattern frequency table, construction of the Bayesian Network model, and filtering of the top important patterns for each vulnerability. The second phase (BNMC validation) consists of four modules: pattern extraction for a given new SC, preparation of the BN information table, calculation of the severity of each vulnerability, and reporting of the final results. The BNMC design phase is described in this section; the BNMC validation phase is discussed in detail in Section 4.

Fig. 2

Architecture of BNMC for smart contract vulnerability detection.

In the BNMC design phase, a dataset of smart contracts and their vulnerabilities was first created from online resources [9, 30], as shown in the first module of Fig. 2. All smart contracts related to a particular vulnerability are maintained in a single document (DOC_j).

3.1 Pattern extraction

Key patterns are extracted from the smart contract data set created in the previous step. Fig. 3 shows examples of key patterns for a given smart contract. Different patterns lead to different possible vulnerabilities, and sometimes the sequence of patterns is also important for identifying a particular vulnerability. To detect the re-entrancy vulnerability, the important patterns are the msg.sender.call.value() invocation (pattern P3 in Table 1) and balance[msg.sender] = 0 (pattern P5 in Table 1). The order P3 followed by P5 leads to re-entrancy vulnerability, whereas the order P5 followed by P3 does not. “Tx.origin” is the pattern required to detect the “transaction origin” vulnerability. The patterns required to detect DOS vulnerability include require(.send()), .gas(value), and .send(digits), as shown in Table 1. Table 1 was prepared after a careful analysis of the relationship between vulnerabilities and patterns, based on the information in [11].

Fig. 3

Pattern extraction from smart contract.
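The role of pattern order for re-entrancy can be illustrated with a small Python sketch. It is not the tailored extraction tool used in this work: the regular expressions, the function name reentrancy_order_risk and the two Solidity snippets are assumptions made for illustration, and the check looks only at the first occurrence of each pattern.

```python
import re

# Illustrative regexes for the call.value() invocation and the balance reset.
CALL_VALUE = re.compile(r"\.call\.value\s*\(")
ZERO_BALANCE = re.compile(r"\w+\s*\[\s*msg\.sender\s*\]\s*=\s*0\b")


def reentrancy_order_risk(source):
    """True if call.value() appears before the balance is reset to zero."""
    call = CALL_VALUE.search(source)
    reset = ZERO_BALANCE.search(source)
    if call is None or reset is None:
        return False  # the ordered pair of patterns is not present
    return call.start() < reset.start()


vulnerable = """
function withdraw() public {
    uint amount = balance[msg.sender];
    msg.sender.call.value(amount)();
    balance[msg.sender] = 0;
}
"""

safe = """
function withdraw() public {
    uint amount = balance[msg.sender];
    balance[msg.sender] = 0;
    msg.sender.call.value(amount)();
}
"""

print(reentrancy_order_risk(vulnerable), reentrancy_order_risk(safe))
```

A purely textual check like this ignores control flow and function boundaries, so it should be read as a picture of the ordering idea rather than as a detector.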

Table 1

Vulnerability Vs patterns list

Vulnerability type (V_j)	Patterns	Pattern number (P_i)
Reentrancy	function fname()	P1
	function() payable	P2
	msg.sender.call.value()	P3
	call.value	P4
	balance[msg.sender] = 0	P5
	balance/amount/value	P6
DOS	for(condition)	P7
	while(condition)	P8
	require(.send())	P9
	array = new datatype[](0);	P10
	.gas(value)	P11
	gas:value	P12
	.send(digits);	P13
	.transfer(digits);	P14
Tx.Origin	tx.origin	P15

Among the patterns in Table 1, those that are common to more than one vulnerability (e.g., P1, P4) are treated as independent patterns (IP) and are given low priority, because the vulnerability of a given SC cannot be determined from these common patterns alone. Patterns that are unique to a particular vulnerability are treated as dependent patterns (DP) and are given high priority (for example, P11, P12, P13 and P14 are unique to V2, the DOS vulnerability). For the V3 (Tx.origin) vulnerability, P15 is unique.

3.2 Preparation of Pattern Frequency Table

From each vulnerability document (DOC_j), the frequency of each pattern was recorded in the Pattern Frequency Table (PFT), which is used to calculate pattern probabilities and to prepare the conditional probability table (CPT) values. The patterns were then re-arranged by importance (highest to lowest) [11]. To construct the Bayesian Network, the top patterns for each vulnerability were selected from the PFT.
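As a minimal sketch of how such a frequency table could be built with Python string pattern matching, the code below counts a few of the Table 1 patterns in one document. The regular expressions only approximate the patterns, and the names PATTERNS, pattern_frequencies, pattern_probabilities and the sample text are assumptions for illustration. The probability shown is frequency divided by the total pattern count, which corresponds to the independent-pattern case.

```python
import re
from collections import Counter

# Regexes approximating a few Table 1 patterns (illustrative, not exhaustive).
PATTERNS = {
    "P3": r"msg\.sender\.call\.value\s*\(",
    "P7": r"\bfor\s*\(",
    "P8": r"\bwhile\s*\(",
    "P15": r"\btx\.origin\b",
}


def pattern_frequencies(document):
    """Count how often each pattern occurs in one vulnerability document."""
    counts = Counter()
    for pid, regex in PATTERNS.items():
        counts[pid] = len(re.findall(regex, document))
    return counts


def pattern_probabilities(counts):
    """Frequency of each pattern divided by the total pattern count."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pid: n / total for pid, n in counts.items()}


sample = "for (uint i = 0; i < n; i++) {}\nwhile (x) {}\nfor(;;) {}\nrequire(tx.origin == owner);"
counts = pattern_frequencies(sample)
print(dict(counts))
print(pattern_probabilities(counts))
```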

3.3 Bayesian network construction

A Bayesian network is a probabilistic graphical model that consists of nodes and directed edges between nodes. Variables (attributes) are represented by nodes, and conditional dependencies between them are represented by directed edges. A missing connection between two nodes indicates that they are conditionally independent. BN models can be prepared by experts after careful analysis of the data, and the constructed model can then be used to predict test events. BN models can be challenging to design because the domain information needed to specify every conditional dependence between variables is often incomplete. Even when it is available, many calculations are required to specify the full conditional probabilities for an event. An alternative is therefore to specify the dependencies between variables that the available data support and to treat all remaining variables as conditionally independent. In the proposed BNMC design, all patterns (e.g., P1, P2, ...) are nodes, and the sequence of edges between nodes represents the conditional dependencies between the patterns that influence a particular vulnerability. All vulnerability types (V1, V2, V3) are represented as leaf nodes in the network.

The Bayesian network shown in Fig. 4 is constructed after analyzing the functional dependencies and sequences between patterns for each vulnerability. In Fig. 4, the patterns P1, P4 and P6 influence more than one vulnerability and are considered independent patterns. The patterns P1 and P4 influence both the V1 and V2 vulnerabilities with different probabilities; P1 and P6 influence the V1 and V3 vulnerabilities. In the Bayesian Network, all independent patterns are placed in the first row. The sequences of patterns that most influence a vulnerability are represented by arrows between the nodes in the network [11]. Each node in the Bayesian Network maintains a CPT (Conditional Probability Table), which gives the probability of the pattern given the presence or absence of its parent patterns, as shown in Table 3.1 and Table 3.2. The CPT is discussed in detail in Section 4. The next section describes the experimental details and comparison results.

Fig. 4

Bayesian Network Construction.

4 Experimental setup and comparison results

4.1 CPT preparation phase

A data set was prepared with three separate documents for the three vulnerabilities: Re-entrancy, DOS, and Tx.origin. All SCs with the same vulnerability are maintained in the same document. The patterns listed in Table 1 were extracted from each vulnerability document, using string pattern matching in Python, to prepare the Pattern Frequency Table. The frequency of each pattern is shown in Table 2.

Table 2
Frequency Table for input data set

Table 3.1

CPT for Re-entrancy pattern P3

P1	P4	P6	P2	P(P3)
T	F	T	F	0.22
T	T	T	T	0.66
F	F	F	F	0.1
. . .	. . .	. . .	. . .	. . .

Table 3.2

CPT for Re-entrancy pattern P5

P1	P4	P6	P2	P3	P(P5)
T	T	T	T	T	0.1
T	T	T	T	F	0.33
F	F	F	F	T	0.05
. . .	. . .	. . .	. . .	. . .	. . .

These patterns were rearranged based on the importance (High to Low) of the patterns to detect particular vulnerability [11] and the top important patterns for each vulnerability were selected.

The Bayesian Network, which is a directed acyclic graph, was constructed for the selected important patterns by considering the relationship between pattern sequences and vulnerabilities, as shown in Fig. 4. The network starts with the independent patterns in the first level (Level 1). The network then grows downwards according to the sequences of patterns that influence each vulnerability. The last row of the network is the vulnerability deciding level, which consists of V1, V2, and V3. The patterns that most influence a vulnerability are placed in the second-to-last level (Level 3) of the network. Each node in the Bayesian Network maintains a Conditional Probability Table (CPT). For an independent pattern, the CPT consists of a single entry, the probability of the pattern itself, because it does not depend on any other pattern. For the remaining patterns in the network, the number of entries in the CPT of a pattern is 2ⁿ, where n is the number of parent nodes on which the pattern depends. The weights of the patterns are assigned using the following equation. $W_{il} = \begin{cases} 0.5 & \text{if } L = 1 \\ \frac{TL}{TL - L} & \text{if } 1 < L < TL \end{cases}$ (1)

W_il = Weight for the pattern-i in the level-L.

TL = Total number of levels in the BN

For all nodes in level 1, which are Independent Patterns (IP), the weight is assumed to be 0.5 (low priority), since an IP has less influence on the vulnerability. Multiplying by 0.5 halves the probability of an independent pattern. For the remaining patterns in the lower levels, the weight increases as the level increases, as per Equation (1). To increase the vulnerability prediction accuracy, the weight of a dependent pattern has to be high if its actual probability is low, i.e., the weight of a dependent pattern is inversely proportional to its probability, as shown in Equation (2). $W_{id} = \frac{1}{prob(P_{i})}$ (2)

Where W_id = Weight of dependent pattern-i
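Equations (1) and (2) translate directly into code. The following Python sketch uses the hypothetical function names level_weight and dependent_weight; the error handling for levels outside the range covered by Equation (1) is an added assumption.

```python
def level_weight(level, total_levels):
    """Weight of a pattern at a given level of the network, per Equation (1)."""
    if level == 1:
        return 0.5  # independent patterns get low priority
    if 1 < level < total_levels:
        return total_levels / (total_levels - level)
    raise ValueError("Equation (1) is defined only for 1 <= L < TL")


def dependent_weight(probability):
    """Weight of a dependent pattern, per Equation (2): inverse of its probability."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return 1.0 / probability


# With TL = 4 levels, weights grow with the level: 0.5, 2.0, 4.0.
print([level_weight(level, 4) for level in (1, 2, 3)])
```

For a network with four levels, a pattern at level 3 receives the weight 4/(4-3) = 4, which is the value used for P9 in the validation example.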

The Bayesian Network and CPTs for the re-entrancy vulnerability are shown in Fig. 5 and Tables 3.1 and 3.2, respectively. The probability of P3 depends on four parent patterns, P1, P2, P4, and P6, hence the number of entries in the CPT of P3 is 2⁴ = 16. Only three entries are shown in Table 3.1 because of space restrictions. The first entry in Table 3.1 is Prob(P3|P1 = T, P4 = F, P6 = T, P2 = F) = 0.22. This value is calculated as 10/(100-(25 + 30)), where 100 is the total number of patterns in the V1 smart contracts (DOC₁), and 10, 25 and 30 are the frequency counts of patterns P3, P1 and P6, respectively. The remaining CPT entries for the other patterns are calculated in the same way. The Bayesian Networks and CPTs for the DOS and tx.origin vulnerabilities are shown in Fig. 6 with Tables 4.1 and 4.2, and in Fig. 7 with Table 6, respectively.
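The CPT size and the single entry worked out above can be checked with a short Python sketch. The function name cpt_entry is hypothetical; it takes only the frequencies of the parents that are present (True) in the row, since those are the only parent frequencies given in the text.

```python
from itertools import product


def cpt_entry(pattern_freq, total_patterns, present_parent_freqs):
    """Pattern frequency over the total count minus the frequencies of present parents."""
    return pattern_freq / (total_patterns - sum(present_parent_freqs))


# One CPT row per True/False assignment of the four parents of P3.
parents = ["P1", "P4", "P6", "P2"]
rows = list(product([True, False], repeat=len(parents)))
print(len(rows))  # 2**4 rows

# Row P1 = T, P4 = F, P6 = T, P2 = F: only P1 (25) and P6 (30) are present.
entry = cpt_entry(pattern_freq=10, total_patterns=100, present_parent_freqs=[25, 30])
print(round(entry, 2))
```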

Fig. 5

BN for Re-entrancy Vulnerability.

Fig. 6

BN for DOS Vulnerability.

Table 4.1

CPT for DOS pattern P9

P1	P7	P8	P(P9)
T	T	T	0.307
T	T	F	0.25
F	F	T	0.13
. . .	. . .	. . .	. . .

Table 4.2

CPT for DOS pattern P11

P6	P(P11)
T	0.04
F	0.04

Fig. 7

BN for Tx.origin vulnerability.

4.2 BNMC validation phase

To find the severity of each vulnerability for a given new smart contract, its patterns are extracted first. For the extracted patterns, the severity of each vulnerability is calculated using the CPT values of the Bayesian Network and the pattern weights. The Bayesian Network information can be maintained in a table, stored as a 2D array, so that it can be accessed efficiently during vulnerability prediction, as shown in Table 5.

Table 5
Bayesian network information for V1

Pattern	Freq	D/I	W	Parent List
P1	20	0	0.5	[]
P2	5	1	5	[P1,P3]
P3	10	1	2	[P5]
. . .	. . .	. . .	. . .	. . .

Table 6

CPT for Tx.origin vulnerability

P1	P4	P6	P(P15)
T	T	T	0.1
T	F	F	0.45
F	T	T	0.35
. . .	. . .	. . .	. . .

Table 5 is prepared from Fig. 5. In Table 5, the first column lists the patterns that influence the V1 (Re-entrancy) vulnerability, the second column is the frequency of each pattern in the V1 smart contract dataset (DOC₁), the third column indicates whether the pattern is dependent (1) or independent (0), the fourth column is the weight assigned to the pattern, and the last column is the list of parent nodes for dependent patterns. Bayesian Network information tables for Fig. 6 and Fig. 7 are created in the same way as Table 5 for the DOS and Tx.origin vulnerabilities, respectively. The algorithm for the BNMC validation phase is shown in Fig. 8, which uses the following abbreviations.

SPF = Sum of Parent Frequencies

TPC_v = Total Patterns Count in DOC_v

PP = Pattern Parent

DPL = Dependent Pattern List

PL = Pattern List

IPL = Independent Pattern List
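One possible in-memory form of the Bayesian network information table is sketched below in Python. The dataclass PatternInfo, the dictionary bn_info_v1 and the helper split_patterns are hypothetical names; the three rows simply mirror the illustrative rows shown in Table 5, and the split produces the IPL and DPL lists named above.

```python
from dataclasses import dataclass, field


@dataclass
class PatternInfo:
    freq: int                 # frequency of the pattern in DOC_1
    dependent: bool           # D/I column: True for 1, False for 0
    weight: float             # W column
    parents: list = field(default_factory=list)  # parent list column


# The three rows shown in Table 5 for V1.
bn_info_v1 = {
    "P1": PatternInfo(freq=20, dependent=False, weight=0.5),
    "P2": PatternInfo(freq=5, dependent=True, weight=5, parents=["P1", "P3"]),
    "P3": PatternInfo(freq=10, dependent=True, weight=2, parents=["P5"]),
}


def split_patterns(table):
    """Return (IPL, DPL): independent and dependent pattern lists."""
    ipl = [pid for pid, info in table.items() if not info.dependent]
    dpl = [pid for pid, info in table.items() if info.dependent]
    return ipl, dpl


print(split_patterns(bn_info_v1))
```

A dictionary keyed by pattern name gives the same constant-time row access as a 2D array indexed by pattern number, while keeping the column meanings explicit.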

The updated probability value after multiplying by the weight is calculated using Equations (3.1) and (3.2).

Fig.8

Algorithm for BNMC Validation.

$x = prob(P) * W_{p}$ (3.1) $prob_{updated} = \begin{cases} 0.99 & \text{if } x \geq 1 \\ x & \text{otherwise} \end{cases}$ (3.2)

The severity of each vulnerability is calculated using Equations (4) and (5). $prob(V_{j}) = \sum_{P_{i} \in PL} prob(P_{i}) * W_{i}$ (4)

$prob(P_{i}) = \begin{cases} PF_{i} / (TP_{j} - SPF_{i}) & \text{if } P_{i} \text{ is DP} \\ PF_{i} / TP_{j} & \text{if } P_{i} \text{ is IP} \end{cases}$ (5)

In the Equation (5), DP = Dependent Pattern

IP = Independent Pattern

PF_i = Pattern Frequency of P_i

TP_j = Total Patterns in DOC_j

SPF_i = Sum of Parent Frequencies for pattern_i

Prob(V_J)=Probability(V_j) where j is from 1 to 3.

Prob(P_ij) = Probability of pattern P_i in DOC_j

$V_{threshold} = \begin{cases} \text{vulnerability exists} & \text{if } prob(V_{j}) \geq 0.5 \\ \text{not vulnerable} & \text{otherwise} \end{cases}$ (6)

Vulnerability Testing: As an example, consider a new smart contract that actually has a DOS vulnerability. The extracted patterns are stored in PL (Pattern List): PL = {P1, P4, P6, P7, P9}.

To calculate the severity of V1 (reentrancy vulnerability), only the patterns P1, P4, and P6 are considered, because the other patterns P7 and P9 are not in the V1 list, i.e., these two patterns do not influence V1. The probability of reentrancy vulnerability (V1) is calculated using Equation (4) as follows. $\begin{aligned} P(V1) &= (P(P1) * W_{P1}) + (P(P4) * W_{P4}) + (P(P6) * W_{P6}) \\ &= (0.25 * 0.5) + (0.2 * 0.5) + (0.3 * 0.5) = 0.375 \end{aligned}$ This value is reported, truncated to two decimals, as 0.37 in Table 7.

To calculate the severity of V2 (DOS vulnerability), the P9-CPT value is used directly for the chain of patterns P1->P7->P9, as shown in the Bayesian Network. P6 is an independent event. P4 is not in the V2 list. $\begin{aligned} P(V2) &= (P(P9 | P7, P1) * W_{P9}) + (P(P6) * W_{P6}) \\ &= (0.25 * 4) + (0.08 * 0.5) = 1.04 > 1 \end{aligned}$

P(V2)_updated=0.99 as per Equation (3.2)

The value of P(P9|P7, P1) is read directly from the P9-CPT, since P9 depends on P7, which in turn depends on P1. W_P9 = 4/(4-3) = 4, as per Equation (1), where the total number of levels (TL) in the BN is 4 and pattern P9 is at level 3.

The probability of tx.origin vulnerability (V3) is also calculated using Equation (4), by considering only the patterns P1, P4 and P6; the other two patterns, P7 and P9, are not in the V3 list. $\begin{aligned} P(V3) &= (P(P1) * W_{P1}) + (P(P4) * W_{P4}) + (P(P6) * W_{P6}) \\ &= (0.45 * 0.5) + (0.15 * 0.5) + (0.15 * 0.5) = 0.375 \end{aligned}$ The listed terms sum to 0.375; Table 7 reports this severity as 0.36.

Comparing the above probabilities, only P(V2) is greater than the vulnerability decision threshold. The conclusion is therefore that the given smart contract has a DOS vulnerability and is not affected by the other two vulnerabilities, because the probabilities of both V1 and V3 are below the V_threshold value, as shown in the output in Table 7 and as per Equation (6). In addition to the severity of each vulnerability, the output of the validation phase describes the reasons for the vulnerabilities and gives suggestions to avoid them (as per Table 8), so that the given smart contract can be corrected and made free of vulnerabilities before it is deployed to the Blockchain.
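The three severity calculations can be replayed with a few lines of Python. This is a sketch under stated assumptions: the function names severity and verdict are hypothetical, the cap of Equation (3.2) is applied to the summed value as in the P(V2) example, and the probability and weight pairs are the ones listed in the example above.

```python
def severity(terms):
    """Sum of prob * weight over (prob, weight) pairs, capped at 0.99."""
    x = sum(prob * weight for prob, weight in terms)
    return 0.99 if x >= 1 else x


def verdict(prob, threshold=0.5):
    """Decision rule of Equation (6)."""
    return "vulnerable" if prob >= threshold else "not vulnerable"


# (probability, weight) pairs taken from the worked example.
v1 = severity([(0.25, 0.5), (0.2, 0.5), (0.3, 0.5)])    # P1, P4, P6
v2 = severity([(0.25, 4), (0.08, 0.5)])                 # P9 | P7, P1 and P6
v3 = severity([(0.45, 0.5), (0.15, 0.5), (0.15, 0.5)])  # P1, P4, P6

for name, value in (("V1", v1), ("V2", v2), ("V3", v3)):
    print(name, round(value, 3), verdict(value))
```

With the terms as listed, the sums for V1 and V3 both evaluate to 0.375, below the 0.5 threshold, and only V2 is classified as vulnerable.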

Table 7

Output from testing phase

Vul _Type	Probability	Patterns found	Suggestions to avoid vulnerability
V1	0.37	P1,P4,P6 are Independent Patterns	No Vulnerability
V2	0.99	P1,P7,P9,P6	P9 = require(.send()); the P7, P9 sequence leads to DOS vulnerability; require() performs a revert operation if send() fails
V3	0.36	P1,P4,P6 are Independent Patterns	No Vulnerability

Table 8

List of suggestions to avoid SC vulnerabilities [11]

Dependent patterns (DP)	Suggestions to avoid vulnerability
P3, P5	P3 = msg.sender.call.value(), P5 = balance[msg.sender] = 0; the P3, P5 sequence leads to V1 vulnerability
P8, P7	Iterating a for/while loop more than 382 times leads to V2 vulnerability
P9	The P7, P9 sequence leads to DOS vulnerability; require() performs a revert operation if send() fails
P15	P15 = tx.origin leads to V3 vulnerability; use msg.sender instead
P11, P12	Don’t specify the gas value directly
P13, P14	transfer() and send() forward a fixed amount of 2300 gas, but it may change frequently, which leads to DOS vulnerability; use .call.value(. . .)(“ ”) instead

The proposed BNMC design was evaluated on a test data set (new smart contracts) to measure the classification accuracy for each vulnerability. The quality of the proposed model is measured with the classification metrics confusion matrix, precision, recall, and accuracy. The results of the proposed model are compared with traditional vulnerability detection methods such as Smartcheck [31], Mythril [32], Oyente [33], and an LSTM model [8]. Initially, Bayesian learning (BL) [13] was applied to detect SC vulnerabilities without combining it with Bayesian networks; it was then tested with Bayesian Networks to improve the detection accuracy. At the beginning of the experiment, the proposed model had lower accuracy for the DOS vulnerability, as shown in Table 9 and Fig. 9, because the dependent patterns of the DOS vulnerability have low probability.
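For reference, precision, recall and accuracy follow from the four cells of a binary confusion matrix. The Python sketch below uses the standard definitions; the function name classification_metrics and the counts in the usage line are hypothetical and are not the counts behind the reported tables.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Hypothetical counts for one vulnerability type (not from the paper).
precision, recall, accuracy = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(f"{precision:.2%} {recall:.2%} {accuracy:.2%}")
```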

Table 9

Accuracy comparison between proposed and traditional models

Accuracy	Reentrancy	Tx.Origin	DOS
Smart check	52.58	45.03	48.28
Mythril	68.06	69.17	68.06
Oyente	75.72	75.56	79.72
LSTM	80.56	70.28	82.56
BNMC	88.61	78.61	65.89

Fig. 9

SC Vulnerability detection accuracy comparison with traditional methods.

The BNMC results were later improved by increasing the weights of the dependent patterns for the DOS vulnerability, following the inverse relationship in Equation (2): the lower the probability of a dependent pattern, the higher its weight, as shown in Table 10. Compared with the existing methods, the proposed BNMC design then achieved better results in detecting Reentrancy, DOS and Tx.origin vulnerabilities, as shown in Table 11 and Fig. 10. The novelty of the proposed BNMC model can be seen in Table 12: in addition to detecting security vulnerabilities, it specifies the causes of each vulnerability and makes suggestions to avoid them using Bayesian networks.

Table 10

Updated weights to increase the detection accuracy for the DOS vulnerability

Vulnerability	Dependent pattern	Probability	Initial weight	Modified weight
DOS	P12	0.04	4	10
	P13	0.03	4	12
	P14	0.03	4	12
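The effect of the reweighting in Table 10 can be illustrated with a small Python sketch. Equation (2) is not restated in this section, so the scoring rule below (a plain sum of weight times probability) is an assumption made only for illustration and is not the paper's formula; the names `dos_patterns` and `weighted_evidence` are likewise hypothetical. Only the probabilities and weights are taken from Table 10.

```python
# (probability, initial weight, modified weight) for each DOS dependent
# pattern, copied from Table 10.
dos_patterns = {
    "P12": (0.04, 4, 10),
    "P13": (0.03, 4, 12),
    "P14": (0.03, 4, 12),
}


def weighted_evidence(use_modified: bool) -> float:
    """Sum of weight * probability over the DOS patterns.

    Assumed scoring rule for illustration only; it stands in for
    Equation (2), which is not reproduced here.
    """
    weight_index = 2 if use_modified else 1
    return sum(row[0] * row[weight_index] for row in dos_patterns.values())


before = weighted_evidence(use_modified=False)
after = weighted_evidence(use_modified=True)
print(f"weighted evidence before: {before:.2f}, after: {after:.2f}")
```

Under this assumed rule, the lower-probability patterns P13 and P14 receive the larger weights, so their weighted contributions end up close to that of P12 instead of being dominated by it.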

Table 11

SC vulnerability detection metrics for the proposed and traditional models

	Reentrancy			Tx.origin			DOS
Model	Precision (%)	Recall (%)	Accuracy (%)	Precision (%)	Recall (%)	Accuracy (%)	Precision (%)	Recall (%)	Accuracy (%)
SmartCheck	25.02	32.06	52.58	38.02	36.78	45.03	40.31	25.78	48.28
Mythril	68.16	67.78	68.06	68.85	70	69.17	68.16	67.78	68.06
Oyente	72.88	81.24	75.72	76.14	74.44	75.56	76.88	85	79.72
LSTM	81.04	83.78	80.56	69.31	72.78	70.28	81.04	84.78	82.56
BNMC	89.27	87.78	88.61	76.96	81.67	78.61	89.27	87.78	88.61

Fig. 10

SC Vulnerability detection accuracy comparison with traditional methods.
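The accuracy gains of BNMC over the LSTM baseline can be checked directly from the accuracy columns of Table 11. The short Python sketch below does this arithmetic; the dictionary layout and the helper name `gain` are illustrative choices, while the numbers are copied from Table 11.

```python
# Accuracy (%) per vulnerability, copied from Table 11.
accuracy = {
    "LSTM": {"Reentrancy": 80.56, "Tx.origin": 70.28, "DOS": 82.56},
    "BNMC": {"Reentrancy": 88.61, "Tx.origin": 78.61, "DOS": 88.61},
}


def gain(model: str, baseline: str) -> dict:
    """Accuracy difference in percentage points, per vulnerability."""
    return {
        vuln: round(accuracy[model][vuln] - accuracy[baseline][vuln], 2)
        for vuln in accuracy[model]
    }


gains = gain("BNMC", "LSTM")
for vuln, points in gains.items():
    print(f"{vuln}: +{points:.2f} percentage points")
```

The differences come out at roughly 8 percentage points for Reentrancy and Tx.origin and roughly 6 for DOS, which matches the improvement over LSTM stated in the abstract.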

Table 12

Proposed BNMC model comparison with existing models

Study	Vulnerabilities detected	Source of SC used for the data set	Machine learning model used	Outputs vulnerability causes and suggestions
Liao JW et al. [2]	Reentrancy, Timestamp, Overflow, Underflow, Callstack, TOD	SC Opcodes	Logistic Regression	NO
Wesley Joon et al. [14]	To detect new attack trends.	SC Opcodes	LSTM	NO
Wang Wei et al. [6]	Reentrancy, Timestamp, Overflow, Underflow, Callstack, TOD	SC Opcodes	XGBoost	NO
Feng Mi et al. [22]	Most frequent Vulnerabilities.	SC Byte Code	Metric Learning based DNN	NO
Peng Qian et al. [21]	Reentrancy	High Level SC Code	BLSTM-ATT	NO
Zhenguang Liu et al. [16]	Reentrancy, Timestamp, Infinite Loop	High Level SC Code	Graph Neural Networks	NO
Proposed BNMC Model	Reentrancy, DOS, Tx.origin	High Level SC Code	Bayesian Networks	YES

5 Conclusion & future work

In this work, the BNMC design was proposed and implemented to detect smart contract vulnerabilities. In the Ethereum Blockchain, all transactions are completed by following the rules defined in smart contracts. Vulnerable smart contracts allow attackers to steal users' money, so identifying vulnerabilities before deployment is essential to avoid attacks. The proposed BNMC design was implemented and tested on new smart contracts, and its results show improved vulnerability detection accuracy compared with traditional techniques, because the model considers the key patterns that cause vulnerabilities, pattern sequences, their probabilities, and expert knowledge. Unlike the other models, the proposed model also reports the reasons for each vulnerability and suggestions to avoid it. The proposed model currently detects only three security vulnerabilities: reentrancy, DOS, and tx.origin. Detecting other smart contract vulnerabilities with Bayesian networks and automating the Bayesian network construction are left as future work.

References

1. Sharma, Jindal and Borah M.D., A Review of Blockchain-Based Applications and Challenges, Wireless Pers Commun 123 (2022), 1201–1243. https://doi.org/10.1007/s11277-021-09176-7.

2. Liao J.W., Tsai T.T., He C.K., Tien C.W., SoliAudit: Smart Contract Vulnerability Assessment Based on Machine Learning and Fuzz Testing, 2019 Sixth International Conference on Internet of Things: Systems, Management and Security (IOTSMS), 2019, pp. 458–465.

3. Joshi A.P., Han and Wang, A survey on security and privacy issues of Blockchain technology, Mathematical Foundations of Computing 1(2) (2018), 121–147.

4. Patil, Sangeetha and Bhaskar, Blockchain for IoT Access Control, Security and Privacy: A Review, Wireless Pers Commun 117 (2021), 1815–1834. https://doi.org/10.1007/s11277-020-07947-2.

5. Safder Iqra, Saeed-Ul Hassan, Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents, Information Processing and Management, 2020.

6. Wang Wei, Jingjing Song, Guangquan Xu, Yidong Li, Wang Hao, Chunhua Su, ContractWard: Automated Vulnerability Detection Models for Ethereum Smart Contracts, IEEE Transactions on Network Science and Engineering, 2019.

7. Hao Wang, Dit-Yan Yeung, A Survey on Bayesian Deep Learning, ACM Comput Surv 53(5), Article 108 (2020), 37 pages.

8. Yugesh Verma, LSTM use in text classification, (2021), Accessed on: Sept 7, 2022. [Online] Available: https://analyticsindiamag.com/a-complete-guide-to-lstmarchitecture-and-its-use-in-text-classification/.

9. Ethereum Smart Contract Best Practices, (2022), Accessed on: Sept 7, 2022. [Online] Available: https://consensys.github.io/smart-contract-bestpractices/knownattacks/.

10. LakshmiNarayana, Sathiyamurthy, Automation and smart materials in detecting smart contracts vulnerabilities in Blockchain using deep learning, Materials Today: Proceedings, 2021. https://doi.org/10.1016/j.matpr.2021.04.125.

11. Smart Contract Dataset, (2020), Accessed on: Sept 7, 2022. [Online] Available: https://swcregistry.io/docs/SWC-107#modifier-reentrancy-fixedsol.

12. Codetta-Raiteri, Editorial for the Special Issue on Bayesian Networks: Inference Algorithms, Applications, and Software Tools, Algorithms 14 (2021), 138. https://doi.org/10.3390/a14050138.

13. Tom Mitchell, Machine Learning, McGraw Hill, 1997, Chapter 6, pp. 177–184.

14. Wesley Joon, Wie Tann, Xing Jie Han, Sourav Sen Gupta, Ong Yew-Soon, Towards Safer Smart Contracts: A Sequence Learning Approach to Detecting Security Threats, In Proceedings of ACM (Conference 19), ACM, New York, NY, USA, 2019.

15. Pouyan Momeni, Wang Yu, Samavi Reza, Machine Learning Model for Smart Contracts Security Analysis, In IEEE conference, 2019.

16. Liu Zhenguang, He Qingming, Towards Automated Reentrancy Detection for Smart Contracts Based on Sequential Models, IEEE Access 8 (2020), 19685–19695.

17. Raju Bhaskar K.B., Prasanth Aruchamy, Paramasivan Saranya, An energy-efficient blockchain approach for secure communication in IoT-enabled electric vehicles, International Journal of Communication Systems 35(11) (2022).

18. Zhang, Wang and Wang, A Novel Smart Contract Vulnerability Detection Method Based on Information Graph and Ensemble Learning, Sensors 22 (2022), 3581.

19. Liu Yiping, Xu Jie, Cui Baojiang, Contract Vulnerability Detection Based on Symbolic Execution Technology, CNCERT 2021, CCIS 1506, pp. 193–207, 2022.

20. Huang, Zhou, Xiong and Li, Smart Contract Vulnerability Detection Model Based on Multi-Task Learning, Sensors 22 (2022), 1829.

21. Peng Qian, Wang Xun, Combining Graph Neural Networks with Expert Knowledge for Smart Contract Vulnerability Detection, IEEE Transactions on Knowledge and Data Engineering, 2021.

22. Feng Mi, Wang Zhuoyi, VSCL: Automating Vulnerability Detection in Smart Contracts with Deep Learning, In IEEE International Conference on Blockchain and Cryptocurrency (ICBC), 2021.

23. Eunjeong Park, A Bayesian Network Model for Predicting Post-stroke Outcomes With Available Risk Factors, Frontiers in Neurology 9 (2018).

24. Daniel Kottke, Marek Herde, Toward optimal probabilistic active learning using a Bayesian approach, Machine Learning 110 (2021), 1199–1231.

25. Lucas, Charlotte Pelletier, A Bayesian-inspired, deep learning-based, semi-supervised domain adaptation technique for land cover mapping, Machine Learning, 2020. https://doi.org/10.1007/s10994-020-05942-z.

26. Zhao Wei, Luo Zeju, Web Text Data Mining Method Based on Bayesian Network with Fuzzy Algorithms, Journal of Intelligent & Fuzzy Systems 38(4) (2020), 3727–3735.

27. Meng, Mengjun, Qiuyun, Wang, Yingming, The Risk Assessment of Manufacturing Supply Chains Based on Bayesian Networks with Uncertainty of Demand, Journal of Intelligent & Fuzzy Systems 42(6) (2022), 5753–5771.

28. Chen Yong, et al., A Bayesian Network Structural Learning Algorithm for Calculating the Failure Probabilities of Complex Engineering Systems with Limited Data, Journal of Intelligent & Fuzzy Systems 42(3) (2022), 1991–2004.

29. Lakho Shamshad, et al., Development of an Integrated Blended Learning Model and Its Performance Prediction on Students’ Learning Using Bayesian Network, Journal of Intelligent & Fuzzy Systems 43(2) (2022), 2015–2023.

30. Smart Contract Dataset, (2022), Accessed on: Sept 7, 2022. [Online] Available: https://github.com/smartbugs/smartbugs.

31. Tikhomirov, Voskresenskaya, Smartcheck: Static analysis of ethereum smart contracts, In WETSEB, 2018, pp. 9–16.

32. Mueller, ConsenSys/Mythril, (2017), Accessed on: May 7, 2022. [Online] Available: http://github.com/ConsenSys/mythril.

33. Meon.fund, Oyente, (2020), Accessed on: May 7, 2022. [Online] Available: https://github.com/melonproject/oyente.