Application of blockchain-driven big data analytics in financial fraud detection in engineering management

Abstract

This work proposes a blockchain-based computing model with differential privacy for fraud detection in mobile edge computing (MEC) environments. The model enables edge nodes to collaborate securely, ensuring accurate and trustworthy fraud diagnosis. A top-down structure is recommended for financial records, adaptively partitioned for efficiency. The differential privacy mechanism employs randomized responses, while the blockchain-secured computational model (BSCM) safeguards customer and transaction privacy. Traditional fraud detection may delay detection and response due to real-time detection restrictions. The proposed system identifies fraud in real time, saving financial losses. The blockchain paradigm provides transparent, tamper-proof transaction records and secure data storage, unlike current systems. The proposed solution employs oversampling and undersampling to handle imbalanced datasets with more fraudulent transactions than lawful ones. Theoretical analysis shows real-time fraud detection is achievable with minimal error rates while preserving privacy. Compared to client-server models, BSCM reduces data write execution time by 2.6x and improves data retrieval efficiency by 20x. Experiments also evaluate the impact of larger financial datasets. The financial record selection and information extraction models achieved 96.5% and 95.2% accuracy, respectively, with F1-scores of 94.6% and 93.1%, balancing precision and recall. The edge-sharing network demonstrated strong performance in response time, throughput, packet loss, and latency. Additionally, the blockchain-based transaction verification system achieved 99.9% accuracy, excelling in verification speed, network latency, and throughput.

Keywords

blockchain big data analytics financial fraud detection data mining data privacy error rate edge sharing network and block chain-based transaction

Introduction

Modern technological advancements have profoundly reshaped industries, including finance and education. This modernization has extended to financial transactions, shifting from conventional currencies to digital forms and digitizing virtually all monetary exchanges. While offering convenience, these digital transactions are increasingly exposed to digital threats, including fraud, anomalies, and privacy breaches. The sheer volume of digital financial transactions directly correlates with a rise in fraudulent activities, leading to billions of dollars in global losses annually.¹ Anomalous network activity, a critical concern in cybersecurity and digital finance, involves identifying these irregularities to prevent network fraud and illicit operations. While anomaly detection software has proven effective in pinpointing hackers and fraudsters in centralized financial systems, the emergence of digital currencies and blockchain technology introduces new challenges and a growing need for specialized detection tools.²

The rise of Artificial Intelligence (AI) and Machine Learning (ML) has led to the development of numerous methods for identifying digital transaction abnormalities and fraud.³ For centralized systems, various supervised ML algorithms have been explored, with studies comparing their efficacy.^4,5 For instance, Random Forest and XGBoost classifiers have been successfully applied to examine over 300,000 accounts for fraudulent business identification.⁶ XGBoost has also been utilized to predict driver performance and telematic data from insurance datasets, even addressing issues of skewed data.^7,8

However, traditional approaches to fraud detection, particularly in credit card transactions, face inherent data mining challenges.⁹ A significant hurdle for researchers is the lack of access to real-time, sensitive client data due to stringent bank privacy policies.¹⁰ Efforts to overcome this include distributed data mining methodologies for credit card transactions¹¹ and proactive algorithms to address the “cold-start” problem without relying on historical fraud data.¹² Techniques like uncertain association law mining have been proposed to extract meaningful patterns from credit card transactions.¹³ Specific ML models, such as Support Vector Machines, have been trained to identify transaction errors.¹⁴ Furthermore, hybrid approaches combining Bayesian, rule-based, and Dempster-Shafer theories have been used to reduce fraud identification noise.¹⁵ Transaction aggregation to assess consumer behavior and detect fraudulent patterns has also been explored, with the primary aim of identifying anomalies in unknown datasets, respecting data privacy by not ranking participant traits.¹⁶

With the increasing adoption of blockchain technology in both public and private sectors for auditing systems and protecting auditor privacy, new methods are emerging. While consensus algorithms are crucial for transaction verification,¹⁷ blockchain’s inherent design introduces new complexities. In the context of IoT-driven smart cities, centralized systems face challenges in trust, privacy, security, and verifiability.¹⁸ Solutions involving encryption, such as asymmetric, symmetric, and homomorphic encryption, have been proposed to address privacy, though often at the cost of high processing power and time.¹⁹ Blockchain, integrated with deep learning algorithms, has been used to detect cyberattacks and ensure security and privacy in cloud-migrated virtual machines.²⁰ Approaches like Gaussian Mixture-of-Localization-based Outliers (MLO) systems have been developed for cloud-based collaborative anomaly detection to identify insider and outsider threats.²¹ Privacy-preserving anomaly detection frameworks using Gaussian Mixture Models (GMM) have also been presented for cyber-physical systems.²²

Despite these advancements, several critical limitations persist. Many robust AI/ML anomaly detection methods are primarily optimized for centralized systems and do not offer viable solutions for the decentralized nature of blockchain.² While blockchain offers decentralization and immutability, transaction identification and fraud detection within blockchain networks can be inefficient.¹⁷ The presence of “malignant” participants further complicates security.²³ A fundamental impediment for academics in fraud detection is the inability to access real-time, sensitive financial data due to bank privacy policies, hindering the development and validation of robust models.¹⁰ Cryptographic methods used for data privacy, while effective, often demand significant processing power and time, limiting their real-time applicability.¹⁹ Existing privacy-preserving anomaly detection techniques may be ineffective against sophisticated, contemporary IoT attacks.²² Traditional centralized systems often struggle with the real-time processing, low latency, and scalability required to handle the escalating volumes of digital transactions effectively, especially in rapidly evolving fraud scenarios.

Contribution

This work makes significant contributions to the field of fraud detection in financial systems, particularly in Mobile Edge Computing (MEC) environments. A new blockchain-based computing paradigm is developed to facilitate safe cooperation across edge nodes for precise and reliable fraud diagnosis. A differential privacy technique is used, using randomized replies to safeguard sensitive financial data and maintain client and transaction confidentiality. A hierarchical structure is advised for financial records, which is adaptively segmented for efficiency and facilitates rapid and secure processing of financial data. The proposed Blockchain-Secured Computational Model (BSCM) surpasses conventional client-server models, decreasing data write execution time by 2.6 times and enhancing data retrieval efficiency by 20 times. The financial record selection and information extraction models attain accuracy rates of 96.5% and 95.2%, respectively, with F1-scores of 94.6% and 93.1%, indicating a commendable equilibrium between precision and recall. The edge-sharing network exhibits robust performance in reaction time, throughput, packet loss, and latency, making it appropriate for real-time fraud detection applications. The blockchain-based transaction verification system attains an accuracy of 99.9%, demonstrating superior performance in verification speed, network latency, and throughput.

The rest of the paper is organized as follows: Section 2 presents the overall workflow of the proposed system, including Financial Record Extraction, Financial Record Selection, Edge Sharing Network, Blockchain-based Secure Computation Model, and Fraud Detection Recommendation; Section 3 presents Results and Discussion. Finally, Section 4 draws the Conclusion.

The process model of the proposed system workflow

Figure 1 shows the suggested system process. To resolve missing values, an information extraction model processes a financial record and its timestamp to build an information space. The system clusters optimum records, uses differential privacy for security, and stores them in edge nodes. A blockchain-derived threshold governs this network’s information sharing. Blockchain data is analyzed by a financial specialist to identify client identification (fraudulent or authentic). The system sends transaction results to update records and store feedback, providing a continuous learning loop.

Figure 1.

Proposed system workflow.

The proposed system follows these steps:

(1) Financial record $r e c o r d_{i}$ contains information at $t_{n}$ .

(2) The information extraction model uses a missing value function to extract client information into information space $D$ and choose the most relevant information.

(3) After finding the optimal information subspace, the financial record selection model partitions the data to get the best $r e c o r d_{i}$ for the cluster.

(4) Use differential privacy to preserve customer privacy since the ideal $r e c o r d_{i}$ is subject to internal passive adversaries. Next, nearby edge nodes store $r e c o r d_{i}$ so financial experts may access it.

(5) Edge nodes in the edge sharing network may cooperate to share information according to the private blockchain threshold. This level determines the one-step neighboring node’s trustworthiness.

(6) The relevant customer information is submitted to the appropriate specialist, who determines the client’s identity (fraud or real). The blockchain records this transaction in the header of a block to safeguard the integrity of all financial transactions among its users. These steps avoid transaction disclosure.

(7) Get financial transaction results (fraud or real).

(8) The current treatment input is supplied to the financial record selection model to update records. In addition, the edge sharing network will save the associated input for future fraud detection.

Financial data searches may be injected with noise proportionate to their sensitivity using the Laplace technique, ensuring good privacy. For high-dimensional data, the Gaussian technique might be utilized. The method and noise calibration depend on the application and privacy level. The blockchain threshold may be based on a consensus method like Proof of Stake (PoS) or Byzantine Fault Tolerance (BFT) to ensure network integrity and trustworthiness. Smart contracts may also govern node interactions like transaction validation and data exchange. A node trust score may assess network node repute. Public Key Infrastructure (PKI) authentication and management of edge nodes with unique public–private key pairs are expected.

The noise calibration process would involve calculating the sensitivity of queries and adding noise accordingly. The privacy budget, represented by the epsilon (ε) value, would be carefully managed to balance privacy and accuracy. A smaller ε value would provide stronger privacy guarantees but might compromise accuracy. The exact calibration and budget allocation would depend on the specific requirements of the financial fraud detection application.

The differential privacy implementation involves the following steps:

Step 1: Calculate the sensitivity of the query, which represents the maximum change in the query result when a single record is added or removed.

Step 2: Calibrate the noise scale based on the sensitivity and the desired privacy budget (ε value).

Step 3: Add noise to the query result using the Laplace or Gaussian mechanism.

Step 4: Evaluate the trade-off between privacy and accuracy, adjusting the ε value as needed to achieve the desired balance.

Model for the extraction of financial data

Each individual customer $p$ being screened for fraudulent activity in with the relevant information. The financial data related to the customer is specified as a vector with the notation $d \in D$ , where $D$ represents the financial data. However, there are a significant number of missing data in the financial system, which interfere with the typical calculation between various vectors of the information. As a result, it is estimated with the highest possible probability as a function to fill in missing values in order to do value interpolation. Figure 2 presents the model for extraction of financial data.

Figure 2.

Block diagram for the extraction of financial data.

The assumption is made that every $d$ in $D$ has been normalized to the range $[0, 1]$ . There are $d_{d}$ dimensions in space $D$ . Every dimension is equivalent to a characteristic that is associated with one specific setting as per financial records are important factors under consideration. For example, if $d_{d}$ equals 3, then information vector with the value of $d$ can be associated to customer’s identity proofs. Each space $D$ with a dissimilarity function $S_{D}$ is provided in order to identify the best possible information subspace for the purpose of making more appropriate financial data recommendations. The function $S_{D}$ is indexed by a function which is written as $M (d_{t})$ . This function measures the distance between distinct contexts so that it is determined that how similar they are to one another. In this function, $c_{t}$ stands for the center, while $b_{t}$ refers to the radius. $D (d_{1}, d_{2}) = \max_{d_{1}, d_{2} \in D} S_{D} (d_{1}, d_{2})$ is the formula that represents the greatest dissimilarity distance between two information vectors. The information space is constructed to determine the speed at which the information is explored. Finally, the best information space is located to obtain acceptable financial data records.

Model for blockchain-enabled differential privacy: Financial record selection

The financial record selection is constructed using the history of financial data of several customers and condition feedbacks from those customers, and is denoted by the notation $M_{R}$ , where $R$ stands for the matching space that contains fraud detection records. This model is formed of numerous financial record nodes, each of which stores a separate fraud detection record. The $m^{t h}$ node at depth $h \geq 0$ is referred as $(h, m)$ , and the maximum value that m may have is determined by the value of $1 \leq m \leq 2^{h}$ . As a consequence of this, any parent node that has the value $(h, m)$ may be represented by either one of two child nodes: $(h + 1, 2 m - 1)$ and $(h + 1, 2 m + 1)$ .

In addition, because of the restricted capacity of the nodes, it should first constrain the maximum and minimum size and then split them in an adaptive manner according to the increasing amount of incoming financial information. A limit on the size of the nodes has been set and is subdivided into more valuable sub-nodes, which will allow us to provide more precise fraud findings as the number of system working cycles increases. However, after the ideal node that has the most relevant financial data has been chosen, the internal interested attackers will utilize certain program analysis abilities in order to masquerade themselves as financial professionals when they are in the financial institution in order to get the personal fraud detection findings. In order to protect the confidentiality of our customers, the Blockchain-enabled differential privacy (BEDP) has been used here. Figure 3 presents the block diagram for financial record selection.

Figure 3.

Block diagram for financial record selection.

In order to evaluate the data’s worth without jeopardizing the BEDP, a reconstruction probability event is implemented. The likelihood of successfully recreating the person’s records using the modified data is given by the $P ({M R}_{i} = M R)$ . When the reconstruction probability is at its highest, this model meets the criterion $p_{r e c o r d} \leq \max (R (r e c o r d_{i}))$ in which the greatest possible utility is attained while maintaining the customers’ confidentiality. $R (.)$ is the random response.

Sharing network at the leading edge

The edge sharing network has $N_{e}$ edge nodes, which may be indexed using the set $N_{e} = {1, 2, \dots | N_{e} |}$ . In order to describe the linked edge nodes in the financial record, the notation $N_{l i n k} (N_{e}, E)$ is used, where $E$ denotes the collection of edges. If node $x$ is $y^{'} s$ neighbour then set $E (x, y)$ equal to 1, and if it isn’t, then set it equal to 0. Because edge nodes are equipped with powerful computer capabilities, they are able to maintain extensive fraud detection data for customer. Figure 4 presents block diagram for Sharing Network at the Leading Edge. In addition, as the edge nodes are the components of the sharing network, they are able to exchange financial information with one another in order to arrive at a joint fraud detection. The customer identity record $r e c o r d_{i}$ is immediately transmitted with the lowest possible latency, hence avoiding the edge sharing network. The physician will provide a definitive fraud detection result for the customer after taking into consideration the present conditions as well as the customer’s previous financial records in order to properly treat the customer. The most recent version of the blockchain is enabled to generate a sharing accuracy threshold, which is then established with the help of the proposed authentication technique in order to guarantee the integrity of financial transactions and the dissemination of correct information. This gives us the ability to determine whether or not the edge nodes that are already in place can be trusted.

Figure 4.

Block diagram for sharing network at the leading edge.

Blockchain-based secure computation model

The blockchain is constructed up of linked blocks, each of which has inside its body a decentralized hash sub-table, in which are stored certain references to the various financial transactions. In addition, the block head includes a variety of characteristics (such as a timestamp, version number, and so on) that manage the dataflow, guarantee that the block cannot be altered, and accomplish the bulk of the blockchain’s most essential operations. In this configuration, secure multiparty computation (SMPC) ensures that the data is partitioned such that it may be discretely calculated. As a result, neither party is able to deduce any significant financial information from the neighboring edge nodes since they are unreliable.²⁴ To ensure the safety of the customers’ financial transactions, first generalize the fraud detection recommendation issue using the blockchain-based secure computation model (BSCM). A protocol is devised that is based on SMPC to produce sharing randomness and assess the amount of correctness in order to ensure the accuracy of the information that is being shared and choose which edge node would be most suited for us to work with. Each shared record is represented by a unique identifier (logical ID) and an additive share in the sharing protocol, which supports additive homomorphism of various financial records. This allows for computations to be performed on the shared data without revealing the individual shares. The efficient cooperative fraud detection is performed by using this updated system, which enables us to compute the accuracy of data sharing across many edge nodes, each of which stores a separate set of information.

Fraud detection recommendation

Error rates (ERs) are determined by tallying up the frequency with which false negatives (FNs) and false positives (FPs) occur. A false negative example is one in which the malicious result is correctly detected as being negative, whereas a false positive example is one in which the malicious outcome is incorrectly identified as being positive. The authors of this paper examine the discrepancies between ideal fraud detection input and practical feedback and why they occur. Our objective is to reduce the number of errors as much as possible while also ensuring that the regret is sublinear. This will ensure that the regret may converge to a low level. As a result, the issue of making a fraud detection be stated as $m i n i m i s e E R = F P + F N$ , where $E R$ relies on the quantity of both false positive instances and false negative examples.

Results and discussion

The experiment was conducted on a robust system with an Intel Xeon E5-2698 v4 CPU, 128 GB of RAM, and 1 TB SSD storage, running Ubuntu 20.04 LTS. The software environment consisted of Python 3.8 as the programming language, Hyperledger Fabric 2.2 as the blockchain framework, and BlockSim for blockchain simulation. Custom Python scripts were developed to simulate the client-server model, and Docker 20.10 was used for containerization, with Docker Compose managing multi-container applications. The simulation setup included 6 nodes: one client node and two server nodes for the client-server model, and three peer nodes for the blockchain network. A 25.86 GB dataset comprising financial records was used for the experiment. The objective was to compare the performance of the Blockchain-based System for managing financial data (BSCM) with the traditional client-server model in handling large-scale financial data. The experiment measured latency, throughput, resource utilization, and scalability, with each test scenario executed 100 times to ensure statistical significance. The BSCM utilized a block structure with an 80-byte block header and a block body containing financial transaction data, leveraging cryptographic techniques for data integrity and non-repudiation.

The performance of proposed BSCM is evaluated in comparison to the client-server method of handling financial data using open source data.^25–27 This work simulates both the client-server and blockchain networks by using three nodes each. A client, which is a stand-in for a customer, will send a query request to two servers, each of which will represent a hospital that is in possession of the customer’s financial information. The request will be submitted by the client, who will act as the customer. The customer will behave as though the customer is making the inquiry in order to get the information. A file of 25.86 gigabytes in size is used in order to demonstrate a financial data record that is capable of being shared across both networks. This was done for the goal of proving that both networks are compatible with the data record. The financial information that was used to compile the dataset was gathered from references 28–30. A block on a blockchain network is made up of two parts, the block header and the block content. These two parts make up the block. These two pieces come together to form a block. The information on the block’s metadata is stored in the header of the block.³¹ This information includes the date, the version, and the hash of the block that was there before it. The data pertaining to the transactions are saved inside the “body” of the block. Throughout the whole of our simulation, the typical block header size, which is 80 bytes, has been used. This setting was maintained throughout. This was done so that it could more realistically depict the situations that exist in the actual world. Time required to handle a growing number of financial records is contrasted between the BSCM and the client-server paradigm. By conducting this comparison, the method that is most effective in terms of processing the ever-increasing volume of financial data is identified in the shortest amount of time. Both approaches were put to the test using an increasing quantity of financial information. Both of these evaluations are carried out with the assistance of an ever-growing number of customer records.

In the first of many potential scenarios, a write transaction is simulated by uploading 50 financial files, each of which will have a size of 25.86 terabytes. This will be done in order to test how well the system handles large amounts of data. To do this, client-server and blockchain networks function in tandem. The client makes requests to the server, which the server then fulfills based on the client’s needs. For example, the client may ask the server to incorporate 50 financial files into the database. When a successful data upload has been completed on a particular server, the server will communicate with the client by means of acknowledgments. To get started, determine the overall length of time that will be necessary to complete this task. The client transfers 50 distinct financial files to the blockchain network so that the blockchain network may process the data contained inside those files. Every file is copied onto both of the servers that are connected to the network so that they are both up to date. In order for all of the servers in the network to come to a conclusion about the transaction, each server in the network has to first generate a block that is pertinent to the transaction and then communicate that block to the other servers in the network. Only then will all of the servers in the network be able to come to a decision regarding the transaction. The client will get acknowledgments alerting them that the data modifications were successfully completed as soon as the two servers have come to an agreement on anything. The block will be added to the chain as soon as the two servers have come to an agreement regarding anything. To get started, the overall length of time is determined that will be necessary to complete this task. Once again, the client-server protocol and the blockchain network are used in the process of writing transactions. The quantity of written financial files will be raised from 50 to 100 at intervals of 10 this time around.

In the second scenario, client-server and blockchain networks are used to simulate the read activity of querying 50 financial files in order to simulate the read activity. It is the client’s responsibility to initiate the communication process with the server in a model of computing that is referred to as a client-server architecture. This may be done by submitting a request to the server that includes a data query. Once the server has provided a response and sent it to the client, the client is the one who will get the pertinent financial files. The server will send them to the client. In order to get started on this job, you need to first determine the entire length of time that will be necessary to accomplish it. The user is actively participating in the functioning of the blockchain network by sending data query requests for a total of 50 unique financial files. These queries cover a wide range of conditions. These inquiries are in reference to the individual customer’s personal financial records. Both of the servers get each request while it is being processed simultaneously. Every server in the network has the duty of producing a block that is specific to the transaction that includes data access and then transmitting that block to the other servers in the network. This is to ensure that the integrity of the network is maintained. This procedure has to be carried out again for each transaction that requires accessing the data. This obligation is recognized in the industry as “producer responsibility.” This is done to ensure that there will be no issues with the functioning of the network. The block will be uploaded to the network as soon as everyone has come to an agreement about a solution to any difficulty, and the client will be provided with the relevant financial information at that time. When a client makes a request for data, the data will come from one of two places: the client’s local database, or, if the client already has a copy of the ledger, it will come from the server that is situated in the area that is geographically closest to the client. If the client does not already have a copy of the ledger, the data will come from the server that is located in the area that is geographically closest to the client. The client’s local database will be queried for the information in the event that the client does not own a copy of the ledger. When a client makes a request for data, the data will arrive from one of two locations: the client’s local database, or the server that is located in the region that is positioned in the area that is geographically nearest to the client. When a client makes a request for data, that client’s local database will provide the information that the client needs. The data read procedure is carried out for both the client-server and blockchain networking systems. However, at this point in time, the number of financial files that are being requested has gone from 50 to 100, and there is a break of 10 seconds between each increment of 10.

The amount of time required to read a financial file from a database that contains an increasing number of financial files is shown in Figures 5 and 6 for write and read operations, respectively. The Figures 5 and 6 provide information on these timeframes. This time is broken down into two categories: client-server and BSCM. It demonstrates that the time required to execute BSCM is noticeably less than that required by the client-server strategy. This is due to the fact that the data is downloaded from the centralized server in the client-server strategy, but in the BSCM approach, the data is retrieved from a local copy of the ledger. The only thing that contributes to the length of time it takes for BSCM to carry out an operation is the sending of data query requests to each and every server in the network, followed by the addition of those requests as transactions in blocks after consensus has been reached.

Figure 5.

Data write operation when employing a client-server and BSCM with increasing number of financial records.

Figure 6.

Data read operation when employing a client-server and BSCM with increasing number of financial records.

The number of financial files that need to be processed will directly correlate to an increase in the total amount of time that will be necessary to complete these procedures. According to the results, the amount of time that is required to carry out the client-server and BlockHR techniques grows in a linear manner as the number of financial files that need to be processed increases. This is the case regardless of whether strategy is used. This is the case regardless of the strategy that is used in the scenario that is being discussed. On the other hand, in comparison, the amount of time required to carry out the client-server method is much smaller than the amount of time required to carry out the BSCM methodology. Because the BSCM employs a technique that is based on consensus for the validation and replication of data, one of the probable conclusions that may be drawn from this observation is that it is a consequence of the approach that was used. The genesis of this realization may be traced back to the aforementioned observation. The procedure in issue is entirely responsible for the results that were obtained. The financial data that has to be kept up to date on the ledger will be transmitted here so that it may be checked, and here is the spot where the check will take place. It is necessary to obtain this information from each and every server that is a part of the network and keeps its own independent copy of the ledger. In addition, in order to add a new block to the ledger, every server in the network must first come to an agreement with the other servers in the network over that server’s block. This must be done before the new block can be added to the ledger. Before the new block can be added to the ledger, this step has to be completed first. This may be accomplished by instructing the relevant server to send a copy of the block it is responsible for maintaining to each of the other servers that are a member of the network. When everyone has reached a decision upon anything, the new block will be able to be added to the ledger as soon as it is possible to. In contrast, while using the client-server technique, the request to update the data is only sent to the centralized server. This ensures that the data is always accurate. This guarantees that the data are correct all of the time. This is because the client-server method is, in and of itself, a strategy that utilizes client-server architecture. As a direct consequence of this fact, the amount of time required for the execution of a data write operation when utilizing the client-server technique is a significant lot less than the amount of time required by BSCM. When compared to BSCM, the client-server technique results in a time reduction that is 2.6 times less than what is required to generate a financial file. This time reduction is due to the fact that the two components communicate directly with one another.

Financial records play a crucial role in reducing errors in financial applications, often more so than increasing the number of customers. This is because financial records provide richer datasets, containing valuable information about customers’ financial behavior, transactions, and history. As the number of financial records increases, models can learn from more diverse and detailed data, leading to better performance and reduced errors. In contrast, increasing the number of customers may not lead to proportional improvements in error reduction due to diminishing returns, as each additional customer may not contribute equally valuable information. Financial records also provide increased precision, enabling models to make more accurate predictions and reduce errors. Furthermore, analyzing financial records helps models handle outliers and anomalies more effectively, leading to more robust patterns and relationships. Ultimately, financial records enable more accurate risk assessments, reducing the likelihood of errors and improving overall performance. By leveraging financial records, financial institutions can build more accurate models, reduce errors, and improve decision-making.

An analysis of error rate

Error rate comparison can be seen in Figures 7 and 8. In the very beginning, the error rate has a bigger value and then the slope decreases as the number of customers and financial record increases. This work is analyzed for the two network cases, that is, without edge nodes and with edge nodes (BSCM). Figures 7 and 8 show the impact of the number of edge nodes on error rate. This indicates to effectively use the dataset to provide more accurate financial suggestions, which in turn results in a significant rise in the number of successful fraud detection. However, due to the fact that the information space is set and it is unable to acquire any new information in order to fulfill additional demands. An increase in financial record is more advantageous for decreasing performance losses than an increase in customers, which is appropriate in the context of the real-world situation.

Figure 7.

Error rate comparison with number of customers.

Figure 8.

Error rate comparison with number of financial records.

Table 1 lists the performance measures for the Financial Record Selection Model and the Information Extraction Model. Regarding the extraction and selection of relevant financial information, the findings reveal that both models have performed really well. The Information Extraction Model’s 95.2% accuracy indicates that, 95.2% of the time it faithfully collected pertinent information from financial data. With an even more accuracy of 96.5%, the Financial Record Selection Model effectively chose relevant financial records 96.5% of the time. With an Information Extraction Model accuracy of 92.5%, 92.5% of the information that was gathered was pertinent. With an accuracy of 94.2%, the Financial Record Selection Model indicates that 94.2% of the chosen financial records were relevant. With 93.8% of the relevant information extracted from financial data, the Information Extraction Model shows a recall rate. With a recall of 95.1%, the Financial Record Selection Model proved able to choose 95.1% of the relevant financial records. Precision and recall taken weighted average form generates the F1-score. Whereas the Financial Record Selection Model has an F1-score of 94.6%, the Information Extraction Model has an F1-score of 93.1%. This suggests that both models have struck a decent mix of recall and accuracy. The MSE gauges the average squared difference between expected and actual data. With an MSE of 0.021, the Information Extraction Model is better than the Financial Record Selection Model with 0.018. This suggests that in their predictions both models have low error rates.

Table 1.

Performance metrics.

Metric	For information extraction model	For financial record selection model
Accuracy	95.2%	96.5%
Precision	92.5%	94.2%
Recall	93.8%	95.1%
F1-score	93.1%	94.6%
Mean squared error (MSE)	0.021	0.018

Table 2 shows edge sharing network performance indicators. The edge sharing network has excellent response time, throughput, packet loss rate, and network latency. The edge sharing network responds to queries in 0.35 seconds on average. The edge sharing network can manage modest data load with its average throughput of 1.2 Mbps. The edge sharing network delivers packets safely with 0.05 packet loss. The edge sharing network has 0.2 seconds of network latency, suggesting fast data transmission.

Table 2.

Performance metrics for edge sharing network.

Metric	For edge sharing network
Average response time	0.35 seconds
Average throughput	1.2 Mbps
Packet loss rate	0.05
Network latency	0.235 seconds

Table 3 shows blockchain-based transaction verification performance indicators. The system performed well in transaction verification time, correctness, network latency, and throughput. Blockchain-based transaction verification can verify transactions in 0.5 seconds. The blockchain-based transaction verification system verifies transactions with 99.9% accuracy. Blockchain-based transaction verification system has 0.3 seconds blockchain network latency, suggesting fast data transmission. The blockchain-based transaction verification system can manage moderate data load with 0.8 Mbps blockchain network throughput.

Table 3.

Performance metrics for blockchain-based transaction verification.

Metric	For edge sharing network
Transaction verification time	0.535 seconds
Transaction verification accuracy	99.9%
Blockchain network latency	0.335 seconds
Blockchain network throughput	0.8 Mbps

Conclusion

In order to send and receive financial records with information sharing, this study requires specific mobile devices that are modeled as edge nodes. Additionally, this research requires certain devices that can operate as containers in order to store essential information. This is necessary in order to safeguard the confidentiality of the information. Our approach does have some potential drawbacks, however, including the fact that it is dependent on the fruitful cooperation of a large number of technologies in order to produce desirable results; failing to do so would result in unfavorable outcomes; and the quantity of information that can be communicated between edge nodes is limited. In this research, a blockchain-enabled secure computing platform is introduced for fraudulent detection that operates under a differential privacy mechanism. Computing on the edge of mobile networks and the processing of large amounts of data are both facilitated by the approach.

Our performance testing revealed that as compared to the BSCM technique, the client-server method is 2.6 times faster when it comes to writing the contents of financial files. This is the case even though both methods use the same number of steps. The read operation that is carried out by BSCM, on the other hand, is 20 times faster than the client-server technique. The proposed system’s performance is highlighted by several key metrics. The proposed model achieved an accuracy of 95.2%, while the Financial Record Selection Model performed even better with an accuracy of 96.5%. The Edge Sharing Network demonstrated rapid responsiveness with an average response time of 0.35 seconds. Transaction verification was also efficient, taking 0.535 seconds with an impressive accuracy of 99.9%. Furthermore, the blockchain network latency was recorded at 0.335 seconds, indicating swift data transmission. These metrics collectively underscore the system’s effectiveness in handling financial data and transactions securely and efficiently.

Within the scope of the work that is planned for the foreseeable future, to actually develop the framework in order to carry out exhaustive testing on its privacy, security, and performance.

Footnotes

ORCID iD

Jing Shi

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

Data may be obtained from the authors upon reasonable request.*

References

Staudemeyer

Voyiatzis

Moldovan

, et al. Smart cities under attack. Abbas Moallem, Human-computer interaction and cybersecurity handbook. Boca Raton, FL: CRC Press, 2018.

Podgorelec

Turkanović

Karakatič

. A machine learning-based method for automated blockchain transaction signing including personalized anomaly detection. Sensors 2020; 20: 147.

Nakamoto

(2008). Bitcoin: A peer-to-peer electronic cash system . https://bitcoin.org/bitcoin.pdf

Aziz

ASA

Hassanien

Azar

, et al. Genetic algorithm with different feature selection techniques for anomaly detectors generation. In: Proceedings of the 2013 Federated Conference on Computer Science and Information Systems (FedCSIS), Kraków, Poland, 08–11September 2013, pp. 8–11.

Hassanien

Tolba

Azar

. Advanced machine learning technologies and applications: second international conference, AMLTACommunications in computer and information 2014, Cairo, Egypt, 28–30 November 2014. In: Hassanien

Tolba

Azar

. science.488 Berlin/Heidelberg, Germany: Springer, 2014, vol.

Khan

Asghar

, et al. Fake review classification using supervised machine learning. In: Proceedings of the international conference on pattern recognition, virtual event, Cham, Switzerland, 10–15 January 2021, pp. 269–288.

Shahbazi

Hazra

Park

, et al. Toward improving the prediction accuracy of product recommendation system using extreme gradient boosting and encoding approaches. Symmetry 2020; 12: 1566.

Pesantez-Narvaez

Guillen

Alcañiz

. Predicting motor insurance claims using telematics data—XGBoost versus logistic regression. Risks 2019; 7: 70.

Wei

, et al. A survey on blockchain anomaly detection using data mining techniques. In: Proceedings of the international conference on blockchain and trustworthy systems, Guangzhou, China, 7–8 December 2019.

10.

Reid

Harrigan

. An analysis of anonymity in the bitcoin system. In: Security and privacy in social networks. New York, NY: Springer, 2013, pp. 197–223.

11.

Ngai

EWT

Wong

, et al. The application of data mining techniques in financial fraud detection: a classification framework and an academic review of literature. Decis Support Syst 2011; 50: 559–569.

12.

Saia

Carta

. Evaluating credit card transactions in the frequency domain for a proactive fraud detection approach. In: Proceedings of the 14th International Conference on Security and Cryptography (SECRYPT 2017), Madrid, Spain, 26–28 July 2017, pp. 335–342.

13.

Sánchez

Vila

Cerda

, et al. Association rules applied to credit card fraud detection. Expert Syst Appl 2009; 36: 3630–3640.

14.

Gyamfi

Abdulai

. Bank fraud detection using support vector machine. Proceedings of the 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 1–3 November 2018, pp. 37–41.

15.

Panigrahi

Kundu

Sural

, et al. Credit card fraud detection: a fusion approach using Dempster–Shafer theory and Bayesian learning. Inf Fusion 2009; 10: 354–363.

16.

Shi

Sun

Gao

, et al. Anomaly detection in Bitcoin market via price return analysis. PLoS One 2019; 14: e0218341.

17.

Ostapowicz

Żbikowski

. Detecting fraudulent accounts on blockchain: a supervised approach. In: Proceedings of the international conference on web information systems engineering, Hong Kong, China, 19–22 January 2020, pp. 18–31.

18.

Kumar

Gupta

Tripathi

. TP2SF: a Trustworthy Privacy-Preserving Secured Framework for sustainable smart cities by leveraging blockchain and machine learning. J Syst Architect 2021; 115: 101954.

19.

Zhao

Tarus

Yang

, et al. Privacy-preserving clustering for big data in cyber-physical-social systems: survey and perspectives. Inf Sci 2020; 515: 132–155.

20.

Alkadi

Moustafa

Turnbull

, et al. A deep blockchain framework-enabled collaborative intrusion detection for protecting IoT and cloud networks. IEEE Internet Things J 2020; 8: 9463–9472.

21.

AlKadi

Moustafa

Turnbull

, et al. Mixture localization-based outliers models for securing data migration in cloud centers. IEEE Access 2019; 7: 114607–114618.

22.

Keshk

Sitnikova

Moustafa

, et al. An integrated framework for privacy-preserving based anomaly detection for cyber-physical systems. IEEE Trans Sustain Comput 2019; 6: 66–79.

23.

Farrugia

Ellul

Azzopardi

. Detection of illicit accounts over the Ethereum blockchain. Expert Syst Appl 2020; 150: 113318.

24.

Kuo

T-T

Ohno-Machado

. ModelChain: decentralized privacy-preserving healthcare predictive modelling framework on private blockchain networks. 2018. https://arxiv.org/abs/1802.01746

25.

Online weblink: https://www.kaggle.com/datasets/omershafiq/bitcoin-network-transactional-metadata

26.

Wang

Research on the influencing factors of block chain technology adoption in supply chain finance of small and medium-sized enterprises. Adv Manag Sci. 2023; 12(1): 1–3.

27.

Chu

Exploration of new energy vehicle recycling model based on blockchain. Adv Ind Eng Manag. 2023; 12(2): 18–23.

28.

Yuan

Wang

F-Y

. Blockchain: the state of the art and future trends. Acta Autom Sin 2016; 42(4): 481–494.

29.

Qin

Yuan

Wang

, et al. Economic issues in Bitcoin mining and blockchain research. In: 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018, pp. 268–273.

30.

Yuan

Wang

F-Y

. Towards blockchain-based intelligent transportation systems. In: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 01–04 November 2016, pp. 2663–2668.

31.

Online weblink: https://www.hyperledger.org/