Abstract
Frequent itemset mining (FIM) identifies the most important patterns in data sets. However, because transactional data is large and high-dimensional, classical pattern mining techniques are limited by dimensionality and by the cost of data annotation. Privacy-preserving data mining has recently become an important research area, since information privacy is a trade-off that must be considered whenever data is used. For many years, privacy-preserving data mining (PPDM) has relied on methods that are mostly heuristic, typically using deletion operations to hide sensitive information. In this study, we use deep active learning to protect private and sensitive information. We combine entropy-based active learning with an attention-based approach to hide sensitive patterns effectively. The resulting models are validated on high-dimensional transactional data, using attention-based and active learning methods in a reinforcement learning environment. The results show that the proposed model can support and improve decision-making by increasing the number of training instances through a pooling technique and an entropy-based uncertainty measure. The proposed paradigm sanitizes data by hiding the sensitive items while avoiding the hiding of non-sensitive items. The model outperforms greedy, genetic, and particle swarm optimization approaches.
Introduction
Fifth-generation (5G) mobile networks are being deployed in advanced countries and are expected to eventually supplant 4G. The higher bit rates not only enable faster data transfer but also broaden the research field of software-defined networking (SDN). Higher capacity and ultra-low latency allow applications designed for constrained conditions, such as those found in the Internet of Things (IoT), to connect directly to data centers. This leads toward a fully mobile and connected society, which is the goal of IoT applications. Transmitting mobile data from devices around the world also requires a high level of security [6]. Mobile nodes that may hold personal information can be tracked down and exposed to attacks such as denial-of-service, eavesdropping, man-in-the-middle, replay, and repudiation. Higher speeds also place stricter demands on quality of service (QoS) and execution time, even as a larger volume of data is transmitted. Deploying data-intensive devices in heterogeneous 5G networks therefore requires a more thorough examination of privacy and security. Sensitive data is used in a wide variety of applications. For example, information about a customer’s purchases and location is considered private and confidential. In this case, the edge computing network needs to perform some critical activities without communicating with the central node, i.e., the cloud server, so that sensitive information is hidden at the local level.
The main difficulty in developing machine learning applications is preparing high-quality, useful data, which is an expensive and time-consuming task. Active learning allows a model to learn from a small number of labeled cases and to estimate the distribution of the data over the labels, which increases the applicability of machine learning. Numerous real-world applications are therefore required to mine data for useful and significant patterns [14]. These strategies primarily mine patterns from real-world applications under a variety of constraints, such as co-occurrence, frequency, and interestingness measures.
Because data mining techniques analyze databases to uncover implicit information, they can also reveal sensitive or private information, such as credit card numbers, passport numbers, social security numbers (SSN), credit-related scores, personal identification numbers (PIN), phone numbers (cell or home), address information (home or office), and purchase behaviors. Applications require safeguards against the leakage of such sensitive data, which can lead to significant security issues. As a result, companies are reluctant to provide sensitive information, knowing that mishandled data can lead to privacy issues.
In light of these pervasive concerns, privacy-preserving data mining (PPDM) has become a critical issue in recent years. The main goal of PPDM is to de-identify sensitive information while still providing the necessary knowledge discovered from databases. The flow of the method is shown in Fig. 1. The most common approach to sanitizing sensitive data is to make additions and deletions directly in the database, and numerous heuristic solutions are based on these operations. Since the fundamental problem is NP-hard, only practical heuristics have been proposed. Because these approaches are heuristic in nature, they often work well in practice but may lead to unpredictable results. The most common negative side effects are the introduction of new, artificial patterns and the accidental hiding of non-sensitive patterns. Therefore, finding an appropriate set of sanitization operations that perturbs the original database to protect sensitive information while minimizing side effects is a non-trivial task.

Data sanitization progress.
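The side effects described above can be made concrete with a small sketch. The following Python code is illustrative only and is not the paper's implementation: the brute-force miner, the function names `frequent_itemsets`, `side_effects`, and `fitness`, and the equal weights in the weighted-sum fitness are all assumptions introduced here. It compares the frequent itemsets before and after deleting transactions and counts sensitive itemsets that remain frequent, non-sensitive itemsets that were lost, and new itemsets that became frequent.

```python
import math
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len=3):
    """Brute-force miner: itemsets whose relative support is >= min_support."""
    min_count = math.ceil(min_support * len(transactions))
    counts = {}
    for t in transactions:
        items = sorted(t)
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s for s, c in counts.items() if c >= min_count}


def side_effects(original, sanitized, sensitive, min_support):
    """Return (hiding failures, missing itemsets, artificial itemsets)."""
    before = frequent_itemsets(original, min_support)
    after = frequent_itemsets(sanitized, min_support)
    sensitive = set(sensitive)
    hiding_failures = sensitive & after       # sensitive, still frequent
    missing = (before - sensitive) - after    # non-sensitive, lost
    artificial = after - before               # newly frequent
    return hiding_failures, missing, artificial


def fitness(hiding_failures, missing, artificial, w=(1.0, 1.0, 1.0)):
    """Assumed weighted sum of the three side effects (lower is better)."""
    return w[0] * len(hiding_failures) + w[1] * len(missing) + w[2] * len(artificial)


original = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"c"}]
sensitive = [frozenset({"a", "b"})]

# Deleting one supporting transaction hides {a, b} with no side effects.
print(side_effects(original, original[1:], sensitive, 0.5))
# Deleting two shrinks the database so much that {a, c} and {b, c} become frequent.
print(side_effects(original, original[2:], sensitive, 0.5))
```

Because the support threshold is relative to the database size, deleting too many transactions lowers the absolute count needed to be frequent, which is how artificial patterns can appear even under a deletion-only strategy.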
Aggarwal discussed PPDM in 2006 [1]. Lindell et al. addressed the main problem of PPDM by using the ID3 algorithm [12], and Clifton addressed a number of PPDM problems in [3]. Pandya presented a multiplicative perturbation algorithm (MPA) that was shown to balance data utility and privacy. Dwork et al. developed five techniques that can be used to manage databases with many features and vertical partitioning [5]. Lin et al. [7, 11] presented several genetic algorithm (GA) and particle swarm optimization (PSO) techniques for hiding sensitive information. Numerous methods that use frequent itemset mining (FIM) to conceal critical information have also been investigated [8].
In general, heuristic techniques such as greedy search and meta-heuristic approaches such as GA and PSO have been used to solve PPDM problems [7, 11]. Most of these conventional techniques are static and incur a high processing overhead. Deep learning and deep reinforcement learning (DRL) have achieved remarkable results in a variety of domains [13, 15]. Despite this success, these technologies have not yet been used to solve the PPDM problem. This paper addresses the PPDM problem with a technique based on the deep Q-learning architecture [13]. The technique dynamically hides sensitive information through deletion operations and strikes a balance between privacy and knowledge discovery. Deep Q-learning combines deep learning, more specifically deep convolutional neural networks, with reinforcement learning in the form of Q-learning. This combination makes Q-learning applicable to both big data and real-world application scenarios [15]. We use a perturbation technique based on deletion operations to hide sensitive information dynamically, rather than statically pre-specifying the set of transactions to perturb. The major contributions of this paper are as follows. Heuristic and meta-heuristic techniques for database sanitization have been identified in the literature, and most of them use evolutionary approaches; this is the first paper that uses a deep Q-learning technique to solve the PPDM problem. Inspired directly by Q-learning, the DRL algorithm accepts input states and returns pairs of states and actions. An advantage of this strategy is that it requires fewer parameters than other methods and optimally cleans sensitive data. A neural network that is internally optimized significantly speeds up execution at runtime.
Deep Q-learning, on the other hand, requires the adjustment of parameters (number of hidden layers, number of units, and activation functions). Instead of determining in advance the transactions to be perturbed for sanitization, the technique presented here determines them dynamically.
The remainder of the paper is organized as follows. Section 2 discusses related work. Section 3 presents the preliminaries and the problem definition. Section 4 describes the DRL sanitization algorithm. Section 5 contains the experimental results. Section 7 draws a conclusion and discusses future work.
Related work
With the rapid expansion of data available through IoT networks and the advancement of data mining methods used to identify implicit information, the amount of knowledge that managers and decision-makers can extract from data is expected to increase significantly. However, as the amount of collected data grows, privacy and security problems also arise. Because data mining methods extract implicit information from data that may be sensitive, sharing data across corporate parties raises security issues. PPDM has therefore gradually gained recognition as a crucial problem: sensitive information must be sanitized while side effects are limited and the integrity of the database is maintained. Agrawal et al. proposed a quantitative metric for evaluating the utility of PPDM approaches [1]. Verykios et al. [17] established a hierarchical categorization method for PPDM approaches.
Lindell and Pinkas described a technique for data sanitization that operates across shared databases without revealing sensitive information [12]. Clifton et al. developed a toolkit comprising a collection of strategies for use with specific types of databases that require PPDM [3]. Dehkordi et al. developed three approaches based on multicriteria GAs that sanitize sensitive rules through a removal methodology, which partially eliminates items in the database and then evaluates the modified database. However, the sanitization procedure produces missing rules and artificial data [4]. Lin et al. then introduced a number of GA-based algorithms, including sGA2DT, pGA2DT [8, 11], and cpGA2DT [9], as well as PSO-based methods for sanitizing sensitive itemsets by deleting transactions in PPDM [10].
Methodology
Fig. 2 describes the entropy-based active learning and attention-based transaction data sanitization model. When a batch of transaction data is received, the agent decides whether or not to remove the instances. The results are generated as a union of the new and previous states. The equation is used to calculate fitness values. The purpose of the model is to determine whether a pattern can be classified as sanitized or not. The attention-based network categorizes transaction data into two classes, clear or not clear. As shown in Fig. 2, the entropy-based technique is used to estimate the distribution of the data from the complete unlabeled dataset, while the attention mechanism is used primarily to recognize all of the patterns from only a small collection of examples. The initial training set contains a modest amount of data, and an entropy-based algorithm selects the items to be included in it. The entropy-based methodology determines the number of points drawn from a pool, where the pool is a set that is updated on a cycle-by-cycle basis. The selected points are added directly to the training set, and a new model is then trained on the enlarged set. Repeating these phases lets the training set grow gradually with informative points. The technique can reduce the data annotation effort and helps generalize a machine learning system to a larger range of applicable domains [14].

The proposed model.
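The pool-based selection step can be illustrated with a short sketch. The code below is a minimal example of entropy-based uncertainty sampling under stated assumptions: the function names, the batch size, and the use of class probabilities for the two classes (clear or not clear) as input are hypothetical and are not taken from the paper's implementation.

```python
import numpy as np


def predictive_entropy(probs):
    """Shannon entropy of each row of class probabilities."""
    probs = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(probs * np.log(probs), axis=1)


def select_queries(pool_probs, batch_size):
    """Positions of the most uncertain pool points, highest entropy first."""
    entropy = predictive_entropy(pool_probs)
    return np.argsort(-entropy)[:batch_size]


def update_sets(labeled_ids, pool_ids, pool_probs, batch_size):
    """Move the selected points from the pool to the training set."""
    picked = select_queries(pool_probs, batch_size)
    chosen = [pool_ids[i] for i in picked]
    new_labeled = labeled_ids + chosen
    new_pool = [p for p in pool_ids if p not in chosen]
    return new_labeled, new_pool


# Hypothetical model outputs for four pool instances.
pool_probs = np.array([[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.99, 0.01]])
labeled, pool = update_sets([10], [20, 21, 22, 23], pool_probs, batch_size=2)
print(labeled, pool)  # [10, 20, 22] [21, 23]
```

In each cycle the model would be retrained on the enlarged training set and the pool probabilities recomputed before the next selection.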
In our technique, we use the number of items purchased in the pattern mining transaction data as contextual information. The regularity with which items are purchased is crucial, so the number of items in the transaction data is treated as the contextual data points. We evaluate only a single part of the transaction data at a time and use the attention mechanism to decide, based on contextual importance, whether the related items should be sanitized.
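To illustrate how an attention mechanism weighs contextual features, the following sketch implements dot-product attention in the style of Luong, the attention variant named in the model training description. It is a simplified stand-in and not the paper's network: the vector shapes, the feature values, and the function names are assumptions made for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def dot_attention(query, keys, values):
    """Dot-product attention: score each key against the query,
    normalize the scores, and return the weighted sum of the values."""
    scores = keys @ query          # one score per context item
    weights = softmax(scores)      # attention weights, sum to 1
    context = weights @ values     # weighted combination of values
    return context, weights


# Hypothetical hidden state (query) and three context items.
query = np.array([2.0, 0.0])
keys = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])

context, weights = dot_attention(query, keys, values)
print(weights)  # the second item receives the smallest weight
print(context)
```

Items whose keys align with the query receive larger weights, which is the sense in which attention ranks items by contextual importance.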
Agent
As seen in Fig. 2, the Markov Decision Process (MDP) is discussed as follows:
Model training
We start by passing the features through a dense layer of 100 ReLU units. The output is then passed to the Luong attention mechanism, which takes advantage of the hidden state of the decoder [16]. The attention value is calculated and combined with the decoder’s hidden state. The output sequence is then passed through layers of 100 and 350 ReLU units. The last layer was
Minimizing fitness function
We use the fitness value as the optimization objective. The goal is to minimize the target time by using the Markov property. We use the optimizing policy to minimize the function $Q^{\pi}(s_t, a_t)$. The active learning method optimizes the policy by learning through interaction with the environment. The Q-table is then used by the reinforcement learning method and the deep active learning model.
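For reference, the standard tabular Q-learning update, which is the usual form of the Bellman update in this setting, is shown below. The learning rate $\alpha$, the discount factor $\gamma$, and the reward $r_{t+1}$ are standard symbols assumed here; the text above does not define them.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

When the fitness value is treated as a cost to be minimized, one common convention is to define the reward as the negative fitness, so that maximizing the return minimizes the fitness; replacing the $\max$ with a $\min$ over costs is equivalent.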
Proposed algorithm
We apply the proposed model to the data sanitization task as shown in Fig. 2 and Algorithm 1. We first compute the frequent itemsets and set the state size, the number of instances in the transaction data, and the number of cycles for which the active learning method runs on each pool. The algorithm takes the frequent itemsets under the user-given threshold value (Algorithm 1, line 1). Then, the sensitive itemsets are selected according to a given ratio (Algorithm 1, line 2). The selected itemsets are then projected from the transactions (Algorithm 1, line 3). Next, the Q-learning table is initialized by combining the sensitive itemsets and the selected transaction instances as states (Algorithm 1, line 4). The model makes random decisions for exploration and then calculates fitness values. The attention network uses these values to learn from the labeled data and to make predictions on the pool (Algorithm 1, lines 5-8). The agent learns to execute actions, calculate rewards, and update states, and the Q-values are then updated with the Bellman equation [2] (Algorithm 1, lines 10-14). The rewards, states, actions, and fitness values form the training set for the active learning updates. The itemset frequency is used as the input, and the choice of action (delete or not-delete) is treated as a binary classification problem. Batches are used to select and expand the active learning set. After exploration, the probability of the action is used in the query selection and entropy calculation. The output is the minimized fitness value, which indicates the set of hidden sensitive itemsets in the dataset (Algorithm 1, line 17).
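The agent loop described above (explore, act, receive a reward, update with the Bellman equation) can be sketched with a toy tabular Q-learning example. This is a deliberately simplified stand-in for the deep attention-based model: the environment, the reward values, the state encoding, and all names below are assumptions introduced for illustration and do not reproduce Algorithm 1.

```python
import random

KEEP, DELETE = 0, 1
ACTIONS = (KEEP, DELETE)


def step(state, action, support, min_count):
    """Toy environment: visit each transaction that supports the sensitive
    itemset once and decide whether to delete it."""
    i, deleted = state
    reward = 0.0
    if action == DELETE:
        deleted += 1
        reward -= 1.0            # assumed cost per deletion (side-effect proxy)
    i += 1
    done = i == support
    if done and support - deleted >= min_count:
        reward -= 10.0           # assumed penalty: itemset is still frequent
    return (i, deleted), reward, done


def train(support, min_count, episodes=2000, alpha=0.5, gamma=1.0,
          epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning with the standard Bellman update."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                      # explore
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action, support, min_count)
            best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
    return q


def greedy_deletions(q, support, min_count):
    """Follow the learned policy and return the positions it deletes."""
    state, done, chosen = (0, 0), False, []
    while not done:
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        if action == DELETE:
            chosen.append(state[0])
        state, _, done = step(state, action, support, min_count)
    return chosen


# Four supporting transactions, threshold of three: two deletions are needed.
q_table = train(support=4, min_count=3)
print(greedy_deletions(q_table, support=4, min_count=3))
```

In this toy setting the learned policy deletes just enough supporting transactions to push the sensitive itemset below the threshold. In the paper's setting the table would be replaced by the attention network, and the reward would come from the fitness values.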
Experiments and results
We compare our proposed deep active reinforcement learning model (DARL) with the evolutionary algorithms PSO2DT, cpGA2DT, sGA2DT, and pGA2DT, as well as with the greedy sanitization algorithm [10]. The experiments are conducted on a Linux Mint 19.01 system with a 2070TI GPU and a 10th-generation Core i7 processor. Four datasets from the SPMF data mining library are used: chess, mushrooms, foodmart, and T10I4D100K [10]. We use a state size of 100 and 10 active learning cycles, with support values and sensitive itemset selection chosen per dataset. For agent training, we use an episode size of 100. We compare the methods in terms of fitness values and hiding factors, i.e., how successfully the sensitive information is hidden.
Hiding factor analysis
As can be seen in Fig. 3, the deep active learning strategy performs better. On the dense chess and mushrooms datasets, the model achieves a high percentage improvement. However, at the low support thresholds of the foodmart dataset, the model was not able to hide the itemsets. The explanation is the small number of training examples in the pool and the small number of active learning cycles; generalizing to a large dataset requires more data exploration and more learning time. In contrast, the proposed model performed better on T10I4D100K. In summary, sparse datasets take longer to stabilize, while the evolutionary and greedy algorithms are optimized for large datasets such as foodmart. To obtain robust results, the proposed approach needs to be run over longer periods of time, which increases the number of training events. However, the performance of the linked feature set decreases as the pooling time increases. All strategies perform excellently when it comes to categorizing data sanitization. We also note that the feature set for the pooling approach should be expanded. Other features, such as support, weight, or itemset occupancy, should be integrated to create a suitable learning approach that can be developed further in the future.

Comparison of methods in terms of the hiding factor produced for various support thresholds and sens_per: percentage of selected sensitive itemsets.
Compared with the other models, the proposed model successfully achieves the goal in terms of fitness values. The fitness values for the chess and mushroom data are shown in Fig. 4, which shows that the proposed model has better performance. The larger the dataset, the more training the proposed model requires in order to generalize. For the agent-based deep active learning model, hours of adaptation of the model and environment are required, which helps in learning complex patterns related to sanitization. Once trained, however, the proposed model can hide sensitive itemsets efficiently. In the future, we will explore hyperparameter tuning of the cycles, episodes, and states with respect to the assigned support thresholds and the features of the dataset.

Method comparison in terms of Fitness values obtained for various minimum support threshold values and sens_per: percentage of selected sensitive itemsets.
There are various extensions for future work that are directly related to this paper. First, a more in-depth analysis of different PPDM environments and datasets is required. Moreover, as with all deep learning-based architectures, it is necessary to explore hybrid modifications that can strengthen the learning algorithm. Lastly, translating the work done in this paper to federated learning environments, which are currently flourishing in Internet of Things based learning infrastructures, may lead to fruitful results.
Conclusion
In fifth-generation (5G) technology, data privacy and anonymization can be performed at different levels. However, as privacy issues increase, data-oriented applications require efficient methods of data sanitization, and researchers are exploring different methods to secure 5G-connected devices. The proposed model can analyze the sensitive information in the IoT environment and then hide the private information using the deep active reinforcement learning method. We use an agent-based method for transaction deletion operations, in which the active learning model selects the instances to be deleted through dynamic entropy-based selection. In this study, entropy-based active learning is used in an agent-based reinforcement learning environment to classify the sanitization of transactional data. The resulting model is able to find patterns in sparse and dense datasets with high accuracy. The entropy-based active learning approach substantially increases the number of training examples for the deep attention-based model. The method can minimize the side effects and balance privacy protection with knowledge discovery. We performed experiments under different threshold values for different datasets. The proposed model can perform better than the state-of-the-art meta-heuristic (GA, PSO) and heuristic (greedy) techniques. In the future, we will explore hyperparameter tuning and other agent optimization strategies, and we will optimize the tuning of the network that implements the active learning process. In addition, we could explore a weighted strategy for sub-sample selection for each class. Furthermore, multi-label categorization based on uncertainty, utility, frequency, and co-occurrence could be explored to further increase accuracy.
