Abstract
Wearable devices have developed rapidly. Meanwhile, security and privacy incidents involving user data have also occurred frequently. Focusing on privacy protection in the release of wearable-device data, and building on the conventional V-MDAV algorithm, this paper proposes WSV-MDAV, a microaggregation method based on a weight W and a sensitivity parameter S of sensitive attribute values, and introduces differential privacy after the microaggregation step. Simulations on the Statlog dataset and the Adult dataset show that, compared with the conventional variable-length multivariate microaggregation algorithm, the proposed method improves the level of privacy protection for the related devices and reduces information distortion. The resulting release model can prevent sensitive data carrying identity tags from being tampered with, stolen, or leaked by criminals, and thus helps avoid serious psychological and property losses to individuals and harm to public safety caused by information leakage.
Introduction
A wearable device is an intelligent computing device that the user can carry directly and conveniently and that can interact with the outside world while the user moves freely [1, 2]. Today’s wearable devices meet the demands of smart living through software support, data interaction, cloud interaction, and so on [3]. Their core technologies include intelligent human-computer interaction, sensors, flexible electronics, and massive data processing [4–7]. Wearable devices enable people to acquire data quickly, maintain social connections efficiently, and enjoy seamless network access. However, as they enter people’s work and life, they also face unprecedented information security risks, such as leaks of personal privacy, remote attacks by hackers, and theft of core data. Therefore, research on data privacy protection technology for wearable devices is particularly important [8].
Many experts and scholars are studying the data privacy protection of wearable devices. Since the launch of Google Glass in April 2012, wearable devices have developed at an unprecedented pace and have broad prospects. In addition to smart glasses, products such as smartwatches, smart bracelets, and intelligent blood pressure monitors have appeared one after another [9–11]. At present, the top five global manufacturers of smart wearable devices are Fitbit, Xiaomi, Apple, Garmin, and Samsung [12]. In the future, wearable devices will move toward intelligence, miniaturization, simplification, and low power consumption.
As wearable devices develop rapidly, privacy protection has also received attention from relevant departments and individuals at home and abroad. European countries and the United States have issued many laws and regulations on personal privacy protection. The General Data Protection Regulation of the European Union includes the “right to be forgotten”, personal data breach notification, and data portability requirements [13]. Canada’s Personal Information Protection and Electronic Documents Act regulates the collection, use, and disclosure of personal information by the private sector in the course of commercial activities [14]. Although Europe and the United States have relatively mature systems and laws for data and privacy protection, the relevant government and legal departments still acknowledge that the revision of these laws struggles to keep pace with information technology and the commercial environment. Whether or not regulations exist, users’ privacy awareness and the privacy protection frameworks and tools available to them still need to be improved. To address users’ privacy problems, users should discretize confidential information as much as possible [15].
Kaewkannate K et al. [16] identified risks in the Fitbit Flex wearable device: it collects non-essential information about its surroundings, does not provide users with all the data it collects, and exposes credentials because of its fixed MAC address and during Bluetooth pairing. The researchers also set up frameworks, models, and examples of data security and privacy protection for some of the identified risks [17]. Safavid et al. [18] put forward a conceptual framework of ten principles and nine checklists, covering the scope of data privacy content, informed consent, protection against abuse, and so on, together with a secure port for exchanging health care information, secure access control for user authentication, control over the data received, and applications that combine stability with privacy protection [19]. Bettini C et al. [20, 21] proposed a privacy protection model for handheld and wearable wireless devices, along with possible solutions such as a logical border mechanism and anonymous user identification. Christopher Wolf et al. [22] proposed a flexible, robust privacy paradigm for wearable devices built on respect for context, benefit-risk analysis, and the elimination of identifiable data. For wearable-device applications, McMullan K [23] proposed that employers who use wearable devices to develop corporate employee health strategies should be transparent about the data collected and the context in which it is used, and should guarantee that employee participation is completely voluntary and that the data is used reasonably.
Domestic experts and scholars have also been active in privacy protection research and have achieved many results. Zhao Dapeng et al. [24] put forward a new method, ARB, to protect users’ location privacy. The method takes into account side information such as query probability, map data, and the semantics of information points. The results show that the method offers better privacy protection and lower time complexity. He Xiaolin et al. [25] promoted the sustainable development of health care wearable devices by adopting multiple data encryption techniques, establishing a hierarchical data management and control mode, and adding remote device control functions [26]. Wang Le et al. [27], building on the traditional CP-ABE scheme, filtered data access behavior using data attributes and user attributes; experiments show that the scheme can effectively protect users’ private information. Li Tong et al. [28] put forward a method for re-publishing wearable-device data based on k-anonymous clustering. The algorithm performs clustering-based anonymization directly on incremental data, which makes anonymization and re-release more efficient. Zhao Yanhong et al. [29] considered applying blockchain technology, which collectively maintains a reliable database through decentralization and high trust, to medical wearable devices, which will help promote the transformation of medical systems and the safe sharing of medical data [30].
Among traditional privacy protection techniques, microaggregation is an important method for anonymizing microdata in data release. Microaggregation is a general term for a class of k-anonymization algorithms, whereas the traditional implementation of k-anonymization relies on a combination of generalization and suppression [31, 32]. Privacy protection models based on microaggregation have two main defects. First, the protection they provide against attribute disclosure depends on the background knowledge of the attacker [33]. In real life, attackers often have more background knowledge than the model assumes, so a privacy protection model that is independent of background knowledge is needed to counter attribute disclosure. Second, there is no rigorous way to compare the privacy protection capabilities of these models, which are usually measured by linkage probability. In particular, when the parameters change it is difficult to analyze how the protection provided by the model changes, which reduces the reliability of the model.
To solve the privacy protection problem in the release of wearable-device data, this paper proposes WSV-MDAV, a microaggregation method based on the weight parameter W and the sensitivity parameter S of sensitive attribute values, and introduces differential privacy after the microaggregation step; together these constitute a complete privacy protection method for wearable-device data release. Simulations on the Statlog dataset and the Adult dataset compare the methods in terms of information loss and privacy risk. Compared with the conventional V-MDAV method, the variable-length microaggregation algorithm based on weight and sensitivity has lower information loss and higher data utility; in particular, when k is large its performance is good and tends to be stable. For the same value of k, the risk of privacy leakage is lower with the method proposed in this paper. It is also found that the smaller the differential privacy parameter ɛ, the better the proposed method performs in privacy protection, and that as k increases the privacy protection capability of the model also increases.
Method
Variable-length microaggregation algorithm based on weight and sensitivity (also known as the WSV-MDAV algorithm)
Microaggregation is an important method for anonymizing microdata in data release. V-MDAV is a heuristic method for multivariate microaggregation that outperforms the known fixed-length microaggregation algorithms. However, it also has some problems. For example, the distance calculation for different data types in the k-partition process is not clearly specified, and the sensitive attributes are vulnerable to homogeneity attacks, background knowledge attacks, and similarity attacks. To address these problems, this article proposes a variable-length microaggregation algorithm based on a weight W and a sensitivity parameter S of sensitive attribute values. The input to the V-MDAV algorithm is the dataset D to be protected and the parameter k, a positive integer giving the minimum number of records in each k-anonymized group; the output is the dataset M produced by the microaggregation process.
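To make the input (D, k) and the output M concrete, the following Python sketch microaggregates a small numeric dataset. It is a simplified fixed-size, MDAV-style heuristic written for illustration only, not the V-MDAV or WSV-MDAV procedure itself: the variable-length group extension, the weight W, and the sensitivity S are all omitted, and every function name is hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def dist(a, b):
    """Euclidean distance between two numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mdav_like_groups(data, k):
    """Partition record indices into groups of at least k records.

    Simplified heuristic: repeatedly take the unassigned record farthest
    from the centroid of the unassigned records and group it with its
    k - 1 nearest unassigned neighbours. Fewer than k leftover records
    join the group whose centroid is nearest.
    """
    if k < 1 or len(data) < k:
        raise ValueError("need 1 <= k <= number of records")
    remaining = list(range(len(data)))
    groups = []
    while len(remaining) >= k:
        c = centroid([data[i] for i in remaining])
        seed = max(remaining, key=lambda i: dist(data[i], c))
        nearest = sorted(remaining, key=lambda i: dist(data[i], data[seed]))[:k]
        groups.append(nearest)
        remaining = [i for i in remaining if i not in nearest]
    for i in remaining:
        best = min(groups, key=lambda g: dist(data[i], centroid([data[j] for j in g])))
        best.append(i)
    return groups


def microaggregate(data, k):
    """Return M: every record of D replaced by the centroid of its group."""
    out = [None] * len(data)
    for group in mdav_like_groups(data, k):
        c = centroid([data[i] for i in group])
        for i in group:
            out[i] = c
    return out


D = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
M = microaggregate(D, k=3)
```

Each record in M shares its values with at least k - 1 other records, which is the k-anonymity property the microaggregation step is meant to provide.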
(1) Personalized distance measurement method based on weight W
Distances must be calculated between records whose attributes are of different data types. When computing the distance matrix, each quasi-identifier attribute is weighted to achieve personalized privacy protection. For any quasi-identifier attribute Qi, its weight $\omega_{Q_i}$ is given by:
$$\omega_{Q_i} = \alpha\,\omega_{Q_i}^{u} + \beta\,\omega_{Q_i}^{p}$$
where $\omega_{Q_i}^{u}$ is the weight set for the quasi-identifier attribute Qi by the users (the data providers), and $\omega_{Q_i}^{p}$ is obtained by the data publisher from the information entropy of Qi. α is the weight given to the users, β is the weight given to the data publisher, and both are set to 1/2 by default.
The weight $\omega_{Q_i}^{u}$ is calculated as follows:
$$\omega_{Q_i}^{u} = \frac{1}{n}\sum_{j=1}^{n}\omega_{Q_i}^{u_j}$$
In the above formula, n is the number of records in the table, which is also the number of users (data providers), and $\omega_{Q_i}^{u_j}$ is the custom weight that user j assigns to the quasi-identifier attribute Qi.
To calculate $\omega_{Q_i}^{p}$, this paper introduces information entropy. For the quasi-identifier attribute Qi, the entropy is $H_{Q_i}$, from which $\omega_{Q_i}^{p}$ is computed. Here $\omega_{Q_i}^{p}$ is the weight of the attribute Qi, $H_{Q_i}$ is its Shannon entropy, and m is the number of quasi-identifier attributes.
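The weighting scheme can be sketched in Python as follows. Several parts are assumptions made for illustration rather than the paper's exact formulas: the user weight is taken as the plain average of the per-user weights, the publisher weight is taken as each attribute's share of the total Shannon entropy, and the weighted Euclidean distance applies to numeric attributes only. All function names are hypothetical.

```python
import math
from collections import Counter


def user_weight(per_user_weights):
    """Average of the custom weights that users assign to one attribute."""
    return sum(per_user_weights) / len(per_user_weights)


def shannon_entropy(values):
    """Shannon entropy (base 2) of the empirical distribution of a column."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def publisher_weights(columns):
    """ASSUMPTION: weight of attribute i = H_i / sum of all H_j."""
    entropies = [shannon_entropy(col) for col in columns]
    total = sum(entropies)
    if total == 0:
        # All columns are constant; fall back to equal weights.
        return [1.0 / len(columns)] * len(columns)
    return [h / total for h in entropies]


def combined_weight(w_user, w_publisher, alpha=0.5, beta=0.5):
    """Weight of one attribute: alpha * user part + beta * publisher part."""
    return alpha * w_user + beta * w_publisher


def weighted_distance(a, b, weights):
    """ASSUMPTION: weighted Euclidean distance over numeric attributes."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


# Two toy columns: the first is evenly split, the second is constant.
w_p = publisher_weights([["a", "a", "b", "b"], ["x", "x", "x", "x"]])
w_1 = combined_weight(user_weight([0.2, 0.4]), w_p[0])
```

With the default α = β = 1/2, an attribute that users rate at 0.3 on average and that carries all of the entropy receives a combined weight of 0.65.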
(2) Calculation of the sensitivity parameter S of sensitive attribute values
The purposes of privacy protection in data release are (1) to protect identity information from disclosure and (2) to protect sensitive attributes from disclosure. Because different sensitive attribute values are distributed differently, in practice it is more feasible to set a separate constraint for each value of a sensitive attribute.
For a data table with a single sensitive attribute, suppose that the dataset to be published contains the attributes {Q1, Q2, ⋯, Qn, SA}, where Q1, Q2, ⋯, Qn are the quasi-identifier attributes, SA is the sensitive attribute, and SA1, SA2, ⋯, SAm are the values of SA, which is an ordinal attribute. The definition of Si is as follows:
The parameters μ and v are usually set to 0.5.
1. Differential privacy
(1) Differential privacy concept
The basic idea of the differential privacy model is to add random perturbation to the published data so that, in a statistical sense, an attacker cannot infer the private information of individual users from the published data, regardless of the attacker’s background knowledge or the data mining and analysis methods used. The model has two main advantages. First, no special assumptions are needed about the attacker’s background knowledge or the specific attack method. Second, it introduces a privacy budget, which allows quantitative analysis of the privacy disclosure risk of the published data [35]. Differential privacy is defined in terms of adjacent data tables.
Definition 1 (adjacent tables) If two data tables D1 and D2 differ in exactly one record, D1 and D2 are said to be adjacent data tables.
Definition 2 (ɛ-differential privacy) If the randomized algorithm Q satisfies the following formula for any pair of adjacent data tables D1 and D2 and any output set S ⊆ Range(Q):
$$\Pr[Q(D_1) \in S] \le e^{\varepsilon}\,\Pr[Q(D_2) \in S]$$
then the algorithm Q is said to satisfy ɛ-differential privacy.
(2) Differential Privacy Implementation Based on the Laplace Mechanism
The Laplace mechanism was first proposed by Dwork et al. and is the most commonly used mechanism for implementing differential privacy. It implements differential privacy protection by adding Laplace noise to numerical data [34]. Two factors affect the strength of differential privacy under this mechanism. One is the continuous random variable, following the Laplace distribution, that is mixed into the query result. Its probability density function is as follows:
$$p(x \mid \mu, b) = \frac{1}{2b}\exp\left(-\frac{|x-\mu|}{b}\right)$$
Generally, the added noise is Laplace noise with μ = 0, so the factor that affects the privacy protection strength is the parameter b in the probability density function above.
The other is the sensitivity of the Laplace mechanism, also known as the global sensitivity S(F). It is calculated as follows:
$$S(F) = \max_{D_1, D_2} \lVert F(D_1) - F(D_2) \rVert_1$$
where the maximum is taken over all pairs of adjacent data tables D1 and D2.
The influence of sensitivity on the strength of privacy protection follows from the relationship, established by Dwork, among the differential privacy parameter ɛ, the Laplace distribution parameter b, and the sensitivity S(F), as stated in Theorem 1.
Theorem 1 Let F be a set of functions with Laplace mechanism sensitivity S(F), and let K be an algorithm that adds independent noise to the output of each function f in F. If the noise follows a Laplace distribution with parameter b, then K satisfies (S(F)/b)-differential privacy; in particular, if the noise follows a Laplace distribution with parameter S(F)/ɛ, then K satisfies ɛ-differential privacy.
It can be seen from Theorem 1 that, for a given ɛ, the greater the sensitivity, the more Laplace noise must be added; and for a given sensitivity, the larger the noise parameter b, the smaller ɛ and the stronger the differential privacy protection.
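A minimal Python sketch of the Laplace mechanism may help fix ideas. It samples Laplace noise as the difference of two exponential variables with mean b, which yields a Laplace(0, b) distribution, and sets b = S(F)/ɛ as in Theorem 1. The function names and the example query values are hypothetical.

```python
import math
import random


def laplace_pdf(x, b, mu=0.0):
    """Density of the Laplace distribution with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def laplace_noise(b, rng):
    """One Laplace(0, b) sample: difference of two exponentials with mean b."""
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value plus Laplace noise with scale sensitivity / epsilon."""
    rng = rng or random.Random()
    return true_value + laplace_noise(sensitivity / epsilon, rng)


# Two adjacent tables whose query answers differ by the sensitivity (1.0).
epsilon = 0.5
b = 1.0 / epsilon
worst_ratio = max(
    laplace_pdf(x, b, mu=10.0) / laplace_pdf(x, b, mu=11.0)
    for x in [-5.0, 9.0, 10.0, 10.5, 11.0, 30.0]
)
noisy_answer = laplace_mechanism(10.0, sensitivity=1.0, epsilon=epsilon,
                                 rng=random.Random(0))
```

The ratio of the two output densities never exceeds e^ɛ at any point, which is the inequality required by Definition 2.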
The release process of the differential privacy model under the Laplace mechanism is divided into three steps: 1) derive the word frequency matrix from the original dataset, to serve as the query middleware; 2) add random noise following the Laplace distribution to the query middleware; 3) publish the noisy word frequency matrix as the query middleware.
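The three steps can be sketched in Python for a one-dimensional frequency table. This is an illustration under stated assumptions: the default sensitivity of 2 assumes that adjacent tables differ in the value of one record, so one count falls by 1 and another rises by 1; a sensitivity of 1 would apply if adjacency meant adding or removing a record. The function names are hypothetical.

```python
import random
from collections import Counter


def laplace_noise(b, rng):
    """One Laplace(0, b) sample: difference of two exponentials with mean b."""
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def noisy_frequency_release(records, categories, epsilon, sensitivity=2.0, rng=None):
    """Release a noisy frequency table over a fixed list of categories."""
    rng = rng or random.Random()
    counts = Counter(records)          # step 1: derive the frequency table
    b = sensitivity / epsilon          # Laplace scale from Theorem 1
    # Steps 2 and 3: perturb every cell, then publish the noisy table.
    return {c: counts.get(c, 0) + laplace_noise(b, rng) for c in categories}


records = ["a", "a", "b", "c", "c", "c"]
released = noisy_frequency_release(records, ["a", "b", "c", "d"],
                                   epsilon=1.0, rng=random.Random(1))
```

The category list is fixed in advance so that empty cells such as "d" are also perturbed; publishing only the observed categories would itself reveal information about the data.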
2. Differential privacy data release model
(1) Differential privacy during microaggregation
Differential privacy provides a more credible privacy guarantee than k-anonymity and its extended models. Implementing k-anonymity through microaggregation helps reduce the sensitivity of the input data, so data utility can be increased (by reducing data distortion) without giving up the strong privacy guarantees of differential privacy. The size of the k-anonymous groups in the released table affects the sensitivity: the larger the group size k, the less sensitive the centroids produced by microaggregation become. Also, the smaller the dataset, the smaller the number of distinct centroids in the microaggregated dataset. Therefore, as the group size increases or the dataset shrinks, the sensitivity decreases, less noise needs to be added to achieve differential privacy, and the resulting differentially private data has higher utility. For this reason, differential privacy is introduced into the microaggregation-based privacy protection model for data release: the noise required to satisfy differential privacy is reduced, which makes the approach more flexible and practical [35].
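As a worked illustration of why larger groups need less noise, consider a single numeric attribute bounded in $[x_{\min}, x_{\max}]$, and assume (a simplification not stated in the source) that changing one record does not change the grouping. The centroid of a group of k records then moves by at most the attribute range divided by k, and the Laplace scale needed for ɛ-differential privacy shrinks accordingly:

```latex
c = \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad
S(c) = \max_{D_1, D_2} \lvert c(D_1) - c(D_2) \rvert = \frac{x_{\max} - x_{\min}}{k}, \qquad
b = \frac{S(c)}{\varepsilon} = \frac{x_{\max} - x_{\min}}{k\,\varepsilon}
```

For example, with a hypothetical range of 100 and ɛ = 1, the noise scale is b = 100 for k = 1 but only b = 10 for k = 10.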
(2) Differential-privacy-based privacy protection method for wearable data release
The differential-privacy-based privacy protection method for wearable data release is shown in Table 1.
Table 1. Privacy protection algorithm for wearable data release based on differential privacy.
The differential privacy step adds noise to every record in the grouped dataset.
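The two-stage flow (group first, then perturb every record) can be sketched in Python for a single numeric attribute. This illustrates the data flow only and is not the algorithm of Table 1: the grouping is a simple sort-and-chunk stand-in for WSV-MDAV, the noise scale (upper - lower) / (k ɛ) assumes bounded values and a centroid sensitivity of range / k, and no formal ɛ guarantee is claimed for the sketch as a whole. All names are hypothetical.

```python
import random


def laplace_noise(b, rng):
    """One Laplace(0, b) sample: difference of two exponentials with mean b."""
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def release(values, k, epsilon, lower, upper, rng=None):
    """Microaggregate a 1-D attribute into groups of >= k, then add noise."""
    rng = rng or random.Random()
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of records")
    order = sorted(range(n), key=lambda i: values[i])
    full = n - n % k
    # Consecutive chunks of k sorted records; leftovers join the last chunk.
    groups = [order[s:s + k] for s in range(0, full, k)]
    groups[-1].extend(order[full:])
    scale = (upper - lower) / (k * epsilon)   # assumed centroid sensitivity / epsilon
    released = [0.0] * n
    for group in groups:
        c = sum(values[i] for i in group) / len(group)
        for i in group:
            # Noise is added to every record of the grouped dataset.
            released[i] = c + laplace_noise(scale, rng)
    return released


heart_rates = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
out = release(heart_rates, k=3, epsilon=1.0, lower=0.0, upper=20.0,
              rng=random.Random(7))
```

Because the noise scale is divided by k, doubling the group size halves the perturbation applied to each released value, at the cost of coarser centroids.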
Data source
The datasets used in this experiment are the Statlog dataset and the Adult dataset from the UCI Machine Learning Repository. The Statlog dataset records information on heart disease patients. Its attributes include ordinal, nominal, numeric, and binary data. Four quasi-identifier attributes are used: gender, age, maximum heart rate, and resting blood pressure; the sensitive attribute is the type of chest pain. The Adult dataset is widely used to evaluate privacy protection models and algorithms for data release. Four of its attributes are taken as quasi-identifiers, namely age, sex, years of education, and weekly working hours, and the sensitive attribute is marital status.
Experimental environment
The wearable-device data privacy protection technique proposed in this article was designed and implemented on a conventional PC.
Hardware configuration of the equipment used in the experiment:
Central Processing Unit: Intel(R) Celeron CPU 1005M @1.90 GHz
RAM: 4 GB
Software configuration of the equipment used in the experiment:
System: 32-bit Windows 10 operating system
Coding environment: MATLAB R2016a, commercial mathematical software produced by the American company MathWorks.
Results
The privacy protection performance of a microaggregation-based method is usually measured by two parameters: the information loss ratio and the risk of privacy leakage.
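The text does not give the formula used for the information loss ratio. A measure commonly used for microaggregation is IL = SSE / SST, the within-group sum of squared errors divided by the total sum of squares; the Python sketch below assumes that definition, numeric attributes on comparable scales, and a hypothetical function name.

```python
def information_loss(original, masked):
    """ASSUMED measure: IL = SSE / SST, between 0 (no loss) and 1."""
    if not original or len(original) != len(masked):
        raise ValueError("datasets must be non-empty and of equal length")
    n = len(original)
    dims = range(len(original[0]))
    mean = [sum(r[d] for r in original) / n for d in dims]
    # SSE: squared distance between each record and its released version.
    sse = sum((o[d] - m[d]) ** 2 for o, m in zip(original, masked) for d in dims)
    # SST: squared distance between each record and the global mean.
    sst = sum((o[d] - mean[d]) ** 2 for o in original for d in dims)
    return sse / sst if sst > 0 else 0.0


original = [(0,), (2,), (10,), (12,)]
masked = [(1,), (1,), (11,), (11,)]     # two groups of size k = 2
il = information_loss(original, masked)
```

Here SSE = 4 and SST = 104, so about 3.8% of the variance is lost by replacing the records with their group centroids.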
Result 1: Comparison of information loss
For the Statlog dataset, the average of the user-set weights of the four attributes is taken as a parameter, i.e., $\omega_{Q_1}^{u} = 0.3$, $\omega_{Q_2}^{u} = 0.2$, $\omega_{Q_3}^{u} = 0.2$, $\omega_{Q_4}^{u} = 0.5$; the same applies to the Adult dataset. Simulation experiments were carried out on the conventional V-MDAV method and on the model that combines the WSV-MDAV algorithm, based on weight W and sensitivity S, with differential privacy. The information loss of the two methods on both datasets for different values of k is shown in Figs. 1 and 2. The ɛ of the WSV-MDAV method with differential privacy is set to 0.01.

Fig. 1. Comparison of information loss on the Statlog dataset under different k values.
To measure the privacy protection of the method, the record linkage (RL) rate, i.e., the proportion of processed records that can be linked back to the original data table, is simulated. For the data release model proposed here, ɛ is set to 0.01, 0.1, and 1 for the Statlog dataset and to 0.1, 1, and 6 for the Adult dataset. On both datasets, the algorithm of this paper under the different ɛ values is compared with the conventional V-MDAV method. The comparison of privacy leakage risk is shown in Figs. 3 and 4.
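The text does not specify how RL is computed. One common choice is distance-based record linkage: each released record is linked to the nearest original record, and the link counts as correct when that record is its true source. The Python sketch below assumes this definition, with ties shared equally; the function name is hypothetical.

```python
def record_linkage_rate(original, masked):
    """ASSUMED measure: share of released records correctly linked to their source."""
    if not original or len(original) != len(masked):
        raise ValueError("datasets must be non-empty and of equal length")

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    linked = 0.0
    for i, m in enumerate(masked):
        dists = [sq_dist(m, o) for o in original]
        best = min(dists)
        ties = [j for j, d in enumerate(dists) if d == best]
        if i in ties:
            # A tie among t candidates is a correct guess with probability 1/t.
            linked += 1.0 / len(ties)
    return linked / len(masked)


original = [(0,), (2,), (10,), (12,)]
masked = [(1,), (1,), (11,), (11,)]     # centroids of two groups of size k = 2
rl = record_linkage_rate(original, masked)
```

In this toy case each centroid is equidistant from the two members of its group, so the linkage rate is 0.5, which matches the 1/k bound for k = 2.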

Fig. 2. Comparison of information loss on the Adult dataset under different k values.

Fig. 3. Comparison of privacy risk on the Statlog dataset under different k values.

Fig. 4. Comparison of privacy risk on the Adult dataset under different k values.
In both Figs. 1 and 2, for the conventional V-MDAV method and the WSV-MDAV microaggregation method based on weight W and sensitivity S alike, the information loss (IL) increases as k increases; that is, the utility of the data declines. The larger k is, the larger the equivalence classes and the lower the homogeneity within each class, so the IL is larger. For the same k value, Fig. 1 shows that when k is less than 5 the IL of the conventional V-MDAV method is relatively low. Once k exceeds 5, the information loss of the conventional V-MDAV algorithm grows more and more markedly as k increases, whereas the IL of the WSV-MDAV method proposed in this paper does not increase significantly and tends to be stable. In Fig. 2, when k is large, the information loss of the WSV-MDAV method is also significantly lower. Therefore, compared with the conventional V-MDAV method, the WSV-MDAV method based on weight W and sensitivity S has lower IL and higher data utility. Moreover, for large k it performs well and tends to be stable.
In both Figs. 3 and 4, for the conventional V-MDAV method and the WSV-MDAV microaggregation method with differential privacy alike, the probability RL that a record can be linked to the original data table decreases as k increases; that is, the risk of privacy leakage falls and the privacy protection of the model improves. According to the principles of microaggregation and k-anonymity, k is the minimum number of records in each equivalence class produced by microaggregation, and within an equivalence class the probability that an attacker identifies a given record is no greater than 1/k. Therefore, as k grows the equivalence classes grow with it, and the protection provided by the method improves. Meanwhile, the sensitivity parameter proposed in this article constrains the distribution of sensitive attribute values within each equivalence class, which reduces the possibility of sensitive attribute disclosure and further improves the protection capability of the model. It can also be seen that, for the model proposed in this paper at a fixed k, the smaller ɛ is, the smaller the probability RL of linking to the original data table, i.e., the lower the risk of privacy leakage and the higher the protection capability of the method. It can therefore be concluded that, for the same k value, the data release method proposed in this article, which combines the WSV-MDAV microaggregation algorithm based on weight W and sensitivity S with ɛ-differential privacy, has a lower risk of privacy leakage than the conventional V-MDAV algorithm. The smaller the differential privacy parameter ɛ, the higher the level of protection, and as k increases the protection provided by the model also increases.
Summary
The development of the information industry has brought convenience to everyone’s work and life, but it has also created hidden dangers of user data leakage. The data publishing process has become less secure and may result in leaks of user privacy. Traditional privacy protection methods cannot meet the privacy protection requirements of the current environment, and the newer differential privacy model is now widely used in data release processes that require privacy protection.
Acknowledgments
This work was supported by Guangdong province ordinary university characteristic platform project (2019GKTSCX069).
