A secure data fitting scheme based on CKKS homomorphic encryption for medical IoT

Abstract

With the development of big data technology, medical data has become increasingly important. It not only contains personal privacy information, but also involves medical security issues. This paper proposes a secure data fitting scheme based on the CKKS (Cheon-Kim-Kim-Song) homomorphic encryption algorithm for medical IoT. The scheme encrypts the KAGGLE-HDP (Heart Disease Prediction) dataset with CKKS homomorphic encryption and uses the gradient descent method to calculate the weight and bias of the data in the ciphertext domain. The experimental results show that, on the KAGGLE-HDP dataset, with a threshold value of 0.7, the parameter setting (Poly_modulus_degree, Coeff_mod_bit_sizes, Scale) = (16384; 43, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 43; 23) and 3 iterations, the recognition accuracy of the scheme reaches 96.7%. The scheme thus achieves high recognition accuracy and better privacy protection than other data fitting schemes.

Keywords

Cloud computing, data fitting, homomorphic encryption, gradient descent method

1. Introduction

Medical data is extremely valuable. For patients, personal medical data is an important foundation for individuals’ physical and mental health development. For hospitals and other medical institutions, it is also a significant reference for the development of medical technology.

With the rapid advancement of the Internet, big data, and cloud computing technology [2,11,28], the medical industry is becoming increasingly information-based, and digital medical treatment is becoming more widespread. For example, Yu et al. [32] proposed a telemedicine method using deep learning technology in 2021. Between July and September 2019, Greenbone Networks analyzed approximately 2,300 online medical picture archiving and communication systems (PACS) worldwide, of which about 590 were publicly accessible on the Internet, exposing the records of about 24 million patients. Worse still, such leaked data could be worth more than $1 billion on the dark web, and it may be used by attackers for a variety of purposes, including damaging individuals' privacy by exposing personal names and images.

The linear regression model [7] is one of the most widely used models in machine learning; it makes predictions using linear combinations of sample features. In linear regression, the model is built from a linear prediction function whose parameters are estimated from the samples. When the sample data has several characteristics, the prediction function cannot be described by one linear function alone; instead, a more complex curve is used to depict the trend of the samples. Unlike linear regression, curve fitting uses a continuous curve to approximate the samples, resolving the relationship between variables so that the curve fits the known data as closely as possible. Curve fitting allows people to observe trends in complex data and predict outcomes [13]. The most classical method of curve fitting determines the parameters of the fitted curve by building and solving a system of equations using the least-squares error method, which yields a specific fitted curve equation. For nonlinear models, the parameters of the fitted curve are found by solving a system of nonlinear equations, which is also a form of least-squares fitting. Polynomial fitting, on the other hand, is a more widely used curve fitting method in which a polynomial is extended to cover all sample data points in the analysis area, and its coefficients are determined using a least-squares fit [15].

With the development of big data technology in recent years, more and more users are turning to cloud servers to process data [17,27]. The powerful computing power of the cloud server can help users carry out a huge number of computing operations that cannot be completed by a local server, which considerably reduces the computing burden of users. As a distributed technology, cloud computing can utilize computing resources across the country, process data in the cloud, and return the computing results to users. By designing the computing mode in advance, the user can have the cloud server process data according to the user's preferences [5,8,16,26].

Fitting data allows people to discover the pattern underlying the data. In reality, however, users do not want others to gain access to sensitive data, such as patient health data or confidential business development data. As data volumes grow, local computing servers are increasingly unable to process them, and more and more users are outsourcing data to cloud servers. However, outsourcing is always restricted by inherent security issues. Therefore, in recent years much research has focused on how to securely process outsourced data on cloud servers [14,29].

There are many kinds of network security technologies, such as artificial intelligence based security technology [24,33], blockchain technology [25,34] and threat detection based on deep learning [10]. At present, homomorphic encryption technology is increasingly popular. Homomorphic encryption [19] is a type of encryption technology that allows computation to be performed directly on encrypted data. Homomorphism means that decrypting the result of an operation on ciphertext data gives the same result as performing that operation on the plaintext data. Even if a third party acquires the ciphertext data, it will be unable to obtain the plaintext information. Well-known encryption schemes include the fully homomorphic encryption schemes BGV (Brakerski-Gentry-Vaikuntanathan) [4], BFV (Brakerski/Fan-Vercauteren) [9] and CKKS [6], and the semi-homomorphic encryption schemes Paillier [20,21] and RSA [22,23]. Fully homomorphic encryption supports both addition and multiplication, while semi-homomorphic encryption supports only addition or only multiplication. Within a finite number of operations, fully homomorphic encryption can perform more sophisticated calculations. This paper uses the CKKS homomorphic encryption algorithm to ensure data security.

To tackle the privacy and security problem of cloud data and to realize efficient data fitting on the cloud server, this paper designs a data fitting scheme based on ciphertext data. The contributions of this paper are as follows:

The article uses the CKKS homomorphic encryption scheme to encrypt the data and applies the gradient descent method to the ciphertext data to train the model. To overcome the limitations of homomorphic operations on ciphertext data, the article replaces the matrix inversion in the least-squares solution with an iterative algorithm to find the weights and bias. The CKKS encryption algorithm is implemented with the TenSEAL library, and the optimum encryption parameters and iteration counts are selected through extensive testing so that the fitting scheme runs efficiently and securely on ciphertext data.

The first and second sections of the article mainly introduce the background and significance of the research. The third section introduces the theoretical knowledge covered in the article. The fourth section introduces the scheme model and the fifth section tests the scheme experimentally. The sixth section summarizes the whole article.

2. Related work

Curve fitting is an important research direction in machine learning. A lot of progress has been achieved in data fitting research in recent years. With the advent of the big data era, data collection has become relatively easy.

There are many types of curve fitting, such as linear fitting, quadratic polynomial fitting, cubic polynomial fitting, exponential fitting, Gaussian fitting, Bayesian curve fitting, etc.

In 2019, Xie [31] used data fitting to model and analyze data from Liberia. The extended epidemiological model (SEIHFR) used data combination and scenario analysis to better understand virus transmission to develop strategies that may lead to disease-free status.

In 2020, Windarto et al. [30] proposed a parameter estimation method for dengue fever transmission models based on particle swarm optimization. This approach estimated the parameters of the host-vector model and the SIR dengue transmission model using data from dengue patients. By streamlining the parameters, this method achieved better data fitting results than the SIR model.

In 2020, M. Kowsher et al. [12] proposed two new linear and nonlinear regression techniques. In 2021, Woldegerima et al. [3] extended the SeIRS-SEI type model using data fitting. Their work supports clinical advances in the development of antimalarial medications and contributes to disease prevention and treatment.

Curve fitting also has a wide range of applications; for example, it can be used in face recognition to characterise facial organs with good adaptability. In contrast to the above-mentioned articles, this article proposes a secure data fitting scheme based on the CKKS homomorphic encryption algorithm. The scheme performs data fitting in the ciphertext state by encrypting the data and calculating the coefficients of the fitting polynomial using the gradient descent method, which also ensures data privacy. In the proposed scheme, the threshold method is used to supervise the data fitting process: the maximum benefit of the scheme is systematically predicted by fixing the critical value. The threshold method can also be used in other scenarios. For example, in data analysis the data is classified by determining a threshold, and in image segmentation the image pixels are classified by using a threshold. In this scheme, the threshold is used to measure the maximum accuracy obtained on ciphertext data, so that the data can be evaluated efficiently while its privacy is preserved. In future work, thresholds can be used not only for plaintext data but also for ciphertext data, and as homomorphic encryption technology matures, they can be applied in more fields.

3. Preliminaries

3.1. Least square method

Given n data points $(x_{1}, y_{1})$ , $(x_{2}, y_{2}) \dots (x_{n}, y_{n})$ , the goal is to obtain an approximating curve $y = ϕ (x)$ . The quality of a fit is commonly measured by the MSE (mean squared error) and the RMSE (root mean squared error). Suppose the fitting polynomial is $y = ω_{0} + ω_{1} x + ω_{2} x^{2} + ω_{3} x^{3} + \dots + ω_{m} x^{m}$ . The corresponding matrix equation is shown in Eq. (1): $\begin{matrix} (1) & \begin{bmatrix} 1 & x_{1} & x_{1}^{2} & \dots & x_{1}^{m} \\ 1 & x_{2} & x_{2}^{2} & \dots & x_{2}^{m} \\ 1 & x_{3} & x_{3}^{2} & \dots & x_{3}^{m} \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ 1 & x_{n} & x_{n}^{2} & \dots & x_{n}^{m} \end{bmatrix} ∙ \begin{bmatrix} ω_{0} \\ ω_{1} \\ ω_{2} \\ ⋮ \\ ω_{m} \end{bmatrix} = \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \\ ⋮ \\ y_{n} \end{bmatrix} \end{matrix}$ This equation can be summarized as $X ∙ A = Y$ , where $A = {\begin{bmatrix} ω_{0} & ω_{1} & ω_{2} & \dots & ω_{m} \end{bmatrix}}^{T}$ . When $X$ is square and invertible ( $n = m + 1$ ), the coefficient vector is $A = X^{- 1} ∙ Y = \frac{X^{*} ∙ Y}{| X |}$ , where $X^{*}$ is the adjugate matrix of $X$ and $| X |$ is its determinant. When $n > m + 1$ the system is overdetermined, and the least-squares solution is $A = (X^{T} X)^{- 1} X^{T} Y$ .
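To make the matrix form in Eq. (1) concrete, the following minimal Python sketch fits a polynomial by solving the normal equations $(X^{T} X) A = X^{T} Y$ with Gaussian elimination. It is an illustration on plaintext data only; the function name `polyfit_least_squares` and the sample points are hypothetical and do not come from the scheme itself.

```python
def polyfit_least_squares(xs, ys, degree):
    """Fit y = w0 + w1*x + ... + wm*x^m by solving (X^T X) A = X^T Y."""
    m = degree + 1
    # Design matrix X from Eq. (1): row i is [1, x_i, x_i^2, ..., x_i^degree].
    X = [[x ** j for j in range(m)] for x in xs]
    # Build the normal equations G * A = b with G = X^T X and b = X^T Y.
    G = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    b = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting on the augmented matrix [G | b].
    M = [G[i] + [b[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution recovers the coefficient vector A.
    A = [0.0] * m
    for i in range(m - 1, -1, -1):
        tail = sum(M[i][j] * A[j] for j in range(i + 1, m))
        A[i] = (M[i][m] - tail) / M[i][i]
    return A


# Hypothetical sample points generated from y = 1 + 2x + 3x^2.
coeffs = polyfit_least_squares([0, 1, 2, 3, 4], [1, 6, 17, 34, 57], degree=2)
```

Note that this direct solution relies on division, which is the operation that is not available on CKKS ciphertexts; this is why an iterative method is needed in the encrypted setting.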

3.2. CKKS homomorphic encryption algorithm

In 2017, Cheon et al. [6] proposed a method to construct a homomorphic encryption scheme supporting approximate addition and multiplication of encrypted messages. The specific algorithm flow is shown in Fig. 1.

Fig. 1.

CKKS algorithm model.

Choose a base $p > 0$ and a modulus $q$ . Let $q_{l} = p^{l} \cdot q$ for $0 < l \le L$ , where $L$ is the multiplicative depth. Given a security parameter $λ$ , choose a parameter $M = M (λ, q_{L})$ for the cyclotomic polynomial. The HE scheme consists of seven algorithms $(Encode, KeyGen, Enc, Dec, Decode, Add, Mult)$ over the polynomial ring $R = Z [X] / (Φ_{M} (X))$ . For the polynomial ring $R_{q_{L}}^{k}$ , set $k = 2$ and generate the parameters $(N, χ_{key}, χ_{err}, χ_{enc}, L, q_{l})$ .

$Encode$ : The first step is to encode a vector of real (or, more generally, complex) numbers into a plaintext polynomial. Select a message $m \in C^{N / 2}$ , where $m$ is a complex vector of $N / 2$ dimensions. Encode it as a plaintext polynomial $p (X) \in R = Z [X] / (X^{N} + 1)$ .

$KeyGen$ : Randomly generate $s ⟵ χ_{key}$ and $e ⟵ χ_{err}$ . Sample $a ⟵ R_{q_{L}}$ and set $b ⟵ - a s + e (mod q_{L})$ . Set the secret key $s k ⟵ (1, s)$ and the public key $p k ⟵ (b, a) \in R_{q_{L}}^{2}$ . Randomly generate $a_{0} ⟵ R_{P \cdot q_{L}}$ and $e_{0} ⟵ χ_{err}$ , set $b_{0} = - a_{0} s + e_{0} + P s^{2} (mod P \cdot q_{L})$ and set the evaluation key $e v k ⟵ (b_{0}, a_{0}) \in R_{P \cdot q_{L}}^{2}$ .

$Encrypt$ : Sample $v ⟵ χ_{enc}$ and $e_{0}, e_{1} ⟵ χ_{err}$ . The ciphertext is shown in Eq. (2): $\begin{matrix} (2) & c ⟵ v * p k + (m + e_{0}, e_{1}) (mod q_{L}) \end{matrix}$

$Decrypt$ : For ciphertext $c = (b, a)$ output with Eq. (3): $\begin{matrix} (3) & m = b + a * s (mod q_{l}) \end{matrix}$

$Decode$ : Decoding is the inverse of encoding. When the ciphertext is decrypted, the plaintext is obtained as $p^{'} = f (p) \in R = Z [X] / (X^{N} + 1)$ . Decode the plaintext to get the message $m^{'} = f (m) \in C^{N / 2}$ .

$Add (c_{1}, c_{2})$ : For $c_{1}, c_{2} \in R_{q_{l}}^{2}$ output $c_{add} ⟵ c_{1} + c_{2} (mod q_{l})$ .

$Mult (c_{1}, c_{2})$ : For $c_{1} = (b_{1}, a_{1})$ , $c_{2} = (b_{2}, a_{2}) \in R_{q_{l}}^{2}$ , compute $c_{1} \cdot c_{2} = (b_{1}, a_{1}) \cdot (b_{2}, a_{2}) = (b_{1} b_{2}, b_{1} a_{2} + a_{1} b_{2}, a_{1} a_{2}) (mod q_{l})$ . Output with Eq. (4): $\begin{matrix} (4) & Mult (c_{1}, c_{2}) = (b_{1} b_{2}, b_{1} a_{2} + a_{1} b_{2}) + ⌊ P^{- 1} \cdot a_{1} a_{2} \cdot e v k ⌉ (mod q_{l}) \end{matrix}$

The CKKS homomorphic encryption algorithm is based on the RLWE problem. All its operations are implemented in the polynomial ring $R_{q} = Z_{q} [X] / (X^{N} + 1)$ . During the key generation phase, $a$ , $s$ and $e$ are sampled from the polynomial ring: $a$ is sampled uniformly, $s$ is a secret polynomial, and $e$ is a noise polynomial. The size of the public key $p k = (- a \cdot s + e, a)$ is not quadratic, but linear. Because all operations are completed in the polynomial ring $R_{q} = Z_{q} [X] / (X^{N} + 1)$ , the complexity of the private and public keys is $O (n)$ . Multiplication is also implemented in the polynomial ring, where fast transform-based polynomial multiplication gives a computational complexity of $O (n \log (n))$ rather than the $O (n^{2})$ of a naive vector-matrix product.

Due to its own characteristics, a homomorphic algorithm ensures that, after the ciphertext result computed on the cloud server is decrypted, the result is the same as that of the operation performed on the plaintext data. This effectively protects the privacy of the user's data.
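The KeyGen, Encrypt, Decrypt and Add steps above can be sketched with a toy RLWE-style implementation in plain Python. This is a hedged illustration and not the CKKS implementation used in the paper: the parameters `N`, `Q` and `DELTA` are tiny and insecure, all small polynomials are sampled with coefficients in {-1, 0, 1}, the canonical-embedding Encode step is replaced by simple coefficient scaling, and Mult with relinearization is omitted. All names are hypothetical.

```python
import random

N = 8            # ring dimension (toy value, far too small to be secure)
Q = 2 ** 40      # ciphertext modulus q
DELTA = 2 ** 20  # scaling factor applied to real-valued messages


def poly_mul(f, g):
    """Schoolbook product in Z_Q[X]/(X^N + 1): X^N wraps around to -1."""
    out = [0] * N
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            k = i + j
            if k < N:
                out[k] = (out[k] + fi * gj) % Q
            else:
                out[k - N] = (out[k - N] - fi * gj) % Q
    return out


def poly_add(f, g):
    return [(x + y) % Q for x, y in zip(f, g)]


def small_poly(rng):
    """Stand-in for the key, error and encryption distributions."""
    return [rng.choice((-1, 0, 1)) for _ in range(N)]


def keygen(rng):
    s = small_poly(rng)
    a = [rng.randrange(Q) for _ in range(N)]
    e = small_poly(rng)
    b = poly_add([(-x) % Q for x in poly_mul(a, s)], e)  # b = -a*s + e
    return s, (b, a)


def encrypt(pk, values, rng):
    b, a = pk
    m = [round(v * DELTA) for v in values]  # coefficient encoding (assumption)
    v, e0, e1 = small_poly(rng), small_poly(rng), small_poly(rng)
    c0 = poly_add(poly_add(poly_mul(v, b), m), e0)  # v*b + m + e0
    c1 = poly_add(poly_mul(v, a), e1)               # v*a + e1
    return c0, c1


def decrypt(s, ct):
    c0, c1 = ct
    noisy = poly_add(c0, poly_mul(c1, s))           # b + a*s = m + small noise
    centered = [x - Q if x > Q // 2 else x for x in noisy]
    return [x / DELTA for x in centered]


def add(ct1, ct2):
    return poly_add(ct1[0], ct2[0]), poly_add(ct1[1], ct2[1])


rng = random.Random(0)
sk, pk = keygen(rng)
x = [0.5, -1.25, 3.0, 0.0, 2.5, -0.75, 1.0, 4.0]
y = [1.5, 0.25, -2.0, 1.0, 0.5, 0.75, -1.0, 2.0]
decoded_sum = decrypt(sk, add(encrypt(pk, x, rng), encrypt(pk, y, rng)))
```

The decrypted values match the element-wise plaintext sums only approximately, because the noise terms remain in the low-order bits; this is the "approximate arithmetic" property that distinguishes CKKS from exact schemes.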

4. System model

This paper uses the KAGGLE-HDP dataset to predict patients' heart disease. To ensure data security, the dataset is encrypted with the CKKS algorithm. Then, the gradient descent method is used to solve for the weight and bias. The efficiency and accuracy of the scheme are evaluated through extensive experimental tests with different iteration counts and encryption parameters. The specific scheme is shown in Fig. 2.

Fig. 2.

Secure data fitting scheme model.

CSU: Cloud server user, which generates the encryption key pair and encrypts the data, uploads the ciphertext data to CSP for calculation, decrypts the ciphertext result returned by CSP and performs the corresponding comparison.

CSP: Cloud computing server with powerful computing and storage capacity, which is assumed to be “honest and curious” in the system. It is mainly responsible for computing and processing ciphertext data in the whole system and for returning the result to CSU.

CSS: Cloud storage server, which mainly stores ciphertext data. When CSU uploads ciphertext data, the data is stored in CSS; after iteration, the ciphertext data is stored in CSP.

4.1. System framework

Fig. 3.

Framework of the proposed model.

As shown in Fig. 3, the framework of the secure data fitting scheme follows the algorithm below:

The CSU generates a public and private key pair locally, encrypts the original data, initializes the ciphertext data to get $(enc\_x, enc\_y)$ and uploads the generated ciphertext data to the CSS.

After receiving the ciphertext data, CSP uses the gradient descent method to obtain the weight and bias of the model in ciphertext form. First, the forward propagation of $enc\_x$ is carried out, where $enc\_out = enc\_x.dot(self.weight) + self.bias$ . The result is then passed through the sigmoid function to give $enc\_out = sigmoid(enc\_out)$ .

CSP backpropagates the ciphertext data, taking $(enc\_x, enc\_out, enc\_y)$ as input. The specific formulas are given in Eq. (5), Eq. (6) and Eq. (7): $\begin{array}{ll} (5) & out\_minus\_y = enc\_out - enc\_y \\ (6) & self.\_delta\_w \mathrel{+}= enc\_x * out\_minus\_y \\ (7) & self.\_delta\_b \mathrel{+}= out\_minus\_y \end{array}$ At this point, one propagation has been completed; at least one forward propagation should be performed during each iteration.

CSP updates the parameters. The weights and bias are updated according to the gradient descent rule in Eq. (8): $\begin{matrix} (8) & ω = ω - α \cdot \nabla f (ω) \end{matrix}$ The specific formulas are given in Eq. (9) and Eq. (10): $\begin{array}{ll} (9) & self.weight \mathrel{-}= self.\_delta\_w * (1 / self.\_count) + self.weight * 0.05 \\ (10) & self.bias \mathrel{-}= self.\_delta\_b * (1 / self.\_count) \end{array}$
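As a plaintext illustration of the forward, backward and update steps in Eqs. (5)-(10), the following minimal Python sketch mirrors the same arithmetic on ordinary floats. It is not the encrypted implementation: the class name `PlainLR`, the toy data, the reset of the accumulators after each update, and the cubic polynomial used in place of the sigmoid are assumptions made here (a low-degree polynomial is a common stand-in, since CKKS evaluates only additions and multiplications); the approximation actually used by the scheme is not stated in the text.

```python
class PlainLR:
    """Plaintext mirror of the forward/backward/update steps (illustrative only)."""

    def __init__(self, n_features):
        self.weight = [0.0] * n_features
        self.bias = 0.0
        self._delta_w = [0.0] * n_features
        self._delta_b = 0.0
        self._count = 0

    @staticmethod
    def sigmoid(z):
        # Assumed degree-3 polynomial approximation of the sigmoid.
        return 0.5 + 0.197 * z - 0.004 * z ** 3

    def forward(self, x):
        z = sum(w * xi for w, xi in zip(self.weight, x)) + self.bias
        return self.sigmoid(z)

    def backward(self, x, out, y):
        out_minus_y = out - y                                  # Eq. (5)
        self._delta_w = [d + xi * out_minus_y                  # Eq. (6)
                         for d, xi in zip(self._delta_w, x)]
        self._delta_b += out_minus_y                           # Eq. (7)
        self._count += 1

    def update_parameters(self):
        if self._count == 0:
            raise RuntimeError("call forward/backward at least once first")
        # Eq. (9): averaged gradient plus the 0.05 * weight regularization term.
        self.weight = [w - (d / self._count + w * 0.05)
                       for w, d in zip(self.weight, self._delta_w)]
        # Eq. (10): averaged gradient for the bias.
        self.bias -= self._delta_b / self._count
        # Reset the accumulators for the next iteration (assumption).
        self._delta_w = [0.0] * len(self.weight)
        self._delta_b = 0.0
        self._count = 0


# Hypothetical one-feature toy data: negative inputs are class 0, positive class 1.
xs = [[-1.0], [-0.5], [0.5], [1.0]]
ys = [0.0, 0.0, 1.0, 1.0]

model = PlainLR(n_features=1)
for _ in range(3):                      # three iterations, as in the experiments
    for x, y in zip(xs, ys):
        model.backward(x, model.forward(x), y)
    model.update_parameters()
```

In the encrypted setting the same sequence of additions and multiplications would be applied to ciphertext vectors instead of floats, which is why each iteration consumes multiplicative depth.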

4.2. Security analysis

In the process of uploading, calculating and storing ciphertext data by CSU, two points must be guaranteed for data security. First, CSP and CSS must not be able to recover any information about the original data from the ciphertext data. Second, even if a third party obtains the ciphertext data, it must not be able to obtain the original data by decrypting the ciphertext data.

In the scheme, we assume that the cloud server is “honest and curious”. Honesty means that participants do not falsify data and the server does not maliciously attack, decipher or reverse engineer the data uploaded by participants. Curiosity means that the server side has some degree of curiosity about the user’s raw data and may try to bypass some security measures to access the user’s raw data directly. The keys are generated and stored only by the users, while CSP and CSS are only responsible for computing and storing ciphertext data. CSP and CSS cannot obtain the keys or the original data.

Suppose the ciphertext data is secretly obtained by a third party. During the whole communication process, the data is transmitted in the form of ciphertext, so its security reduces to the security of the CKKS encryption scheme. The security of the CKKS encryption scheme relies on the RLWE (Ring Learning with Errors) problem. The data calculated and stored by CSP and CSS are all ciphertext data, and the key information cannot be obtained. Therefore, the key and original data cannot be recovered by calculation.

We define the security model as shown in Fig. 4.

Fig. 4.

IND-CPA security model.

We define the CSU as the challenger, which possesses $p k$ and $s k$ and sends $p k$ to the attacker.

The attacker selects two equal-length plaintexts $M_{1}$ and $M_{2}$ and sends them to the challenger.

After receiving the plaintexts, the challenger randomly selects a bit $b \in \{0, 1\}$ , encrypts $M_{b + 1}$ and sends the ciphertext to the attacker.

The attacker outputs a guess for the value of $b$ , that is, it tries to determine whether the ciphertext is an encryption of $M_{1}$ or $M_{2}$ .

The probability that the attacker guesses $b$ correctly is $1 / 2 + ε$ , where $ε$ is the attacker's advantage. Because the CKKS homomorphic encryption algorithm is based on the RLWE problem, it is difficult to recover the plaintext without knowing $s k$ . Therefore, the value of $ε$ is negligible and the encryption algorithm is secure under the IND-CPA model.

5. Experimental test

To further analyze the efficiency of the scheme, this section conducts experimental evaluations. The whole experiment was conducted on an Intel® Core™ i7-9750hq CPU @ 2.60 GHz with 8 GB RAM; PyCharm 2020.3.3 x64 is used to test plaintext and ciphertext data under the Windows 10 operating system. The CKKS encryption scheme is realized by calling TenSEAL 0.1.4, and the KAGGLE-HDP dataset [1,18] is used to evaluate the scheme. For technical reasons, the article only uses a laptop to carry out the experimental simulation, so the results are limited by the computer configuration, but they can serve as an important reference for cloud computing. In future work, we will carry out the experiments on a cloud server to get better results and conduct more in-depth research and analysis.

5.1. Encryption efficiency test

This section mainly tests the encryption efficiency of the CKKS encryption scheme under different encryption parameters. By calling the TenSEAL library, we use the CKKS encryption scheme and record the runtime on the KAGGLE-HDP dataset under different encryption parameters. Poly_modulus_degree determines the size of the messages that can be encrypted and must be a power of 2. The larger this parameter, the more data can be packed into a ciphertext, but the greater the computational cost. Coeff_mod_bit_sizes determines the number of times that the ciphertext can be rescaled. The specific parameters are shown in Table 1, where an entry such as 21*7 denotes seven consecutive 21-bit moduli.

Table 1
Encryption parameter setting

En_parameters	Poly_modulus_degree	Coeff_mod_bit_sizes	Scale
1	8192	30, 21*7, 30	21
2	16384	43, 23*7, 43	23
3	16384	43, 23*9, 43	23
4	16384	43, 23*11, 43	23
5	32768	25, 25*7, 25	25
6	32768	50, 25*7, 50	25
7	32768	25, 25*11, 25	25
8	32768	50, 25*13, 50	25
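The shorthand used in the Coeff_mod_bit_sizes column can be expanded into the flat list of bit sizes that a CKKS context is typically configured with. The sketch below is a hedged helper, not part of the paper's code: the function name `expand_bit_sizes` is hypothetical, and the `MAX_TOTAL_BITS` bounds are the commonly cited 128-bit-security limits on the total coefficient modulus size, included here on the assumption that 128-bit security is the target.

```python
def expand_bit_sizes(spec):
    """Expand a Table 1 entry such as "43, 23*11, 43" into a flat list of bit sizes."""
    sizes = []
    for part in spec.split(","):
        part = part.strip()
        if "*" in part:
            bits, count = part.split("*")
            sizes.extend([int(bits)] * int(count))  # e.g. 23*11 -> eleven 23-bit moduli
        else:
            sizes.append(int(part))
    return sizes


# Assumed upper bounds on the total coefficient modulus bits at 128-bit security.
MAX_TOTAL_BITS = {8192: 218, 16384: 438, 32768: 881}

# Rows of Table 1: (poly_modulus_degree, coeff_mod_bit_sizes shorthand, scale bits).
TABLE_1 = [
    (8192, "30, 21*7, 30", 21),
    (16384, "43, 23*7, 43", 23),
    (16384, "43, 23*9, 43", 23),
    (16384, "43, 23*11, 43", 23),
    (32768, "25, 25*7, 25", 25),
    (32768, "50, 25*7, 50", 25),
    (32768, "25, 25*11, 25", 25),
    (32768, "50, 25*13, 50", 25),
]

for index, (degree, spec, scale_bits) in enumerate(TABLE_1, start=1):
    sizes = expand_bit_sizes(spec)
    total = sum(sizes)
    within_bound = total <= MAX_TOTAL_BITS[degree]
    print(f"set {index}: degree={degree}, moduli={len(sizes)}, "
          f"total_bits={total}, scale=2^{scale_bits}, within_bound={within_bound}")
```

The number of intermediate moduli (7, 9, 11 or 13 in Table 1) bounds how many rescaling steps, and hence how many sequential multiplications, a ciphertext can undergo.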

First, we test the efficiency impact of different scale values. With the parameters (poly_modulus_degree, coeff_mod_bit_sizes) fixed, the dataset is encrypted under different scale values, and the average runtime over 10 runs is recorded as the final result of the experiment. In the test, we let poly_modulus_degree = 8192 and coeff_mod_bit_sizes = (30, 21*7, 30) and select different scale values. The results are shown in Fig. 5.

Fig. 5.

Different scale values running time.

Second, we set the parameter poly_modulus_degree = 16384 and test the dataset under different coeff_mod_bit_sizes values. The average runtime over 10 runs is recorded as the final result of the experiment. The results are shown in Fig. 6.

Fig. 6.

Different coefficient modulus running time.

Then, we test the effect of poly_modulus_degree on the experiment. The coeff_mod_bit_sizes parameter is set to 7. The average runtime over 10 runs is recorded as the final result of the experiment. The results are shown in Fig. 7.

Fig. 7.

Different polynomial modulus running time.

The encryption time of the dataset under the different encryption parameters is also tested. The average encryption time over 10 experiments is recorded as the final result. The encryption time of the dataset is shown in Fig. 8, where the horizontal coordinates indicate the parameter settings in Table 1.

Fig. 8.

Data set running time.

From the above figures, it can be seen that the encryption and decryption time of the dataset is independent of the scale value. With the same polynomial modulus, the times increase with the number of coefficient moduli. With the same coefficient moduli, they increase with the polynomial modulus. These results also lay the foundation for the following experiments.
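The measurement procedure used throughout this subsection, averaging the runtime over 10 runs, can be sketched as a small timing helper. The helper name `average_runtime` and the placeholder workload are hypothetical; in the experiments the timed task would be the encryption of the dataset under one parameter set.

```python
import time


def average_runtime(task, repeats=10):
    """Run `task` `repeats` times and return the mean wall-clock time in seconds."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        total += time.perf_counter() - start
    return total / repeats


# Placeholder workload standing in for "encrypt the dataset with parameter set k".
mean_seconds = average_runtime(lambda: sum(i * i for i in range(10_000)))
```

Averaging over repeated runs reduces the influence of one-off effects such as caching or background processes on the reported times.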

5.2. Computational efficiency test of ciphertext domain

In this subsection, the efficiency of the CKKS encryption scheme is evaluated. Because the CKKS encryption algorithm cannot implement the division operation, the gradient descent method is used here to approximate the inverse-matrix solution by iteration.

In this subsection, the number of iterations of the encryption scheme is set to 3, 5, and 7. Due to the computer configuration, choosing more iterations would make the computational cost very high and would be extremely detrimental to the whole experiment. For the parameter configurations in the table, we compute 3, 5, and 7 iterations for each set of parameters. The calculation accuracy is examined by varying the threshold: we set the threshold to 0.5, 0.6 and 0.7 to find the best parameter selection for the experiment.

First, with the threshold set to 0.5, we perform 3, 5, and 7 iterations on the eight sets of parameters to obtain the computational efficiency and accuracy. The encryption efficiency and iteration time are shown in Fig. 9 and Fig. 10.
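To illustrate why iteration can stand in for matrix inversion, the following plaintext Python sketch solves a small least-squares problem using only additions and multiplications inside the loop; the factor 1/n is a public constant that can be computed in the clear beforehand. The function name `gd_least_squares`, the step size, the iteration count and the toy data are all assumptions chosen for illustration, not values from the experiments.

```python
def gd_least_squares(X, y, alpha=0.5, iterations=200):
    """Minimize the mean squared error ||Xw - y||^2 / n by gradient descent."""
    n, m = len(X), len(X[0])
    inv_n = 1.0 / n  # plaintext constant, so no ciphertext division is needed
    w = [0.0] * m
    for _ in range(iterations):
        # Residuals r_i = x_i . w - y_i (additions and multiplications only).
        residual = [sum(wj * xj for wj, xj in zip(w, row)) - yi
                    for row, yi in zip(X, y)]
        # Gradient component j: (1/n) * sum_i r_i * x_ij.
        grad = [inv_n * sum(r * row[j] for r, row in zip(residual, X))
                for j in range(m)]
        w = [wj - alpha * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical data lying exactly on y = 1 + 2x; the first column is the constant term.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]
w = gd_least_squares(X, y)
```

On plaintext data many iterations are cheap; on ciphertext data each iteration consumes multiplicative depth and time, which is the reason only a few iterations are affordable in the encrypted experiments.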

Fig. 9.

Encryption efficiency when the difference is 0.5.

Fig. 10.

Iteration time when the difference is 0.5.

From the above figures, we can see that the highest accuracy is 67% for the threshold 0.5, and the parameters are set as (16384; 43, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 43; 23). The iteration time increases with the increase of the ciphertext size and ciphertext extension space and the number of iterations.

With the threshold set to 0.6, we perform 3, 5, and 7 iterations on each of the eight sets of parameters to obtain the computational efficiency and accuracy. The encryption efficiency and iteration time are shown in Fig. 11 and Fig. 12.

Fig. 11.

Encryption efficiency when the difference is 0.6.

Fig. 12.

Iteration time when the difference is 0.6.

From the above figures, it can be seen that, with the threshold 0.6, the highest accuracy of 85.6% is achieved with parameters (32768; 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25; 25) and 7 iterations, but this configuration also consumes a lot of computation time. A comparable accuracy of 84.7% is achieved with parameters (16384; 43, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 43, 43; 23) and 3 iterations. Under the same conditions, we therefore select the latter set of parameters as the optimal parameters for this group of tests, since it balances efficiency against accuracy.

Finally, with the threshold set to 0.7, we test the eight groups of parameters to obtain the computational efficiency and accuracy. The encryption efficiency and iteration time are shown in Fig. 13 and Fig. 14.

Fig. 13.

Encryption efficiency when the difference is 0.7.

Fig. 14.

Iteration time when the difference is 0.7.

From the above figures, it can be seen that, with the threshold 0.7, the accuracy is highest at 96.7% with parameters (16384; 43, 23, 23, 23, 23, 23, 23, 23, 23, 23, 43, 43; 23) and 3 iterations. Combining the graphs, it can be seen that the time of the experimental test does not change much across the different thresholds, while the iteration time increases with the number of iterations and with the parameter sizes.

From the above experiments, we choose the experimental data with the threshold 0.7, parameters set to (16384; 43, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 43; 23) and 3 iterations as the final result, from which we obtain the weight and bias of the dataset and derive the confusion matrices of the plaintext and ciphertext data.

The accuracy of the direct least-squares solution on the plaintext data is 70.36%, the time spent is 0.013 s, and the resulting polynomial coefficients are as follows: $\begin{matrix} w = [[0.0570], [0.1030], [0.0496], [0.0312], [0.0374], [0.0364], [0.0726], [- 0.0097], [0.0421]] \end{matrix}$ For the ciphertext data, we use the third set of encryption parameters as the final result of the experiment, and the generated weights and bias are as follows: $\begin{array}{l} weight = [[0.1899], [0.3498], [0.2067], [0.1210], [0.1566], [0.1803], [0.2976], [0.1554], [0.1433]] \\ bias = [- 0.1652] . \end{array}$ The confusion matrices generated from the plaintext data and the ciphertext data are shown in Fig. 15 and Fig. 16.

Fig. 15.

Plaintext data confusion matrix.

Fig. 16.

Ciphertext data confusion matrix.

Fig. 15 and Fig. 16 show the confusion matrices generated after classification of the plaintext data and the ciphertext data. The confusion matrix model is shown in Fig. 17. TP stands for true positive, FP for false positive, FN for false negative and TN for true negative. The larger the proportion of TP and TN, the better the classification result. Comparing Fig. 15 and Fig. 16, the precision of the confusion matrix generated from the plaintext data is $PPV = TP / (TP + FP) = 98.5 \%$ , and the precision for the ciphertext data is 70%. Similarly, we can calculate the accuracy $ACC = (TP + TN) / (TP + FP + TN + FN)$ , the sensitivity $TPR = TP / (TP + FN)$ and the specificity $TNR = TN / (TN + FP)$ .

Fig. 17.

Confusion matrix.
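The four evaluation indices defined from the confusion matrix (precision PPV, accuracy ACC, sensitivity TPR and specificity TNR) can be computed with a few lines of Python. The function name `confusion_metrics` and the counts in the usage line are hypothetical and are not the values behind Fig. 15 or Fig. 16.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return precision, accuracy, sensitivity and specificity from the four counts."""
    return {
        "PPV": tp / (tp + fp),                   # precision
        "ACC": (tp + tn) / (tp + fp + tn + fn),  # accuracy
        "TPR": tp / (tp + fn),                   # sensitivity (recall)
        "TNR": tn / (tn + fp),                   # specificity
    }


# Hypothetical counts chosen only to exercise the formulas.
metrics = confusion_metrics(tp=40, fp=10, fn=5, tn=45)
```

Writing the denominators with explicit parentheses avoids the ambiguity of expressions such as TP / TP + FP, which would otherwise evaluate to 1 + FP.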

5.3. Future outlook

The scheme proposed in this paper can be applied in more fields. The data fitting method based on homomorphic encryption can be applied to the medical field, such as the scheme proposed in the paper to predict the probability of a patient’s disease. It can also be used in the economic field. For example, prices in the market will change continuously with the number of customers. We can fit the best polynomial function to express the specific relationship between prices and customer groups in different periods. Then we can adjust the price to ensure the increase in the number of customers, thereby creating economic benefits. We may use personal information of customers, such as age, work, etc., as data features for training. The homomorphic encryption scheme can also better protect the customer’s private information and prevent data loss or leakage.

The scheme can also be used in the field of autonomous driving. The optimal driving route can be found through repeated fitting of the unmanned vehicle's trajectory. We can complete the trajectory re-planning of the vehicle and achieve the maximum accuracy in the prediction time domain by fitting polynomial curves to the vehicle position, acceleration, etc. The homomorphic encryption scheme is mainly responsible for encrypting and protecting the vehicle's data, preventing it from being stolen by a third party, and for supporting tests on the data during the training process.

To sum up, there are more fields where the scheme proposed in this paper can be applied. In future work, we will also actively explore the feasibility of the scheme in other fields.

6. Conclusion

This paper proposes a data processing scheme based on the CKKS homomorphic encryption algorithm and the gradient descent method. We use the TenSEAL library to implement the CKKS homomorphic encryption algorithm, and test the encryption efficiency and computational efficiency under different parameters. Finally, we determine the optimal parameters through experiments and realize safe data fitting in the cloud server.

In future work, we will study how to reduce the computational overhead, improve the accuracy and securely implement data fitting in the ciphertext domain. We will also conduct in-depth research on the scalability of the data to meet changing demands and improve the overall performance of the scheme.

Acknowledgements

This work is supported by the National Key R&D Program of China (2021YFF1201100), Engineering University of PAP’s Funding for Scientific Research Innovation Team [No. KYTD201805] and Engineering University of PAP’s Funding for Key Researcher [No. KYGG202011].

Conflict of interest

There is no conflict of interest.

Table 1
Encryption parameter setting

| En_parameters | Poly_modulus_degree | Coeff_mod_bit_sizes | Scale |
|---|---|---|---|
| 1 | 8192 | 30, 21×7, 30 | 21 |
| 2 | 16384 | 43, 23×7, 43 | 23 |
| 3 | 16384 | 43, 23×9, 43 | 23 |
| 4 | 16384 | 43, 23×11, 43 | 23 |
| 5 | 32768 | 25, 25×7, 25 | 25 |
| 6 | 32768 | 50, 25×7, 50 | 25 |
| 7 | 32768 | 25, 25×11, 25 | 25 |
| 8 | 32768 | 50, 25×13, 50 | 25 |