Hadoop framework integrated hybrid optimization algorithm for privacy preserved clustering mechanism

Abstract

Big data analysis, which mines hidden patterns from huge volumes of data, has gained immense attention. To reduce computational complexity, clustering is adopted as an essential component. A novel model is devised for privacy-preserved clustering of data with the MapReduce framework, with the aim of developing an optimization technique for privacy preservation. The input data are acquired from various distributed sources, partitioned, and fed to the MapReduce framework, which consists of mappers and a reducer. The mappers preserve privacy by transforming the data with several operations, such as encryption, the Kronecker product, and a secret key. The secret key is generated using the proposed Chimp Grey Wolf Optimization (ChGWO) algorithm, which combines the Chimp Optimization Algorithm (ChOA) and the Grey Wolf Optimizer (GWO). A new fitness function is developed that considers utility and privacy, where utility is measured by accuracy and privacy by Jaro-Winkler similarity. Finally, the data are clustered with Deep Fuzzy Clustering (DFC). The proposed ChGWO offered enhanced efficiency, with a highest utility of 92.5%, a highest privacy of 91.5%, and a highest Rand coefficient of 65.9%.

Keywords

Privacy preservation; deep fuzzy clustering; MapReduce framework; big data; encryption

1. Introduction

Big data refers to huge quantities of structured and unstructured data, growing by about 2.5 exabytes each day. This rapid growth is driven by mobile data, YouTube, web services, health care data, digital cameras, Global Positioning System (GPS) signals, and social media such as Twitter and Facebook. In general, big data is characterized by three properties: volume, velocity, and variety. Volume refers to data quantities on the scale of terabytes or petabytes. Velocity indicates the speed at which data are generated and shared, and variety indicates the explosion of new data types from mobile computing, social sites, and machine devices. Big data is used to bring about changes in society and business, an activity termed big data analytics [9, 10, 5]. Big data involves datasets so large that commonly used software applications cannot manage and process them within the required time [13]. Hadoop is an open-source framework from the Apache Software Foundation that provides the MapReduce programming model and the Hadoop Distributed File System (HDFS). Its key feature is that it splits data across many hosts and runs applications in parallel, close to the data they need. Hadoop is also redundant and reliable: if a machine is lost because of a failure, the data are automatically replicated without operator intervention. It provides robust data access, was initially batch-processing-centric, and makes it simple to build distributed applications with the MapReduce model [11, 12, 14, 17, 18].

The security and privacy of big data have attracted considerable attention in the research community because of emerging technologies such as social networks, cloud computing, and analytics engines [21]. Security in big data raises severe issues and has shifted the focus toward privacy preservation techniques [28, 19]. Privacy can be infringed during data aggregation and communication even when no confidential information about individuals is explicitly shared, because advanced data mining methods are effective at inferring it. Thus, privacy problems are growing at an unprecedented rate in both diversity and importance [20]. Security refers to the practices, in terms of processes, technology, and training, that protect data and its assets from illegal access and destruction. It concentrates on protecting data from attacks and preventing the misuse of stolen data for profit [22]. Clustering is a process in which data are grouped into clusters on the basis of the similarity between data points: data within a cluster are highly similar to each other, whereas data in different clusters have low similarity [15, 16, 7].

Clustering is a technique used to split the input into groups of specific sets, where the similarity between objects is the essential property. Clustering is used in various applications, including forensics, user comfortability assessment, and bioinformatics. A newer research area, privacy-preserving clustering, has emerged to address privacy issues when data are shared; it must still guarantee accurate clustering results [2]. Clustering groups similar instances of a database based on specific data features, and as a criterion, instances with similar attributes must fall in the same group. There are four types of clustering techniques: connectivity-based, centroid-based, density-based, and distribution-based [3]. Clustering is an unsupervised learning method for grouping similar data, whose aim is to discover the underlying groups in a large unlabelled database. It is used in several research areas, such as spatial databases, data mining, pattern recognition, the medical domain, market analysis, and web statistics [6]. Deep Neural Networks (DNNs) can transform data into representations that are more amenable to clustering because of their inherent non-linear transformations. For ease of explanation, a clustering technique with a deep model is used here. Because deep clustering learns a clustering-oriented representation, it is not appropriate to classify such techniques solely on the basis of the clustering loss [23].

The goal is to design a data clustering technique in which optimal coefficients are generated using the proposed Chimp Grey Wolf Optimization (ChGWO). The input big data are acquired from the dataset, partitioned, and fed to a MapReduce model that comprises mappers and a reducer. In the mappers, the data are privacy-preserved using several operations, such as encryption, the Kronecker product, and a secret key, where the optimal coefficients are computed with the proposed ChGWO algorithm, which combines ChOA and GWO. The optimization uses a fitness function built from privacy and utility, where utility is measured by accuracy and privacy by Jaro-Winkler similarity. The data processed at each mapper are fused by a merging process and fed to the reducer phase, where data clustering is carried out using DFC.

The major contribution of this paper is as follows:

•
Proposed ChGWO for privacy preservation: ChGWO is devised for data clustering through the generation of optimal coefficients. It is obtained by integrating GWO and ChOA, and its fitness function combines utility, measured by accuracy, and privacy, measured by Jaro-Winkler similarity.

The paper is organized as follows: Section 2 presents the motivation and reviews classical privacy preservation techniques, Section 3 describes the proposed ChGWO for generating the optimal secret key for privacy preservation, Section 4 demonstrates the effectiveness of the devised model, and Section 5 presents the conclusion.
2. Motivations

Preserving privacy while sharing data for clustering is a complicated process. To solve this problem, data owners must satisfy privacy requirements while ensuring valid clustering outcomes. Thus, the issues and challenges of previous techniques are considered as a motivation to develop a novel technique.

2.1 Literature survey

Eight conventional privacy-preserving clustering methods are reviewed below, together with their advantages and issues. Banasode and Padamannavar [1] developed a cryptographic technique that provides high security for big data in little time. Encryption and decryption were performed using Indexed RSA (IRSA), and keywords were indexed prior to encryption. However, the technique was not evaluated against other attacks. To deal with other attacks, Bolla and Anandan [2] developed Multi-Label Big Data Clustering with Privacy Protection Probability Linked Weight Optimization (MLBDC-PP-LWO) for privacy-preserved data clustering, in which sensitive data were protected by forming clusters. However, this technique did not use feature selection to increase privacy. To enhance privacy, Catak et al. [3] used homomorphic encryption for privacy-preserved clustering in a cloud model. The entities of the system do not require high processing power, as the power-demanding processes are performed by the cloud service provider. However, the method did not offer high accuracy. To mitigate accuracy issues, Lekshmy and Abdul Rahiman [4] developed Privacy Preserving Distributed Data Mining, in which a sanitization technique was used to enhance the privacy of user data. Moreover, a privacy-based fitness function was devised to generate the optimum key, and the ABC technique was used to encrypt large data. However, the data are hard for attackers to predict only when the information loss is high. To prevent information loss, Khan et al. [5] developed a clustering-based probabilistic privacy preservation technique for big data to secure confidential data. The method attained minimal perturbation and high privacy; however, it incurred overhead when there was a large amount of data to process. Zou et al. [6] used k-means clustering with non-colluding Cloud Computing Services (CCS) for efficient privacy-preserved clustering. Here, the Bresson-Catalano-Pointcheval (BCP) cryptosystem was used to encrypt the data records, preventing the CCS from extracting useful information from the ciphertexts. However, this technique did not address other security attributes. To include security attributes, Kulkarni et al. [7] used a MapReduce framework (MRF) to handle data arriving from various sources. The mapper used Fractional Sparse Fuzzy C-Means (FrSparse FCM) for clustering, and the reducer used the particle swarm optimisation-based whale optimisation algorithm (P-Whale) for optimal tuning. However, the method could not evaluate computation overhead. To address computational overhead, Kulkarni et al. [8] developed FPWhale-MRF for clustering data with MapReduce. The mapper used the Fractional Tangential-Spherical Kernel clustering algorithm to compute cluster centroids, and the reducer integrated the mapper outputs to find the optimal centroids with P-Whale. However, the method did not consider all security factors.

2.2 Challenges

The problems faced by previous privacy preservation strategies are listed below.

•
Owing to escalating re-identification attacks, data linked from several public sources are not adequately secure to protect the identity of the source subjects [2].
•
Information perturbation is commonly used to secure data by changing the original data, but it introduces issues such as information misrepresentation, which leads to knowledge loss [4].
•
To deal with information misrepresentation, a cryptographic technique [1] provides high security in little time, but it was not evaluated against various kinds of attacks.
•
To deal with several attacks, FrSparse FCM is used. However, because big data contains a huge number of data objects, processing takes a long time; in addition, most classical techniques are time-consuming and suffer from high computational complexity [7].
•
The major issue in clustering big data is that the data are heterogeneous, large, and dynamic, because they are collected from several sources without a standard format.

3. Proposed ChGWO for privacy preserved clustering

Figure 1.

Structure of privacy preserved clustering model using proposed ChGWO.

Data privacy can be compromised when data mining is used to extract useful information. Because of growing security threats, privacy has become a major problem. Most privacy-preserving techniques transform the data to protect it while maintaining its availability, and this has become an essential domain of information security. The goal is to devise a data clustering technique that generates optimal coefficients with the proposed ChGWO. First, the input big data are obtained from the dataset, partitioned, and passed to the MapReduce framework, which consists of mapper and reducer phases. In the mappers, privacy is preserved by encrypting the data using several operations, such as encryption, the Kronecker product, and a secret key, where the optimal coefficients are computed with the proposed ChGWO algorithm. ChGWO is developed by combining ChOA [24] and GWO [25]. A new fitness function is devised that considers accuracy and Jaro-Winkler similarity. The data processed in each mapper are fused by a merging process and passed to the reducer phase, where the data are clustered using DFC. Figure 1 shows the structure of the privacy-preserved clustering model using the proposed ChGWO.

Let the input data, with several attributes, be denoted $E$ and expressed as,

$\displaystyle E=\{E_{j,k}\};\quad 1\leqslant j\leqslant J,\;1\leqslant k\leqslant K$ (1)

where $E_{j,k}$ denotes the $k^{\text{th}}$ attribute of the $j^{\text{th}}$ data point, $J$ is the total number of data points, and $K$ is the total number of attributes in each data point.

3.1 Partitioning of input data

The data $E_{j,k}$ are divided into a number of partitions equal to the total number of mappers in the MapReduce framework. The split data are expressed as,

$\displaystyle E_{j,k}=\{{G_{r}}\};\quad 1\leqslant r\leqslant s$ (2)

where $s$ is the total number of mappers.
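Eq. (2) states only that the data are split into $s$ parts, one per mapper, without saying how. As a minimal sketch, assuming contiguous row blocks whose sizes differ by at most one row (an assumption; Hadoop itself forms input splits from HDFS blocks), the partitioning could look like this. The function name `partition_rows` is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_rows(data: Sequence[T], s: int) -> List[List[T]]:
    """Split the J data points into s contiguous partitions G_1..G_s.

    Partition sizes differ by at most one row; the first (J mod s)
    partitions receive one extra row each.
    """
    if s < 1:
        raise ValueError("the number of mappers s must be at least 1")
    base, extra = divmod(len(data), s)
    partitions, start = [], 0
    for r in range(s):
        size = base + (1 if r < extra else 0)
        partitions.append(list(data[start:start + size]))
        start += size
    return partitions
```

For example, ten rows over three mappers give partitions of sizes 4, 3, and 3.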

3.2 MapReduce framework for privacy preserved data clustering

To reduce computation time and to handle distributed data, the model uses the MapReduce framework, in which the computational burden is reduced by efficient secret key generation. The technique exploits server parallelism by processing the big data partitions in parallel, with the aim of improving classification accuracy. The proposed model consists of two steps: privacy preservation in the mappers, using keys generated by the proposed ChGWO, and clustering in the reducer, using DFC.
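The mapper, merge, and reducer flow described above can be sketched in a few lines. This is an in-process illustration using Python threads, not Hadoop, and the toy mapper and reducer are hypothetical stand-ins for the privacy-preservation step and DFC.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

Row = List[float]
T = TypeVar("T")


def run_mapreduce(partitions: Sequence[List[Row]],
                  mapper: Callable[[List[Row]], List[Row]],
                  reducer: Callable[[List[Row]], T]) -> T:
    """Run one mapper per partition in parallel, merge the outputs, then reduce once."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        mapped = list(pool.map(mapper, partitions))      # mappers L_1..L_s, order preserved
    merged = [row for part in mapped for row in part]     # merging step
    return reducer(merged)                                # single reducer


# Hypothetical stand-ins: the mapper scales every value (in place of privacy
# preservation) and the reducer counts rows (in place of DFC).
def toy_mapper(rows: List[Row]) -> List[Row]:
    return [[2.0 * v for v in row] for row in rows]


def toy_reducer(rows: List[Row]) -> int:
    return len(rows)
```

Because `pool.map` returns results in input order, the merged rows keep the original partition order.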

3.2.1 Privacy preservation in mapper

Let the $s$ mappers in the MapReduce framework be expressed as,

$\displaystyle L=\{{L_{1},L_{2},\ldots,L_{r},\ldots,L_{s}}\};\,1\leqslant r\leqslant s$ (3)

Thus, the input to the $r^{\text{th}}$ mapper is expressed as,

$\displaystyle G_{r}=\{{d_{o,l}}\};\,1\leqslant o\leqslant m_{r};\,1\leqslant l\leqslant n$ (4)

where $d_{o,l}$ is the split data provided to the $r^{\text{th}}$ mapper for processing, $m_{r}$ is the number of data points in the $r^{\text{th}}$ mapper, and $n$ is the number of attributes.

Let the partitioned data $d_{o,l}$ submitted to the mapper form a $u\times v$ matrix:

$\displaystyle d_{(u\times v)};\quad 1\leqslant o\leqslant u,\quad 1\leqslant l\leqslant v$ (5)

Each mapper processes its data as follows:

$\displaystyle Q_{(u\times v)}=d\ast H$ (6)

where $H$ is the fourth-order Chebyshev polynomial of the first kind, $T_{4}(x)$, given by,

$\displaystyle H=8x^{4}-8x^{2}+1$ (7)

Here, the variable $x$ is the sum of all entries of the partitioned data:

$\displaystyle x=\sum_{o=1}^{u}{\sum_{l=1}^{v}{d_{o,l}}}$ (8)
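Eqs. (6)-(8) can be traced with a small sketch. It assumes that $d\ast H$ in Eq. (6) is scalar multiplication, since $H$ evaluated at the scalar $x$ is a single number; the helper names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def chebyshev_t4(x: float) -> float:
    """Fourth Chebyshev polynomial of the first kind, Eq. (7)."""
    return 8 * x ** 4 - 8 * x ** 2 + 1


def mapper_transform(d: Matrix) -> Tuple[Matrix, float, float]:
    """Eqs. (6)-(8): scale the u x v block d by H evaluated at x = sum of all entries."""
    x = sum(sum(row) for row in d)                    # Eq. (8)
    H = chebyshev_t4(x)                               # Eq. (7)
    Q = [[value * H for value in row] for row in d]   # Eq. (6), element-wise scaling
    return Q, x, H
```

Because $x$ is the sum of all entries, $H$ grows with the fourth power of that sum on unnormalized data; normalizing the data first is one option, though the source does not state whether this is done. The identity $T_{4}(\cos\theta)=\cos 4\theta$ is a quick check that the polynomial is entered correctly.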

The privacy preserved data is given by,

$\displaystyle R_{(u\times v)}=E({d,t_{i}})$ (9)

where $E(\cdot)$ denotes encryption and $t_{i}$ is the optimal key generated using the proposed ChGWO.

Here, the correlation between feature and class label is given by,

$\displaystyle S_{(u\times 1)}=\min\textit{corr}(g,h)$ (10)

where $\textit{corr}(\cdot)$ denotes the correlation between feature $g$ and class label $h$.

The matrix to store the correlation values is given as,

$\displaystyle M_{(u\times u)}=S\times S^{T}$ (11)

where $\times$ denotes matrix multiplication.

Here, the bilateral matrix is given by,

$\displaystyle F_{(u\times v)}=M\times R$ (12)

The Kronecker matrix is expressed as,

$\displaystyle B_{(u\cdot u)\times v}=F_{(u\times v)}\otimes S_{(u\times 1)}$ (13)

where, $\otimes$ signifies Kronecker product.

The bilinear matrix is given as,

$\displaystyle N_{(u\times u)\times v}=Q_{(u\times v)}\otimes S_{(u\times 1)}$ (14)

The privacy protected data is given as,

$\displaystyle I_{(u\times v)}=d_{(u\times v)}\ast\left(N^{T}_{v\times(u\cdot u)}\times B_{(u\cdot u)\times v}\right)_{(v\times v)}$ (15)
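The dimension bookkeeping in Eqs. (11)-(15) is easy to get wrong, so the sketch below traces the shapes with plain Python lists. It assumes $S$ is given as a $u\times 1$ column, takes an arbitrary matrix as a stand-in for $R$ (the encryption $E(\cdot)$ is not specified in the source), and reads the $\ast$ in Eq. (15) as a matrix product, which is the reading under which the dimensions agree. All function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def shape(A: Matrix) -> Tuple[int, int]:
    return len(A), len(A[0])


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    if len(A[0]) != len(B):
        raise ValueError(f"shape mismatch: {shape(A)} x {shape(B)}")
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def kron(A: Matrix, B: Matrix) -> Matrix:
    """Kronecker product: a (p x q) and b (r x t) give a (p*r) x (q*t) matrix."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def privacy_transform(d: Matrix, Q: Matrix, R: Matrix,
                      S: Matrix) -> Tuple[Matrix, Matrix, Matrix]:
    M = matmul(S, transpose(S))       # Eq. (11): u x u
    F = matmul(M, R)                  # Eq. (12): u x v
    B = kron(F, S)                    # Eq. (13): (u*u) x v
    N = kron(Q, S)                    # Eq. (14): (u*u) x v
    core = matmul(transpose(N), B)    # v x v
    I = matmul(d, core)               # Eq. (15): u x v
    return I, B, N
```

With $u=2$ and $v=3$, $B$ and $N$ are $4\times 3$, the inner product $N^{T}\times B$ is $3\times 3$, and $I$ comes back as $2\times 3$, matching $d$.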

The retrieval key $T$ is formed by concatenating $x$ to the product $N^{T}\times B$:

$\displaystyle T_{v\times(v+1)}=({N^{T}\times B})\|x$ (16)

The retrieved data is represented as,

$\displaystyle d^{\ast}=\frac{I^{\ast}}{T_{(v\times v)}}$ (17)

where $T_{(v\times v)}$ denotes the retrieval key without the concatenated value $x$.

a) Solution encoding

The solution is encoded to define the search space of the proposed ChGWO algorithm, and this representation is used to generate the optimal key. Each solution is a vector of dimension $1\times q$, where $q$ is the size of the key. Figure 2 shows the solution vector of the proposed ChGWO algorithm.

Figure 2.

Solution vector of proposed ChGWO algorithm.

b) Fitness function

The fitness function of ChGWO is used to discover the optimum key, so it must consider attributes related to privacy protection. The two factors used to generate the optimal key are accuracy and Jaro-Winkler similarity, and end users who request data should obtain the data with the highest fitness. Thus, the fitness combines accuracy and Jaro-Winkler similarity, and is expressed as,

$\displaystyle\textit{Fitness}=\frac{A+C}{2}$ (18)

where $A$ is the utility and $C$ is the privacy.

Here, utility is measured as accuracy, the degree of closeness of the computed values to the original values, and is represented as,

$\displaystyle A=\frac{Y^{p}+Y^{n}}{Y^{p}+Y^{n}+Z^{p}+Z^{n}}$ (19)

where $Y^{p}$ denotes true positives, $Y^{n}$ true negatives, $Z^{p}$ false positives, and $Z^{n}$ false negatives.
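Eqs. (18) and (19) reduce to two one-line functions. A minimal sketch follows; the confusion-matrix counts in the example are illustrative, not results from the paper, and the function names are hypothetical.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Eq. (19): fraction of correct decisions among all decisions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined for an empty confusion matrix")
    return (tp + tn) / total


def fitness(utility: float, privacy: float) -> float:
    """Eq. (18): unweighted mean of utility A and privacy C."""
    return (utility + privacy) / 2.0
```

For instance, 40 true positives, 45 true negatives, 5 false positives, and 10 false negatives give an accuracy of 0.85.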

The Jaro-Winkler similarity measures how alike two strings are: the higher its value, the more similar the strings, which makes it useful for effective decisions. Privacy is computed from the Jaro-Winkler similarity between the original data $d$ and the privacy-protected data $I$, and is given as,

$\displaystyle C=J_{l}+M\alpha(1-J_{l})$ (20)

where $M$ is the length of the common prefix (up to a maximum of four characters), $\alpha$ is a constant scaling factor (commonly 0.1), and $J_{l}$ is the Jaro similarity, expressed as,

$\displaystyle J_{l}=\begin{cases}0, & \text{if }\rho=0\\ \dfrac{1}{3}\left(\dfrac{\rho}{|d|}+\dfrac{\rho}{|I|}+\dfrac{\rho-l}{\rho}\right), & \text{otherwise}\end{cases}$ (21)

where $\rho$ is the number of matching characters, $l$ is the number of transpositions (half the number of matching characters that appear in a different order), $d$ is the input data, and $I$ is the privacy-protected data.
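Eqs. (20) and (21) are the standard Jaro and Jaro-Winkler measures. The sketch below assumes the original and protected records are compared as strings (for example, by serializing each record; the serialization is not specified in the source). It uses the conventional matching window of $\max(|d|,|I|)/2-1$ characters and a prefix capped at four characters, both common choices rather than details stated here.

```python
def jaro(a: str, b: str) -> float:
    """Eq. (21): Jaro similarity J_l of strings a and b."""
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_used, b_used = [False] * len(a), [False] * len(b)
    rho = 0                                          # number of matching characters
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(i + window + 1, len(b))):
            if not b_used[j] and b[j] == ch:
                a_used[i] = b_used[j] = True
                rho += 1
                break
    if rho == 0:
        return 0.0
    a_seq = [c for c, used in zip(a, a_used) if used]
    b_seq = [c for c, used in zip(b, b_used) if used]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) / 2   # l in Eq. (21)
    return (rho / len(a) + rho / len(b) + (rho - transpositions) / rho) / 3


def jaro_winkler(a: str, b: str, alpha: float = 0.1, max_prefix: int = 4) -> float:
    """Eq. (20): boost the Jaro similarity by the common-prefix length M."""
    j = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:max_prefix], b[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return j + prefix * alpha * (1 - j)
```

For the textbook pair "MARTHA" and "MARHTA", the sketch gives a Jaro similarity of 17/18 (about 0.944) and a Jaro-Winkler similarity of about 0.961.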

c) Generation of secret key with developed ChGWO

The secret key is produced using the developed ChGWO algorithm. GWO [25] is inspired by grey wolves and imitates their leadership hierarchy, using four kinds of grey wolves, namely alpha, beta, delta, and omega. Its hunting process involves three major steps: searching for, encircling, and attacking prey. ChOA [24], meanwhile, is inspired by the individual intelligence and sexual motivation of chimps in group hunting. It was developed to alleviate slow convergence when solving high-dimensional problems, such as training learning algorithms. Chaotic maps help ChOA improve its stochastic behaviour and optimization process and reduce the chance of being trapped in local minima, and the algorithm effectively balances the exploration and exploitation phases. Combining ChOA and GWO is intended to help generate globally optimal solutions. The steps of the proposed ChGWO are as follows.

Step 1) Initialization

The first step initializes the population $D$ of $f$ solutions, where $1\leqslant e\leqslant f$:

$\displaystyle D=\{D_{1},D_{2},\ldots,D_{e},\ldots,D_{f}\}$ (22)

where $f$ is the total number of solutions and $D_{e}$ is the $e^{\text{th}}$ solution.

Step 2) Determination of fitness

The fitness of each solution is determined using Eq. (18), as explained in Section 3.2.1 b).

Step 3) Driving and chasing prey

In ChOA [24], hunting consists of exploration and exploitation phases. To model driving and chasing the prey, the distance between a chimp and the prey is defined as,

$\displaystyle X=\left|L\cdot D_{\textit{prey}}(y)-n\cdot D_{\textit{chimp}}(y)\right|$ (23)

where $y$ is the current iteration, $L$ is a coefficient vector, $n$ is a chaotic value, $D_{\textit{chimp}}(y)$ is the current position of the chimp, and $D_{\textit{prey}}(y)$ is the current position of the prey.

The position of the chimp while chasing the prey is updated as,

$\displaystyle D_{\textit{chimp}}(y+1)=D_{\textit{prey}}(y)-W\cdot X$ (24)

where $W$ is a coefficient vector and $X$ is the distance from Eq. (23). The coefficient vector $W$ is expressed as,

$\displaystyle W=2K\cdot z_{1}-K$ (25)

where $K$ is linearly reduced from 2.5 to 0 over the iterations, and $z_{1}$ is a random number in [0, 1].

The coefficient vector $L$ is given by,

$\displaystyle L=2\cdot z_{2}$ (26)

where $z_{2}$ is a random number in [0, 1].

Substituting Eq. (23) into Eq. (24) gives,

$\displaystyle D_{\textit{chimp}}(y+1)=D_{\textit{prey}}(y)-W\cdot\left|L\cdot D_{\textit{prey}}(y)-n\cdot D_{\textit{chimp}}(y)\right|$ (27)

Assuming $L\cdot D_{\textit{prey}}(y)>n\cdot D_{\textit{chimp}}(y)$, so that the absolute value can be dropped,

$\displaystyle D_{\textit{chimp}}(y+1)=D_{\textit{prey}}(y)-W\cdot\left(L\cdot D_{\textit{prey}}(y)-n\cdot D_{\textit{chimp}}(y)\right)$ (28)

$\displaystyle D_{\textit{chimp}}(y+1)=D_{\textit{prey}}(y)(1-W\cdot L)+W\cdot n\cdot D_{\textit{chimp}}(y)$ (29)
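The ChOA driving and chasing step, Eqs. (23)-(26), can be sketched per dimension as follows. The source does not name the chaotic map that supplies $n$; the logistic map with $r=4$ used here is an assumption, as are the helper names.

```python
import random
from typing import List

Vector = List[float]


def k_schedule(y: int, max_itn: int) -> float:
    """K decreases linearly from 2.5 to 0 over the iterations, as stated for Eq. (25)."""
    return 2.5 * (1.0 - y / max_itn)


def logistic_map(value: float) -> float:
    """One logistic-map step with r = 4, an assumed source of the chaotic value n."""
    return 4.0 * value * (1.0 - value)


def choa_chase(d_chimp: Vector, d_prey: Vector, K: float, n: float,
               rng: random.Random) -> Vector:
    """Apply Eqs. (23)-(26) independently to each dimension of the position."""
    new_position = []
    for dc, dp in zip(d_chimp, d_prey):
        W = 2.0 * K * rng.random() - K      # Eq. (25)
        L = 2.0 * rng.random()              # Eq. (26)
        X = abs(L * dp - n * dc)            # Eq. (23)
        new_position.append(dp - W * X)     # Eq. (24)
    return new_position
```

When $K$ reaches 0 at the end of the run, $W=0$ and every chimp lands exactly on the prey estimate, which is the exploitation limit of Eq. (24).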

GWO offers improved accuracy through effective detection and attacking of prey. According to GWO [25], the position update is given by,

$\displaystyle\vec{D}(y+1)=\vec{D}_{z}(y)-\vec{O}\cdot\vec{P}$ (30)

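where $y$ is the current iteration, $\vec{O}$ is a coefficient vector, $\vec{P}=|\vec{U}\cdot\vec{D}_{z}(y)-\vec{D}(y)|$ is the distance between the wolf and the prey, $\vec{D}_{z}(y)$ is the position of the prey, and $\vec{D}(y+1)$ is the updated position of the grey wolf. Expanding $\vec{P}$ gives,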

$\displaystyle\vec{D}(y+1)=\vec{D}_{z}(y)-\vec{O}\cdot\left|\vec{U}\cdot\vec{D}_{z}(y)-\vec{D}(y)\right|$ (31)

where $\vec{U}$ is a coefficient vector and $\vec{D}(y)$ is the current position of the wolf.

Assuming $\vec{U}\cdot\vec{D}_{z}(y)>\vec{D}(y)$, so that the absolute value can be dropped,

$\displaystyle\vec{D}(y+1)=\vec{D}_{z}(y)-\vec{O}\cdot\vec{U}\cdot\vec{D}_{z}(y)+\vec{O}\cdot\vec{D}(y)$ (32)

$\displaystyle\vec{D}(y+1)=\vec{D}_{z}(y)(1-\vec{O}\cdot\vec{U})+\vec{O}\cdot\vec{D}(y)$ (33)

$\displaystyle\vec{D}_{z}(y)=\frac{\vec{D}(y+1)-\vec{O}\cdot\vec{D}(y)}{1-\vec{O}\cdot\vec{U}}$ (34)

Setting $\vec{D}(y+1)=D_{\textit{chimp}}(y+1)$, $\vec{D}(y)=D_{\textit{chimp}}(y)$, and $\vec{D}_{z}(y)=D_{\textit{prey}}(y)$, Eq. (34) becomes:

$\displaystyle D_{\textit{prey}}(y)=\frac{D_{\textit{chimp}}(y+1)-\vec{O}\cdot D_{\textit{chimp}}(y)}{1-\vec{O}\cdot\vec{U}}$ (35)

Substituting Eq. (35) into Eq. (29) gives,

$\displaystyle D_{\textit{chimp}}(y+1)=\frac{D_{\textit{chimp}}(y+1)-\vec{O}% \cdot D_{\textit{chimp}}(y)}{1-\vec{O}\cdot\vec{U}}$ $\displaystyle\qquad(1-W\cdot L)+W\cdot n\cdot D_{\textit{chimp}}(y)$ (36) $\displaystyle D_{\textit{chimp}}(y+1)=\frac{D_{\textit{chimp}}(y+1)}{1-\vec{O}% \cdot\vec{U}}({1-W\cdot L})$ $\displaystyle\qquad-\,\frac{\vec{O}\cdot D_{\textit{chimp}}(y)}{1-\vec{O}\cdot% \vec{U}}({1-W\cdot L})$ (37) $\displaystyle\qquad\quad\,+\,W\cdot n\cdot D_{\textit{chimp}}(y)$ $\displaystyle D_{\textit{chimp}}(y+1)-\frac{D_{\textit{chimp}}(y+1)}{1-\vec{O}% \cdot\vec{U}}({1-W\cdot L})$ $\displaystyle\qquad=\,-\frac{\vec{O}\cdot D_{\textit{chimp}}(y)}{1-\vec{O}% \cdot\vec{U}}({1-W\cdot L})$ (38) $\displaystyle\qquad\quad\,+\,W\cdot n\cdot D_{\textit{chimp}}(y)$ $\displaystyle D_{\textit{chimp}}(y+1)\left[{1-\frac{({1-W\cdot L})}{1-\vec{O}% \cdot\vec{U}}}\right]$ $\displaystyle\qquad=\,-\frac{\vec{O}\cdot D_{\textit{chimp}}(y)}{1-\vec{O}% \cdot\vec{U}}({1-W\cdot L})$ (39) $\displaystyle\qquad\quad\,+\,W\cdot n\cdot D_{\textit{chimp}}(y)$ $\displaystyle D_{\textit{chimp}}(y+1)\left[{\frac{1-\vec{O}\cdot\vec{U}-({1-W% \cdot L})}{1-\vec{O}\cdot\vec{U}}}\right]$ $\displaystyle\qquad=\,W\cdot n\cdot D_{\textit{chimp}}(y)$ (40) $\displaystyle\qquad\quad\,-\,\frac{\vec{O}\cdot D_{\textit{chimp}}(y)}{1-\vec{% O}\cdot\vec{U}}({1-W\cdot L})$ $\displaystyle D_{\textit{chimp}}(y+1)\left[{\frac{1-\vec{O}\cdot\vec{U}-1+W% \cdot L}{1-\vec{O}\cdot\vec{U}}}\right]$ $\displaystyle\qquad=\,W\cdot n\cdot D_{\textit{chimp}}(y)$ (41) $\displaystyle\qquad\quad\,-\,\frac{\vec{O}\cdot D_{\textit{chimp}}(y)}{1-\vec{% O}\cdot\vec{U}}({1-W\cdot L})$ $\displaystyle D_{\textit{chimp}}(y+1)\left[{\frac{W\cdot L-\vec{O}\cdot\vec{U}% }{1-\vec{O}\cdot\vec{U}}}\right]$ $\displaystyle\qquad=\,W\cdot n\cdot D_{\textit{chimp}}(y)$ (42) $\displaystyle\qquad\quad\,-\,\frac{\vec{O}\cdot D_{\textit{chimp}}(y)}{1-\vec{% O}\cdot\vec{U}}({1-W\cdot L})$

Thus, the final update equation of the proposed ChGWO is given as,

$\displaystyle D_{\textit{chimp}}(y+1)=\frac{1-\vec{O}\cdot\vec{U}}{W\cdot L-\vec{O}\cdot\vec{U}}\left[W\cdot n\cdot D_{\textit{chimp}}(y)-\frac{\vec{O}\cdot D_{\textit{chimp}}(y)}{1-\vec{O}\cdot\vec{U}}(1-W\cdot L)\right]$ (43)
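A quick numerical check of the derivation: the value produced by Eq. (43) should satisfy Eq. (36), since Eq. (43) was obtained by solving Eq. (36) for $D_{\textit{chimp}}(y+1)$. The sketch below treats each coefficient as a scalar for one dimension and guards the two singular cases, $\vec{O}\cdot\vec{U}=1$ and $W\cdot L=\vec{O}\cdot\vec{U}$, where Eq. (43) is undefined. The function names and sample coefficient values are hypothetical.

```python
def chgwo_update(d_chimp: float, W: float, L: float, n: float,
                 O: float, U: float) -> float:
    """Eq. (43) for one dimension of the chimp position."""
    one_minus_ou = 1.0 - O * U
    denom = W * L - O * U
    if abs(one_minus_ou) < 1e-12 or abs(denom) < 1e-12:
        raise ZeroDivisionError("Eq. (43) is undefined when O*U = 1 or W*L = O*U")
    return one_minus_ou / denom * (W * n * d_chimp
                                   - O * d_chimp / one_minus_ou * (1.0 - W * L))


def eq36_rhs(d_next: float, d_chimp: float, W: float, L: float, n: float,
             O: float, U: float) -> float:
    """Right-hand side of Eq. (36), used to confirm the algebra."""
    return (d_next - O * d_chimp) / (1.0 - O * U) * (1.0 - W * L) + W * n * d_chimp
```

Plugging the output of `chgwo_update` back into `eq36_rhs` reproduces the same value up to rounding, which confirms the rearrangement in Eqs. (37)-(42).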

Step 4) Attacking behaviour

To model this behaviour, the attacker, driver, barrier, and chaser are assumed to have the best knowledge of the prey's position. The four best solutions obtained so far are therefore stored, and each chimp updates its position from them as,

$\displaystyle D(y+1)=\frac{D_{1}+D_{2}+D_{3}+D_{4}}{4}$ (44)

where $D_{1}$, $D_{2}$, $D_{3}$, and $D_{4}$ are the candidate positions computed from the attacker, barrier, chaser, and driver, respectively.

These candidate positions are computed as,

$\displaystyle D_{1}=D_{\textit{attacker}}-W_{1}(X_{\textit{attacker}})$ (45)

where $D_{\textit{attacker}}$ is the position of the attacker chimp, $W_{1}$ is a coefficient vector, and $X_{\textit{attacker}}$ is the distance between the attacker and the current chimp.

$\displaystyle D_{2}=D_{\textit{barrier}}-W_{2}(X_{\textit{barrier}})$ (46)

where $D_{\textit{barrier}}$ is the position of the barrier chimp, $W_{2}$ is a coefficient vector, and $X_{\textit{barrier}}$ is the distance between the barrier and the current chimp.

$\displaystyle D_{3}=D_{\textit{chaser}}-W_{3}(X_{\textit{chaser}})$ (47)

where $D_{\textit{chaser}}$ is the position of the chaser chimp, $W_{3}$ is a coefficient vector, and $X_{\textit{chaser}}$ is the distance between the chaser and the current chimp.

$\displaystyle D_{4}=D_{\textit{driver}}-W_{4}(X_{\textit{driver}})$ (48)

where $D_{\textit{driver}}$ is the position of the driver chimp, $W_{4}$ is a coefficient vector, and $X_{\textit{driver}}$ is the distance between the driver and the current chimp.

The corresponding distances are given as,

$\displaystyle X_{\textit{attacker}}=\left|L_{1}D_{\textit{attacker}}-x_{1}D\right|$ (49)

$\displaystyle X_{\textit{barrier}}=\left|L_{2}D_{\textit{barrier}}-x_{2}D\right|$ (50)

$\displaystyle X_{\textit{chaser}}=\left|L_{3}D_{\textit{chaser}}-x_{3}D\right|$ (51)

$\displaystyle X_{\textit{driver}}=\left|L_{4}D_{\textit{driver}}-x_{4}D\right|$ (52)

where $L_{1}$, $L_{2}$, $L_{3}$, and $L_{4}$ are coefficient vectors, and $x_{1}$, $x_{2}$, $x_{3}$, and $x_{4}$ are chaotic vectors, the counterparts of the chaotic value $n$ in Eq. (23).
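The attacking update of Eqs. (44)-(52) averages four leader-guided candidates. A minimal sketch follows, assuming each $W_{i}$ and $L_{i}$ is drawn as in Eqs. (25)-(26) and that the chaotic values $x_{1},\ldots,x_{4}$ are supplied by the caller; the function name is hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]


def attack_update(d: Vector, leaders: Sequence[Vector], K: float,
                  chaos: Sequence[float], rng: random.Random) -> Vector:
    """Eqs. (44)-(52): mean of the positions guided by the four best chimps.

    leaders = (D_attacker, D_barrier, D_chaser, D_driver); chaos holds x_1..x_4.
    """
    if len(leaders) != 4 or len(chaos) != 4:
        raise ValueError("expected four leaders and four chaotic values")
    candidates = []
    for lead, x_i in zip(leaders, chaos):
        cand = []
        for lt, dt in zip(lead, d):
            W_i = 2.0 * K * rng.random() - K     # W_i, drawn as in Eq. (25)
            L_i = 2.0 * rng.random()             # L_i, drawn as in Eq. (26)
            X_i = abs(L_i * lt - x_i * dt)       # Eqs. (49)-(52)
            cand.append(lt - W_i * X_i)          # Eqs. (45)-(48)
        candidates.append(cand)
    return [sum(c[t] for c in candidates) / 4.0 for t in range(len(d))]   # Eq. (44)
```

With $K=0$ every $W_{i}$ vanishes, so the new position is simply the centroid of the four leaders.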

Step 5) Prey attacking

Here, the chimps attack the prey and end the hunt as soon as the prey stops moving. To model the attacking behaviour, the value of $K$ is decreased.

Step 6) Searching for prey

Exploration is driven by the positions of the attacker, barrier, chaser, and driver chimps: they diverge to search for the prey and converge to attack it.

Step 7) Social incentive

In the final phase, social motivation (the opportunity for social meeting) causes the chimps to abandon their hunting roles. To model this behaviour, the chimp position is updated with equal probability either by the normal update or by a chaotic model, as given by,

$\displaystyle D_{\textit{chimp}}(y+1)=\begin{cases}\dfrac{1-\vec{O}\cdot\vec{U}}{W\cdot L-\vec{O}\cdot\vec{U}}\left[W\cdot n\cdot D_{\textit{chimp}}(y)-\dfrac{\vec{O}\cdot D_{\textit{chimp}}(y)}{1-\vec{O}\cdot\vec{U}}(1-W\cdot L)\right], & \text{if }\mu<0.5\\ \textit{Chaotic value}, & \text{if }\mu>0.5\end{cases}$ (53)

where $\mu$ is a random number in [0, 1].
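Eq. (53) switches between the normal update and a chaotic value. The sketch below takes the normal update as a caller-supplied function (in ChGWO it would be Eq. (43)), uses a logistic map with $r=4$ as the chaotic model (an assumption, since the map is not named), and treats the boundary case $\mu=0.5$, which Eq. (53) leaves undefined, as chaotic. The name `social_incentive` is hypothetical.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def social_incentive(d_chimp: Vector, mu: float,
                     normal_update: Callable[[Vector], Vector],
                     chaos_state: float) -> Tuple[Vector, float]:
    """Eq. (53): return the new position and the updated chaotic-map state."""
    if mu < 0.5:
        return normal_update(d_chimp), chaos_state
    values, state = [], chaos_state
    for _ in d_chimp:
        state = 4.0 * state * (1.0 - state)   # assumed logistic map, one value per dimension
        values.append(state)
    return values, state
```

Returning the map state lets the caller keep a single chaotic sequence running across iterations.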

Step 8) Re-evaluation of the update with fitness: The fitness of each new position is recalculated, and the optimal key is retained.

Step 9) Terminate: The best solutions are refined iteratively until the maximum number of iterations is reached. The pseudo code of the developed ChGWO is given in Table 1.

Thus, the privacy-preserved data obtained using the key generated by the proposed ChGWO are denoted $I$.

3.2.2 Clustering of privacy preserved data with DFC in reducer

The privacy-preserved data are clustered using DFC [27] in the reducer, denoted $R_{e}$. Let the input to DFC be $I=\{I^{\prime}_{1},I^{\prime}_{2},\ldots,I^{\prime}_{i},\ldots,I^{\prime}_{h}\}$, the number of clusters be $d$, the batch size be $m_{c}$, and the maximum number of iterations be $\text{Iter}_{\max}$. The samples $I$ are split into $m/m_{c}$ batches, and each batch of training data is denoted $I_{i}$, $i=1,2,\ldots,m/m_{c}$. The autoencoder is trained with weights $\omega$ and bias $D$, and the fuzzy cluster centers are denoted $\lambda$. For every batch, the hidden features $Q_{v}$, fuzzy memberships $\vartheta_{v}$, and pseudo labels $k_{v}$ are computed from $I_{v}$ and used to initialize the affinity matrix $L_{v}\in\Re^{m_{c}\times m_{c}}$, whose entries are the affinities $p_{jl}$ of the $v^{\text{th}}$ batch. The loss function of the autoencoder is given by,

$\displaystyle O(I;\theta)=\frac{1}{m}\sum_{i=1}^{m}\left\|x_{E,S}(I_{i})-I_{i}\right\|^{2}+\eta\,Y(E)$ (54)

where $\|\cdot\|$ denotes the Euclidean norm, $Y(E)$ is the regularization term weighted by $\eta$, and $x_{E,S}(I_{i})$ is the reconstruction of $I_{i}$. Let $Q=\{g_{1},g_{2},\ldots,g_{i},\ldots,g_{m}\}$, where $g_{i}\in\Re^{q}$ is the hidden representation produced by the autoencoder. The fuzzy clustering layer takes $g_{i}$ as input and produces the fuzzy membership $M_{ij}$, given by,

$\displaystyle M_{ij}=\frac{\left(\|g_{i}-\lambda_{j}\|^{2}-\frac{\gamma}{\sum_{\nu=1}^{q}\|\lambda_{\nu}-\bar{\lambda}\|^{2}}\|\lambda_{j}-\bar{\lambda}\|^{2}\right)^{-1/(a-1)}}{\sum_{\tau=1}^{q}\left(\|g_{i}-\lambda_{\tau}\|^{2}-\frac{\gamma}{\sum_{\nu=1}^{q}\|\lambda_{\nu}-\bar{\lambda}\|^{2}}\|\lambda_{\tau}-\bar{\lambda}\|^{2}\right)^{-1/(a-1)}}$ (55)

where $\gamma$ is a hyperparameter that balances the between-cluster and within-cluster distances in the cluster space, $\bar{\lambda}$ is the mean of the cluster centers, and $a$ is the fuzzifier.
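Eq. (55) can be evaluated directly for a batch of hidden features. A minimal sketch follows, assuming $\bar{\lambda}$ is the mean of the centers and that $\gamma$ is small enough for every base in Eq. (55) to stay positive (otherwise the fractional power is undefined, so the sketch raises an error). With $\gamma=0$ it reduces to the standard fuzzy c-means membership. The function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(p: Vector, q: Vector) -> float:
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def fuzzy_memberships(points: Sequence[Vector], centers: Sequence[Vector],
                      gamma: float = 0.1, a: float = 2.0) -> List[List[float]]:
    """Eq. (55): membership M_ij of hidden feature g_i in cluster j."""
    if a <= 1.0:
        raise ValueError("the fuzzifier a must be greater than 1")
    k, dim = len(centers), len(centers[0])
    mean_center = [sum(c[t] for c in centers) / k for t in range(dim)]   # lambda-bar
    spread = [sq_dist(c, mean_center) for c in centers]   # ||lambda_j - lambda-bar||^2
    total_spread = sum(spread)
    memberships = []
    for g in points:
        scores = []
        for j, c in enumerate(centers):
            penalty = gamma * spread[j] / total_spread if total_spread > 0 else 0.0
            base = sq_dist(g, c) - penalty
            if base <= 0.0:
                raise ValueError("non-positive base; lower gamma or move g_i off the center")
            scores.append(base ** (-1.0 / (a - 1.0)))
        norm = sum(scores)
        memberships.append([s / norm for s in scores])   # rows sum to 1
    return memberships
```

Each row is normalized, so the memberships of a point across all clusters sum to 1.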

Table 1

Pseudo code of developed ChGWO

Input: Population $D$
Output: Attacker $D_{\textit{attacker}}$
Begin
  Initialize chimp population and other algorithmic parameters
  Evaluate each chimp position
  Split chimps randomly into independent groups
  Until termination criterion is met
    Evaluate fitness using Eq. (18)
    $D_{\textit{attacker}}=$ best search agent
    $D_{\textit{chaser}}=$ second best search agent
    $D_{\textit{barrier}}=$ third best search agent
    $D_{\textit{driver}}=$ fourth best search agent
  While $y<\max\,\textit{itn}$
    For each chimp
      Mine chimp group
      Utilize group strategy for updating parameters
    End for
    For each search chimp
      If $(\mu<0.5)$
        If $(|a|<1)$
          Update position using Eq. (53)
        Else if $(|a|>1)$
          Choose a random search agent
        End if
      Else if $(\mu>0.5)$
        Update position using Eq. (24)
      End if
    End for
    Update algorithmic parameters
    $y=y+1$
  End while
  Return $D_{\textit{attacker}}$
End
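The control flow of Table 1 can be sketched in Python as below. This is a simplified, hypothetical sketch rather than the authors' ChGWO: Eqs. (18), (24) and (53) are defined elsewhere in the paper, so the sketch substitutes stand-ins. The $|a|<1$ branch averages GWO-style moves towards the four leaders in place of Eq. (53), the $\mu>0.5$ branch perturbs the attacker with a logistic-map value in place of Eq. (24), chimp groups and their update strategies are omitted, and `fitness` is any function to be minimised. The name `chgwo_sketch` and all parameter defaults are assumptions.

```python
import numpy as np


def chgwo_sketch(fitness, dim, pop_size=20, max_itn=100, lb=-1.0, ub=1.0, seed=0):
    """Simplified control flow of Table 1 for minimising `fitness`.

    Hypothetical stand-ins, not the paper's equations: GWO-style leader
    averaging replaces Eq. (53) and a logistic-map perturbation replaces
    Eq. (24). Chimp groups and their update strategies are omitted.
    """
    rng = np.random.default_rng(seed)
    D = rng.uniform(lb, ub, size=(pop_size, dim))            # chimp population
    best_x, best_f = None, np.inf
    chaos = 0.7                                              # logistic-map state
    for y in range(max_itn):                                 # while y < max itn
        scores = np.array([fitness(x) for x in D])
        order = np.argsort(scores)
        if scores[order[0]] < best_f:
            best_f, best_x = float(scores[order[0]]), D[order[0]].copy()
        leaders = D[order[:4]].copy()                        # attacker, chaser, barrier, driver
        f = 2.0 * (1.0 - y / max_itn)                        # linearly decreasing coefficient
        for k in range(pop_size):
            if rng.random() < 0.5:                           # mu < 0.5
                a = f * (2.0 * rng.random() - 1.0)
                if abs(a) < 1.0:
                    moves = []
                    for lead in leaders:                     # move towards each leader
                        A = f * (2.0 * rng.random(dim) - 1.0)
                        C = 2.0 * rng.random(dim)
                        moves.append(lead - A * np.abs(C * lead - D[k]))
                    D[k] = np.mean(moves, axis=0)
                else:                                        # explore around a random agent
                    other = D[rng.integers(pop_size)]
                    D[k] = other - a * np.abs(2.0 * rng.random(dim) * other - D[k])
            else:                                            # mu > 0.5: chaotic step
                chaos = 4.0 * chaos * (1.0 - chaos)
                D[k] = leaders[0] + f * (chaos - 0.5) * rng.standard_normal(dim)
            D[k] = np.clip(D[k], lb, ub)
        # "Update algorithmic parameters" is the recomputation of f above
    scores = np.array([fitness(x) for x in D])               # score the final positions
    if scores.min() < best_f:
        best_f, best_x = float(scores.min()), D[np.argmin(scores)].copy()
    return best_x, best_f                                    # D_attacker and its fitness
```

For example, `chgwo_sketch(lambda x: float(np.sum(x ** 2)), dim=3)` searches a toy sphere function. In the paper's setting, `fitness` would instead score a candidate secret key with the privacy/utility fitness of Eq. (18); that wiring is not shown here.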

Figure 3.

Architecture of the mapper and reducer phases for privacy-preserved data clustering.

The pseudo labels $k_{v}$ are mined from $\vartheta_{v}$, and the target $I_{v}$ is evaluated as,

$\displaystyle I_{ij}=\frac{M_{ij}^{2}/\sum_{i}M_{ij}}{\sum_{\tau=1}^{q}\left(M_{i\tau}^{2}/\sum_{i}M_{i\tau}\right)},\quad\sum_{j=1}^{q}I_{ij}=1\;\forall i$ (56)

The loss function of KL-divergence is given by,

$\displaystyle\min\text{KL}(I\|\vartheta)=\min\sum_{i=1}^{m}\sum_{j=1}^{q}I_{ij}\log\frac{I_{ij}}{M_{ij}}$ (57)
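Eqs. (56) and (57) translate directly into array operations. The sketch below takes the membership matrix $M$ (samples by clusters) and returns the sharpened target $I$ and the KL loss. The small constant `eps` guarding against $\log 0$ is an addition for numerical safety, not part of the paper's formulation.

```python
import numpy as np


def target_distribution(M):
    """Eq. (56): sharpen memberships M (m x q) into targets I whose rows sum to 1."""
    w = M ** 2 / M.sum(axis=0, keepdims=True)      # M_ij^2 / sum_i M_ij
    return w / w.sum(axis=1, keepdims=True)        # normalise over the q clusters


def kl_loss(I, M, eps=1e-12):
    """Eq. (57): sum_i sum_j I_ij log(I_ij / M_ij); eps avoids log(0)."""
    return float(np.sum(I * np.log((I + eps) / (M + eps))))
```

Squaring the memberships before renormalising pushes confident assignments closer to 1, so minimising the KL term pulls the memberships towards these sharper targets.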

The graph regularization is expressed as,

$\displaystyle\min H_{g}=\min\sum\limits_{i,l=1}^{m}{\|{g_{i}-g_{l}}\|^{2}p_{il}}$ (58)

Here, $p_{il}$ denotes the affinity between $I_{i}$ and $I_{l}$. When this affinity is high, minimizing the regularization term drives the distance between $g_{i}$ and $g_{l}$ to become small. Finally, the loss function $B_{v}$ is expressed as,

$\displaystyle B=\sum_{i=1}^{m}\|x_{E,S}(I_{i})-I_{i}\|_{2}^{2}+\beta_{1}\sum_{i=1}^{m}\sum_{j=1}^{q}I_{ij}\log\frac{I_{ij}}{M_{ij}}+\beta_{2}\sum_{i,l=1}^{m}\|g_{i}-g_{l}\|^{2}p_{il}$ (59)

$\displaystyle p_{il}=\begin{cases}\exp\left(-\|g_{i}-g_{l}\|^{2}/\kappa\right)/\chi, & \psi_{i}=\psi_{l}\\ 0, & \psi_{i}\neq\psi_{l}\end{cases}$ (60)

where $\beta_{1}$ and $\beta_{2}$ are weighting hyperparameters, $\kappa$ is the kernel width, fixed to 1, $\chi$ is the affinity hyperparameter that controls the affinity scale, and $\psi_{i}$ and $\psi_{l}$ denote the pseudo labels of $I_{i}$ and $I_{l}$.
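A small NumPy sketch of Eqs. (58) to (60) follows. It assumes the pseudo labels are supplied as an integer array (for instance the arg-max of the fuzzy memberships, which is an assumption here), and the default values of `chi`, `beta1` and `beta2` are placeholders; only $\kappa=1$ comes from the text.

```python
import numpy as np


def pairwise_sq_dist(G):
    """||g_i - g_l||^2 for every pair of rows of G."""
    return np.sum((G[:, None, :] - G[None, :, :]) ** 2, axis=2)


def affinity(G, pseudo_labels, kappa=1.0, chi=1.0):
    """Eq. (60): exp(-||g_i - g_l||^2 / kappa) / chi if pseudo labels match, else 0."""
    same = pseudo_labels[:, None] == pseudo_labels[None, :]
    return np.where(same, np.exp(-pairwise_sq_dist(G) / kappa) / chi, 0.0)


def graph_regularization(G, P):
    """Eq. (58): sum over i, l of ||g_i - g_l||^2 p_il."""
    return float(np.sum(pairwise_sq_dist(G) * P))


def total_loss(recon_err, kl, graph_reg, beta1=0.1, beta2=0.1):
    """Eq. (59): reconstruction + beta1 * KL + beta2 * graph term (placeholder betas)."""
    return recon_err + beta1 * kl + beta2 * graph_reg
```

Because $p_{il}$ is zero across different pseudo labels, the graph term only pulls together hidden features that the current clustering already places in the same group.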

The clusters obtained by DFC for the privacy-preserved data are expressed as,

$\displaystyle\ell=\{\ell_{1},\ell_{2},\ldots,\ell_{c},\ldots,\ell_{d}\}$ (61)

where $\ell$ denotes the clustered privacy-preserved data, $d$ is the number of clusters, and $\ell_{c}$ is the $c^{\text{th}}$ cluster.

3.2.3 Mapper and reducer phases and their architecture

Figure 3 shows the architecture of the mapper and reducer phases for clustering the privacy-preserved data. To reduce the evaluation time and to handle distributed data, the study uses the MapReduce model, which addresses both the complexity and the computational problems. MapReduce has two main functions: the map function, which processes the input data into relevant intermediate patterns, and the reduce function, which groups the intermediate data of the mappers to generate the clustered output. The input data are first provided to the MapReduce framework, where they are partitioned and fed to the mappers; the mappers perform privacy preservation using the proposed ChGWO, and the reducer performs clustering using DFC.
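The data flow of Figure 3 (partition, map, reduce) can be simulated in a single process as below. This is a hypothetical sketch, not a Hadoop job. The mapper's Kronecker product of each record with a secret key vector is only a placeholder for the encryption, Kronecker product and ChGWO-generated key described in Section 3.2.1, and the reducer uses plain k-means as a stand-in for DFC. All function names are illustrative.

```python
import numpy as np


def mapper(block, key):
    """Hypothetical privacy step: Kronecker product of every record with a key vector.

    Placeholder only; the paper's mapper combines encryption, a Kronecker
    product and a ChGWO-generated secret key, which is not reproduced here.
    """
    return np.kron(block, key)            # row i becomes kron(block[i], key)


def reducer(mapped_blocks, d, n_iter=20):
    """Stand-in for DFC: plain k-means over the concatenated mapper outputs."""
    X = np.vstack(mapped_blocks)
    centers = X[:d].copy()                # naive deterministic initialisation
    for _ in range(n_iter):
        dist = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        labels = np.argmin(dist, axis=1)
        for c in range(d):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def run_mapreduce(data, key, n_parts, d):
    blocks = np.array_split(data, n_parts)        # partition the input
    mapped = [mapper(b, key) for b in blocks]     # map phase (sequential here)
    return reducer(mapped, d)                     # reduce phase
```

Since $\|x\otimes k-y\otimes k\|=\|x-y\|\,\|k\|$, this placeholder masking scales every pairwise distance by the same factor, so the reducer recovers the same clusters it would find on the raw data; any real privacy transform would need to be checked for how much cluster structure it preserves.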

4. Results and discussion

The efficiency of the developed ChGWO is evaluated in terms of privacy, utility and random coefficient while varying the percentage of training data.

4.1 Experimental set-up

The devised ChGWO is implemented in Python and executed on a PC with 2 GB RAM, Windows 10 OS, and an Intel Core i3 processor.

4.2 Dataset description

The analysis is performed with the MHEALTH dataset [26]. This dataset is intended for benchmarking techniques that analyse human behaviour based on multimodal body sensing. It is multivariate, with 120 instances and 23 attributes, and the number of web hits recorded is 126,925. The attributes are real-valued, and the recorded activities include standing still, sitting and relaxing, and lying down.

4.3 Evaluation measures

The evaluation measures considered for the assessment are as follows.

(a)
Privacy: described in Section 3.2.1(b).
(b)
Utility: described in Section 3.2.1(b).
(c)
Random coefficient: it measures the similarity between two clusterings by considering all pairs of samples (i.e., the Rand index), and equals the fraction of pairwise decisions made correctly by the technique. It is given by,

$\displaystyle R_{c}=\frac{\mu^{p}+\mu^{n}}{\mu^{p}+\mu^{n}+\omega^{p}+\omega^{n}}$ (62)

where $\mu^{p}$, $\mu^{n}$, $\omega^{p}$ and $\omega^{n}$ denote the numbers of true positives, true negatives, false positives and false negatives, respectively, counted over sample pairs.
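Eq. (62) can be computed by counting sample pairs directly, as in the sketch below (quadratic in the number of samples, so suitable for checking small cases). The function name is illustrative; scikit-learn exposes the same quantity as `sklearn.metrics.rand_score`.

```python
from itertools import combinations


def rand_coefficient(labels_true, labels_pred):
    """Eq. (62) computed over all pairs of samples (the Rand index)."""
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1        # pair kept together in both clusterings
        elif not same_true and not same_pred:
            tn += 1        # pair kept apart in both clusterings
        elif same_pred:
            fp += 1        # grouped by the method but apart in the reference
        else:
            fn += 1        # apart in the method but together in the reference
    return (tp + tn) / (tp + tn + fp + fn)
```

Cluster identifiers only matter up to renaming: `rand_coefficient([0, 0, 1, 1], [1, 1, 0, 0])` gives 1.0 because every pair is treated consistently.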

4.4 Performance analysis

The proposed ChGWO is assessed by varying the population size and the number of iterations, in terms of privacy, utility and random coefficient.

Figure 4.

Assessment of proposed ChGWO with population size using (a) utility, (b) privacy, (c) random coefficient.

4.4.1 Assessment by altering population size

Figure 4 displays the assessment of ChGWO with population size. The assessment with utility is shown in Fig. 4a. For 50% training data, the utilities obtained by the proposed ChGWO with population sizes of 5, 10, 15 and 20 are 0.824, 0.830, 0.840 and 0.841, respectively; for 90% training data, they are 0.885, 0.894, 0.915 and 0.925. The assessment with privacy is shown in Fig. 4b. For 50% training data, the privacy values for population sizes of 5, 10, 15 and 20 are 0.804, 0.810, 0.819 and 0.825; for 90% training data, they are 0.880, 0.897, 0.906 and 0.915. The assessment with random coefficient is shown in Fig. 4c. For 50% training data, the random coefficients for population sizes of 5, 10, 15 and 20 are 0.495, 0.508, 0.510 and 0.514; for 90% training data, they are 0.620, 0.630, 0.650 and 0.659. All three metrics increase with the population size.

4.4.2 Assessment by altering iteration

Figure 5.

Assessment of proposed ChGWO by altering iteration using (a) utility, (b) privacy, (c) random coefficient.

Figure 6.

Assessment of techniques with (a) utility, (b) privacy, (c) random coefficient.

The assessment of the proposed ChGWO with the number of iterations is shown in Fig. 5. The assessment with utility is shown in Fig. 5a. For 50% training data, the utilities obtained by the proposed ChGWO with 20, 40, 60, 80 and 100 iterations are 0.801, 0.814, 0.825, 0.837 and 0.841, respectively; for 90% training data, they are 0.865, 0.875, 0.884, 0.905 and 0.925.

The assessment with privacy is shown in Fig. 5b. For 50% training data, the privacy values with 20, 40, 60, 80 and 100 iterations are 0.775, 0.785, 0.799, 0.804 and 0.825; for 90% training data, they are 0.865, 0.875, 0.885, 0.895 and 0.915.

The assessment with random coefficient is shown in Fig. 5c. For 50% training data, the random coefficients with 20, 40, 60, 80 and 100 iterations are 0.469, 0.475, 0.485, 0.496 and 0.514; for 90% training data, they are 0.598, 0.601, 0.614, 0.635 and 0.659. All three metrics increase with the number of iterations.

4.5 Comparative strategies

The methods compared are IRSA [1], MLBDC-PP-LWO [2], FrSparse FCM [7], and the proposed ChGWO.

4.6 Comparative analysis

Figure 6 displays the assessment obtained by varying the training data. The assessment with utility is shown in Fig. 6a. For 50% training data, the utilities of IRSA, MLBDC-PP-LWO, FrSparse FCM, and the proposed ChGWO are 0.701, 0.737, 0.785, and 0.841, respectively. For 90% training data, they are 0.754, 0.806, 0.848, and 0.925, so ChGWO improves on IRSA, MLBDC-PP-LWO and FrSparse FCM by 18.486%, 12.864% and 8.324%, respectively, where each improvement is computed relative to the ChGWO score as $(\text{ChGWO}-\text{baseline})/\text{ChGWO}\times 100$. The assessment with privacy is shown in Fig. 6b. For 50% training data, the privacy values of IRSA, MLBDC-PP-LWO, FrSparse FCM, and the proposed ChGWO are 0.685, 0.726, 0.765, and 0.825. For 90% training data, they are 0.747, 0.796, 0.833, and 0.915, giving improvements of 18.36%, 13.00% and 8.961%. The assessment with random coefficient is shown in Fig. 6c. For 50% training data, the random coefficients of IRSA, MLBDC-PP-LWO, FrSparse FCM, and the proposed ChGWO are 0.435, 0.451, 0.463, and 0.514. For 90% training data, they are 0.514, 0.533, 0.597, and 0.659, giving improvements of 22.003%, 19.119% and 9.408%.
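The reported improvement percentages can be reproduced from the 90% training-data scores by expressing each gain as a percentage of the ChGWO score. The short check below uses only the values stated in the text; the function name is illustrative.

```python
def improvement(proposed, baseline):
    """Gain of ChGWO over a baseline, as a percentage of the ChGWO score."""
    return (proposed - baseline) / proposed * 100.0


# Scores at 90% training data; baselines are IRSA, MLBDC-PP-LWO, FrSparse FCM
table2 = {
    "Utility": (0.925, [0.754, 0.806, 0.848]),
    "Privacy": (0.915, [0.747, 0.796, 0.833]),
    "Random coefficient": (0.659, [0.514, 0.533, 0.597]),
}
for metric, (chgwo, baselines) in table2.items():
    gains = [round(improvement(chgwo, b), 3) for b in baselines]
    print(metric, gains)
```

The utility row prints `[18.486, 12.865, 8.324]`; across all three metrics the computed values agree with the reported percentages to within one unit in the last reported digit.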

Table 2
Comparative assessment

Metrics (90% training data)	IRSA	MLBDC-PP-LWO	FrSparse FCM	Proposed ChGWO
Utility	0.754	0.806	0.848	0.925
Privacy	0.747	0.796	0.833	0.915
Random coefficient	0.514	0.533	0.597	0.659

4.7 Comparative discussion

Table 2 summarizes the comparative assessment at 90% training data in terms of utility, privacy and random coefficient. The proposed ChGWO attains the highest utility of 0.925, while IRSA, MLBDC-PP-LWO and FrSparse FCM attain 0.754, 0.806 and 0.848, respectively. The high utility indicates that ChGWO is effective at retaining the important information in the data. ChGWO also attains the highest privacy of 0.915, against 0.747, 0.796 and 0.833 for IRSA, MLBDC-PP-LWO and FrSparse FCM, which indicates that it encrypts the data effectively and thereby yields higher privacy. Finally, ChGWO attains the highest random coefficient of 0.659, against 0.514, 0.533 and 0.597. The higher random coefficient is attributed to DFC, which effectively identifies the similarity among the clusters.

5. Conclusion

To mitigate the computational complexity of big data, clustering is adopted as an essential component. Here, a novel technique is developed for privacy-preserved data clustering using the MapReduce model, with the goal of devising an optimization technique for privacy preservation. The input data are collected from several distributed sources, partitioned, and fed to the MapReduce model, which contains mappers and a reducer. The mappers perform privacy preservation by transforming the data with encryption, a Kronecker product and a secret key, where the secret key is generated with the proposed ChGWO algorithm. ChGWO is developed by combining ChOA and GWO, and its new fitness function combines privacy, represented by Jaro-Winkler similarity, and utility, represented by accuracy. Finally, the data are clustered using DFC. The proposed ChGWO performed effective privacy-preserved data clustering with the MapReduce framework, achieving the highest utility of 92.5%, the highest privacy of 91.5% and the highest random coefficient of 65.9%. In future work, other datasets can be used to validate the feasibility of the devised model.

References

1. Banasode, Padamannavar. A Bigdata Process for Practical Privacy-Preserving Utilizing k-Means Clustering. International Journal of Engineering and Advanced Technology (IJEAT). 2019 December; 9(2).
2. Bolla, Anandan. An Efficient Probabilistic Multi Labeled Big Data Clustering Model for Privacy Preservation Using Linked Weight Optimization Model. Turkish Journal of Computer and Mathematics Education (TURCOMAT). 2021; 12(11): 5510-5517.
3. Catak, Aydin, Elezaj, Yildirim-Yayilgan. Practical implementation of privacy preserving clustering methods using a partially homomorphic encryption algorithm. Electronics. 2020; 9(2): 229.
4. Lekshmy, Abdul Rahiman. A sanitization approach for privacy preserving data mining on social distributed environment. Journal of Ambient Intelligence and Humanized Computing. 2020; 11(7): 2761-2777.
5. Khan, Iqbal, Faizullah, Fahad, Ali, Ahmed. Clustering based privacy preserving of big data using fuzzification and anonymization operation. arXiv preprint arXiv:2001.01491. 2020.
6. Zou, Zhao, Shi, Wang, Peng, Ping, Wang. Highly secure privacy-preserving outsourced k-means clustering under multiple keys in cloud computing. Security and Communication Networks. 2020.
7. Kulkarni, Jena, Ravi Sankar. MapReduce framework based big data clustering using fractional integrated sparse fuzzy C means algorithm. IET Image Processing. 2020; 14(12): 2719-2727.
8. Kulkarni, Jena, Sanjay. Fractional Fuzzy Clustering and Particle Whale Optimization-Based MapReduce Framework for Big Data Clustering. Journal of Intelligent Systems. 2020; 29(1): 1496-1513.
9. Alguliyev, Aliguliyev, Abdullayeva. Privacy-preserving deep learning algorithm for big personal data analysis. Journal of Industrial Information Integration. 2019; 15: 1-14.
10. Elsir, Elsier, Abdurrahman, Mubarakali. Privacy preservation in big data with data scalability and efficiency using efficient and secure data balanced scheduling algorithm. Journal of Scientific and Industrial Research. 2019; 78: 755-759.
11. Rao PRM, Krishna, Kumar. Novel algorithm for efficient privacy preservation in data analytics. 2021.
12. Praveen, Babu. Big Data Clustering: Applying Conventional Data Mining Techniques in Big Data Environment. In: Innovations in Computer Science and Engineering. Singapore: Springer; 2019. pp. 509-516.
13. Kumar, Singh. A novel clustering technique for efficient clustering of big data in Hadoop Ecosystem. Big Data Mining and Analytics. 2019; 2(4): 240-247.
14. Improved K-means clustering algorithm for big data mining under Hadoop parallel framework. Journal of Grid Computing. 2020; 18(2): 239-250.
15. Mandala, Rao MCS. PSV-GWO: Particle Swarm Velocity Aided GWO for Privacy Preservation of Data. Journal of Cyber Security and Mobility. 2019; 439-466.
16. Bolla, Anandan. Privacy Preservation Of Data Using Efficient Group Cost Optimization Method With Big Data Clustering. International Journal of Advanced Research in Engineering and Technology (IJARET). 2020 November; 11(11): 748-760.
17. Singh, Kaur. Hadoop: addressing challenges of big data. In: IEEE International Advance Computing Conference (IACC). February 2014. pp. 686-689.
18. Nandimath, Banerjee, Patil, Kakade, Vaidya, Chaturvedi. Big data analysis using Apache Hadoop. In: IEEE 14th International Conference on Information Reuse & Integration (IRI). August 2013. pp. 700-703.
19. Gosain, Chugh. Privacy preservation in big data. International Journal of Computer Applications. 2014; 100(17).
20. Wang, Zheng, Rehmani, Yao, Huo. Privacy preservation in big data from the communication perspective – A survey. IEEE Communications Surveys & Tutorials. 2018; 21(1): 753-778.
21. Cuzzocrea. Privacy and security of big data: current challenges and future research perspectives. In: Proceedings of the First International Workshop on Privacy and Security of Big Data. November 2014. pp. 45-47.
22. Perwej. An experiential study of the big data. Science and Education. 2017; 4(1): 14-25.
23. Min, Guo, Liu, Zhang, Cui, Long. A survey of clustering with deep learning: From the perspective of network architecture. IEEE Access. 2018; 6: 39501-39514.
24. Khishe, Mosavi. Chimp optimization algorithm. Expert Systems with Applications. 2020; 149: 113338.
25. Mirjalili, Lewis. Grey wolf optimizer. Advances in Engineering Software. 2014; 69: 46-61.
26. MHEALTH Dataset. Accessed October 2021.
27. Feng, Chen CLP, Guo. Deep Fuzzy Clustering – A Representation Learning Approach. IEEE Transactions on Fuzzy Systems. 2020.
28. Mandala, SekharaRao MVPC. HDAPSO: Enhanced Privacy Preservation for Health Care Data. Journal of Networking and Communication Systems. 2019; 2(2): 10-19.
