Fuzzy based cell generalization to improve the data utility with minimal loss of information

Abstract

People increasingly share their information digitally, and this information plays a significant role in the public sector and the business world, where it helps in understanding social behavior, performing statistical analysis and drafting business policies. At the same time, users' personal information must be protected from adversaries when data are published. Privacy preservation is therefore one of the main concerns of data publishing, and k-anonymity and its family, such as l-diversity and t-closeness, are the most familiar models for protecting the sensitive information of individuals. Data reliability is also a key requirement of anonymized data. Various anonymization techniques protect data against attack models, and the privacy models satisfy privacy preservation effectively; however, data utility is equally vital to the purpose of data publishing, because publishers expect the anonymized dataset to remain useful for meaningful analysis. Otherwise, strong anonymization diminishes the utility of the anonymized dataset and weakens the purpose of data publishing. The proposed method aims to improve the utility of anonymized data in data publishing with minimal information loss, based on fuzzy alpha-cuts. A triangular membership function is applied to generalize the quasi-identifiers through constructed alpha-cut sets and a taxonomy tree. The proposed method is compared with existing models in terms of information loss and execution time, and the experimental results are encouraging.

Keywords

Fuzzy logic, fuzzy inference system, data privacy, fuzzy generalization, anonymity

1. Introduction

In the current decade, sharing and publishing data are everyday activities in our digitally connected society. At the same time, customers have the right to keep their sensitive data secure, and business analysts should be concerned about customers' data privacy [35–37]. Individual privacy should therefore be protected when data are published or shared with the public or with research communities. Removing sensitive information from the microdata is a common practice of data publishers. Researchers apply data mining techniques to discover knowledge from massive volumes of data, and privacy preservation must be ensured whenever sensitive information is involved, so the data need to be protected from various attacks. Privacy preservation is a crucial task in data publishing, since an adversary can easily infer sensitive information by using background knowledge about the data owners. In such scenarios, efficient privacy-preserving solutions are hard to derive, and providing an optimal solution for all publishing scenarios is impossible. Many researchers have proposed anonymization techniques and models [2–4, 33] to address privacy leakage and to strengthen privacy models and algorithms. At the same time, the utility of the anonymized dataset is an important consideration, because it determines whether the published data remain meaningful and reliable. Strengthening the anonymization increases the information loss, and the preserved data may become useless. To achieve privacy preservation, publishers usually try to hide or remove the sensitive attributes of the individuals whom the data concern by applying a set of transformations to the microdata before publishing it. These transformations include:
(1) data suppression (replacing some values with a special value, indicating that the replaced values are not disclosed) [18, 21], (2) data generalization (replacing some values with a parent value in the taxonomy of an attribute) [1, 30], and (3) additive noise (the general idea is to replace the original sensitive value s with s + r, where r is a random value drawn from some distribution) [23]. Research focuses on two aspects, namely strengthening privacy and retaining meaningful data utility with minimal loss of information [14]. Most contributions concentrate on increasing the privacy of published data and do not consider data utility. Privacy preservation has been the focus of much research. To the best of our knowledge, most of the work on determining the optimal transformation to be performed on a database before it is disclosed is inefficient, in the sense that increasing the table dimension significantly degrades the performance. Moreover, data anonymization techniques do not provide enough theoretical evidence that the disclosed table is immune from security breaches. Anonymization techniques include (1) hiding the sensitive identities by making each record indistinguishable from at least k–1 other records [3] (k-anonymity), (2) ensuring that the distance between the distribution of sensitive attributes in a class of records and their distribution in the whole table is no more than t [22] (t-closeness), and (3) ensuring that there are at least l distinct values for a given sensitive attribute in each indistinguishable group of records [5, 25] (l-diversity). Indeed, these techniques do not completely prevent re-identification [22].
It has been shown that the k-anonymity technique suffers from the curse of dimensionality and that the level of information loss in k-anonymity may not be acceptable from a data mining point of view, because the specifics of the inter-attribute behavior have a strong revealing effect in the high-dimensional case. A realization of t-closeness called SABRE is proposed in [6]. It partitions a table into buckets of similar sensitive attribute values in a greedy fashion and then redistributes tuples from each bucket into dynamically configured equivalence classes (EC). SABRE adopts information loss measures for each EC as a unit rather than treating released records individually. It lacks theoretical foundations for privacy guarantees and efficiency. In [12], a heuristic called ARUBA is proposed to address the tradeoff between data utility and data privacy. Although that algorithm determines a personalized optimum data transformation based on predefined risk and utility models, it provides neither scalability nor theoretical foundations for privacy guarantees. The notion of differential privacy [7, 27] has become very popular in the database community. It requires that the distribution of the outcomes of a computation does not change significantly when one individual changes their input data. A randomized query satisfies differential privacy if the likelihood of obtaining a certain answer from a database x is not "too" different from the likelihood of obtaining the same answer from other databases that differ from x in only one individual. Other approaches, such as fuzzy approaches [9, 31], also take privacy preservation into account. L. Troiano et al. [15] and I. Diaz et al. [26] have used fuzzy IF-THEN rules to hide the sensitive information in the raw dataset. A method based on fuzzy mathematics transforms the original tuples to provide personalized privacy preservation for both numerical and categorical quasi-identifier attributes.
Manolis Terrovitis et al. [34] have attempted to increase data utility and eliminate privacy leakage through quasi-identifier transformation. G.R. Zhang [32] has proposed a fuzzy-based method for categorical attributes that provides a mapping table based on a taxonomy tree. I. Diaz et al. [13] have proposed a fuzzy information framework that focuses on counting the elements that belong to an equivalence class through the concept of cardinality. The fuzzy partition method overcomes some privacy threats by using particular properties of fuzzy sets.

2. Contribution

In this paper, the proposed method addresses the problem of the weakened utility of anonymized data while keeping its risk below an acceptable threshold. The main aim of the paper is to maximize the utility of the anonymized dataset [16] and to minimize information loss. Fuzzy logic is applied to generalize each tuple of the quasi-identifiers with respect to a taxonomy tree. A threshold value is introduced to create the alpha-cut sets from which the taxonomy tree is constructed. A triangular membership function gives the fuzzy value of each tuple, and these values are generalized by the alpha-cut sets. Existing methods apply generalization until the conditions of privacy models such as k-anonymity and l-diversity are met, and they cannot attain privacy preservation without sacrificing utility. To overcome this issue, this work focuses on increasing the utility of the anonymized dataset and minimizing the information loss without following the existing privacy models. The data flow diagram of the proposed work is shown in Fig. 1.

Fig. 1

Workflow of the proposed method.

2.1. Preliminaries

2.1.1. Fuzzy sets

Fuzzy sets are an extension and generalization of standard crisp sets [38–40]. The degree of membership μ_A(x) of an element x in a fuzzy set A takes values in the unit interval [0,1], i.e., μ_A(x) ∈ [0,1]. A fuzzy set A in the universe of discourse U can be defined as a set of ordered pairs, given by

$A = \{(x, μ_{A}(x)) \mid x \in U\}$ (2.1)

where μ_A(x) is the degree of membership of x in A, i.e., the degree to which x belongs to A.

2.1.2. Membership function

The membership function defines the fuzziness of a fuzzy set, irrespective of whether the elements of the set are discrete or continuous. Membership functions are generally represented in graphical form, and there are certain limitations on the shapes that can be used for this representation. In Equation (2.1), μ_A(x) is called the membership function of A. The membership value ranges over the interval [0,1], i.e., the range of the membership function is a subset of the non-negative real numbers whose supremum is finite.

Figure 2 shows the graphical representation of the triangular membership function. It is specified by three parameters {a, b, c} as follows

$f(x; a, b, c) = \begin{cases} 0 & x \leq a \\ \frac{x - a}{b - a} & a \leq x \leq b \\ \frac{c - x}{c - b} & b \leq x \leq c \\ 0 & x \geq c \end{cases}$ (2.2)

Fig. 2

Triangular-shaped membership function.

The parameters {a, b, c} (with a < b < c) determine the x coordinates of the three corners of the underlying triangular membership function.
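As an illustration, the triangular membership function can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard triangular form in which the rising edge is (x − a)/(b − a) and the membership is zero outside [a, c]; the function name `triangular_membership` is hypothetical and is not taken from the authors' implementation.

```python
def triangular_membership(x: float, a: float, b: float, c: float) -> float:
    """Degree of membership of x in a triangular fuzzy set with corners a < b < c."""
    if not a < b < c:
        raise ValueError("expected a < b < c")
    if x <= a or x >= c:
        return 0.0                  # outside the support of the triangle
    if x <= b:
        return (x - a) / (b - a)    # rising edge, from 0 at a to 1 at b
    return (c - x) / (c - b)        # falling edge, from 1 at b to 0 at c


if __name__ == "__main__":
    # Corners chosen for illustration only.
    for x in (20, 24, 45, 64, 81):
        print(x, round(triangular_membership(x, 20, 45, 81), 2))
```

The membership value is 1 only at the peak b and falls off linearly on both sides, which is what makes the shape convenient for slicing into equal steps.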

2.1.3. Lambda-cuts for fuzzy sets

The set A_λ, with λ in [0,1], is called the lambda-cut (λ-cut, also alpha-cut) set and is a crisp set derived from the fuzzy set A. A_λ is called a strong lambda-cut set when it consists of all elements of the fuzzy set whose membership values are strictly greater than the specified value λ. Any fuzzy set A can be transformed into an infinite number of lambda-cut sets, as shown in Fig. 3, because λ can take infinitely many values in the interval [0,1]:

$A_{λ} = \{x \mid μ_{A}(x) > λ\},\ λ \in [0, 1]$

Fig. 3

Triangular-shaped Membership Function.
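The strong lambda-cut can be illustrated with a short Python sketch. It assumes that the fuzzy set is given as a mapping from each element to its membership degree; the name `strong_lambda_cut` and the sample values are hypothetical.

```python
def strong_lambda_cut(fuzzy_set: dict, lam: float) -> set:
    """Crisp set of the elements whose membership is strictly greater than lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    return {x for x, mu in fuzzy_set.items() if mu > lam}


if __name__ == "__main__":
    # Illustrative fuzzy set: element -> degree of membership.
    ages = {20: 0.0, 24: 0.16, 28: 0.32, 37: 0.68, 45: 1.0}
    print(sorted(strong_lambda_cut(ages, 0.3)))
```

Raising λ can only shrink the resulting crisp set, so the cuts for increasing λ are nested.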

2.2. Proposed method

Let Q = {a₁, a₂, a₃, …, a_n} be the set of quasi-identifiers a_i, i = 1, 2, 3, …, n, in a raw data table. Each quasi-identifier a_i has n tuples t_j, j = 1, 2, 3, …, n, that is, t_j ∈ a_i. The proposed method is based on fuzzy set theory: each quasi-identifier a_i is transformed into a fuzzy set ${\overset{´}{\hat{a}}}_{i}$ that associates a membership value with each tuple t_j. The degree of membership $μ \overset{´}{\hat{a}} (t_{j})$ takes values in the unit interval [0,1], i.e., $μ \overset{´}{\hat{a}} (t_{j}) \in [0,1]$.

$\overset{´}{\hat{a}} = \{(t_{j}, μ \overset{´}{\hat{a}} (t_{j})) \mid t_{j} \in a_{i},\ a_{i} \in Q\}$ (2.3)

Triangular membership functions are represented in graphical form. In Equation (2.3), $μ \overset{´}{\hat{a}} (t_{j})$ is called the membership function of $\overset{´}{\hat{a}}$, and its value ranges over the interval [0,1], i.e., the range of the membership function is a finite subset of the non-negative real numbers. This function is applied to generate the fuzzy set. Fig. 3 shows the graphical representation of the triangular membership function.

The triangular curve is a function of the tuple value t_j and depends on three scalar parameters R, S and T, as given by

$f(t_{j}; R, S, T) = \begin{cases} 0 & t_{j} \leq R \\ \frac{t_{j} - R}{S - R} & R \leq t_{j} \leq S \\ \frac{T - t_{j}}{T - S} & S \leq t_{j} \leq T \\ 0 & t_{j} \geq T \end{cases}$ (2.4)

Let R = min(t_j), S = median(t_j) and T = max(t_j) be the parameters that define the boundaries of the membership function of the fuzzy set ${\overset{´}{\hat{a}}}_{i}$. The proposed method introduces a support threshold value E for decomposing the fuzzy set into the alpha-cut sets α_k, where k = 1, 2, 3, …, E + 1. Fig. 4a shows the alpha-cut sets α_k for a support threshold value of 3.

Fig. 4

(a) α-cut sets for fuzzy set $μ \overset{´}{\hat{a}} (t_{j})$ with threshold value = 3. (b) Taxonomy tree of alpha-cut sets.
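The parameter choice and the decomposition into alpha-cut sets can be sketched as follows. This is a minimal sketch, assuming that a threshold E yields E + 1 equal-width intervals on [0,1], which matches the four intervals of width 0.25 shown for a threshold value of 3; the function names are hypothetical.

```python
from statistics import median


def membership_parameters(values):
    """Return (R, S, T) = (minimum, median, maximum) of a quasi-identifier column."""
    return min(values), median(values), max(values)


def alpha_cut_intervals(threshold: int):
    """Split [0, 1] into threshold + 1 equal-width alpha-cut intervals (assumed spacing)."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    n = threshold + 1
    bounds = [k / n for k in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    print(membership_parameters([20, 24, 28, 37, 45, 57, 64, 78, 81]))
    print(alpha_cut_intervals(3))
```

A larger threshold produces more, narrower intervals and therefore a finer generalization.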

2.2.1. Taxonomy tree

A taxonomy is a classification of things; here it represents the generalized values of an attribute. In the generalization process, the alpha-cut sets are used to construct the taxonomy. Fig. 4b shows the taxonomy tree of the alpha-cut sets α_k.

The values of the fuzzy set $μ \overset{´}{\hat{a}} (t_{j})$ fit into the range of exactly one of the alpha-cut sets α_k, where k = 1, 2, 3, …, E + 1. The taxonomy tree makes the cell generalization operation simple. Table 1 shows the generalized alpha-cut set values.

Table 1
Generalized alpha-cut set values

Taxonomy leaves	Generalized alpha-cut set
[0,0.25]	0 ≤ 0.25
[0.25,0.5]	0.25 ≤ 0.5
[0.5,0.75]	0.5 ≤ 0.75
[0.75,1]	0.75 ≤ 1

The fuzzy value of each tuple belongs to one of the alpha-cut sets α_k, so the crisp value of t_j also belongs to one of the branches of the taxonomy tree.
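Locating the taxonomy leaf for a given fuzzy value amounts to a simple interval lookup, sketched below. The sketch assumes that a value lying exactly on a boundary is assigned to the lower interval, a tie-breaking rule that the text does not specify; `find_alpha_cut` is a hypothetical name.

```python
def find_alpha_cut(mu: float, bounds):
    """Return (k, (low, high)): the 1-based index and range of the interval holding mu.

    bounds is the ascending list of interval boundaries, e.g. [0, 0.25, 0.5, 0.75, 1].
    Assumption: a value on a boundary is assigned to the lower interval.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("membership value must lie in [0, 1]")
    for k in range(1, len(bounds)):
        if mu <= bounds[k]:
            return k, (bounds[k - 1], bounds[k])
    raise ValueError("bounds must end at 1.0")


if __name__ == "__main__":
    leaves = [0.0, 0.25, 0.5, 0.75, 1.0]    # taxonomy leaves of Table 1
    for mu in (0.16, 0.32, 0.68, 1.0):
        print(mu, find_alpha_cut(mu, leaves))
```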

The alpha-cut set values of the taxonomy are transformed into crisp values by the computation steps given below.

Let the threshold value E = m, so that the number of alpha-cut sets is m + 1. Let R, S and T be the parameters of the triangular membership function, and let C(α_k) denote the crisp value of the alpha-cut set α_k.

For the minimum side, min(α_k):

Step 1: Let g be the difference between consecutive alpha-cut set values, g = |S − R| / (m + 1).

Step 2: Let k = 1.

Step 3: If k = 1 then

C(α_k) = R + g

Else

C(α_k) = C(α_{k−1}) + g

Step 4: Increment k by 1.

Step 5: Repeat steps 3 to 4 until k = m + 1.

For the maximum side, max(α_k):

Step 6: Let g = (T − S) / (m + 1).

Step 7: Let k = 1.

Step 8: If k = 1 then

Max(C(α_k)) = S + g

Else

Max(C(α_k)) = Max(C(α_{k−1})) + g

Step 9: Increment k by 1.

Step 10: Repeat steps 8 to 9 until k = m + 1.
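The computation steps can be sketched in Python as follows. This is a minimal sketch, assuming that the step size is the distance between the median and the extreme divided by m + 1; it computes each boundary as a multiple of g instead of accumulating g in a loop, which gives the same values. The function name `crisp_boundaries` is hypothetical.

```python
def crisp_boundaries(R: float, S: float, T: float, m: int):
    """Crisp values at the alpha-cut boundaries on both sides of the triangle.

    Returns (rising, falling), both ordered from membership 0 to membership 1:
    rising runs from R up to S, falling runs from T down to S.
    """
    steps = m + 1                       # number of alpha-cut sets
    g_min = abs(S - R) / steps          # step size on the minimum side
    g_max = abs(T - S) / steps          # step size on the maximum side
    rising = [R + k * g_min for k in range(steps + 1)]
    falling = [T - k * g_max for k in range(steps + 1)]
    return rising, falling


if __name__ == "__main__":
    rising, falling = crisp_boundaries(20, 45, 81, 2)
    print([round(v, 2) for v in rising])
    print([round(v, 2) for v in falling])
```

Both lists meet at the median S, where the membership value is 1.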

2.2.2. Exercise

The quasi-identifier a_i = {20, 24, 28, 37, 45, 57, 64, 78, 81} is transformed into the fuzzy set μ(t_j) = {0, 0.16, 0.32, 0.68, 1, 0.67, 0.47, 0.08, 0}, where j = 1, 2, 3, …, 9, by the triangular membership function. Setting the threshold value m = 2 generates the alpha-cut sets α_k, where k = 1, 2, 3, and the alpha-cut sets are then used to construct the taxonomy tree, as shown in Fig. 5a and 5b.

Fig. 5

(a) Alpha-cut sets for fuzzy set a_i with threshold value = 2. (b) Taxonomy tree of fuzzy set ${\overset{´}{\hat{a}}}_{i}$ .

The triangular membership function returns the fuzzy value $μ \overset{´}{\hat{a}} (t_{j})$ for each crisp value of the quasi-identifier a_i. The fuzzy value of each tuple t_j belongs to one of the alpha-cut sets α_k, where k = 1, 2, 3. The computation steps are then applied to the alpha-cut set values to transform them into crisp values.

Let α_k = {0, 0.35, 0.7, 1}, S = median = 45, R = 20 and T = 81.

Min(α_k)

Let g be the difference between consecutive alpha-cut sets.

$\begin{matrix} g & = & (S - R) / (m + 1) \\ g & = & (45 - 20) / 3 = 8.33 \end{matrix}$ (2.5)

$α_{1} = R + g = 20 + 8.33 = 28.33 .$ (2.6)

$\begin{matrix} α_{2} & = & α_{1} + g = 36.66 . \\ α_{3} & = & α_{2} + 8.33 = 44.99 \end{matrix}$ (2.7)

Max(α_k)

$\begin{matrix} g & = & (T - S) / (m + 1) \\ g & = & (81 - 45) / 3 = 12 \\ α_{1} & = & S + g = 45 + 12 = 57 . \end{matrix}$ (2.8)

$\begin{matrix} α_{2} & = & α_{1} + g = 69 . \\ α_{3} & = & α_{2} + 12 = 81 \end{matrix}$ (2.9)

For example, the crisp value 24 has the fuzzy value 0.16. This fuzzy value belongs to the alpha-cut set [0, 0.35], so the crisp value 24 is replaced by the generalized term 20 ≤ 28.33, as shown in Table 2. This generalization process is repeated for all n tuples of the selected quasi-identifier attribute and for every attribute in Q = {a₁, a₂, a₃, …, a_n}.

Table 2

Maximum and minimum crisp values of alpha-cut set values

Alpha-cut set values	Crisp values - min(α_k)	Crisp values - Max(α_k)
0	20	81
0.35	28.33	69
0.70	36.66	57
1	44.99	45
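The whole exercise can be reproduced with the short Python sketch below. It is a minimal sketch under stated assumptions: the alpha-cut intervals are taken as exact thirds of [0,1] (the exercise rounds them to 0.35 and 0.7), a value on a boundary goes to the lower interval, and R < S < T holds so that no division by zero occurs. The function name `generalize` is hypothetical.

```python
import math
from statistics import median


def generalize(values, m: int):
    """Map each crisp value to the (low, high) crisp range of its alpha-cut set."""
    R, S, T = min(values), median(values), max(values)
    steps = m + 1                        # number of alpha-cut sets
    g_rise = (S - R) / steps             # crisp step on the minimum side
    g_fall = (T - S) / steps             # crisp step on the maximum side
    ranges = []
    for v in values:
        if v <= S:                       # rising edge of the triangle
            mu = (v - R) / (S - R)
            k = max(1, math.ceil(mu * steps))
            low, high = R + (k - 1) * g_rise, R + k * g_rise
        else:                            # falling edge of the triangle
            mu = (T - v) / (T - S)
            k = max(1, math.ceil(mu * steps))
            low, high = T - k * g_fall, T - (k - 1) * g_fall
        ranges.append((round(low, 2), round(high, 2)))
    return ranges


if __name__ == "__main__":
    ages = [20, 24, 28, 37, 45, 57, 64, 78, 81]
    for age, (low, high) in zip(ages, generalize(ages, 2)):
        print(f"{age} -> {low} ≤ {high}")
```

Under these assumptions the value 24 is generalized to the range from 20 to 28.33, in agreement with Table 2. The boundary 36.67 differs from the 36.66 in the table only because the table accumulates the rounded step 8.33.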

3. Experiment

The experimental analysis evaluates the proposed method by comparing it with other privacy models. The proposed method is assessed in terms of information loss and execution time.

The experiments were performed on a 2.6 GHz Intel Core 2 Duo machine with 2 GB of RAM running the Windows 7 Ultimate operating system. The code was written in Java using the NetBeans tool. The proposed method was applied to 32564 tuples from the UCI Adult dataset, which is a common point of reference for privacy-preserving techniques.

In the experiment, Age, Education-Num, Hours per Week and Country are considered quasi-identifier attributes, and Occupation is considered the sensitive attribute, as shown in Table 3. Four datasets are used for the time consumption analysis and the data utility risk analysis. The information loss results are shown in Fig. 6a and 6b, and Figures 7a and 7b show the execution time results. In the trials that vary the number of quasi-identifiers, three quasi-identifiers participate in the first trial, and one more is added in each subsequent trial, up to seven quasi-identifiers. In general, the execution time increases gradually as the number of quasi-identifiers increases. At the same time, the execution time of the proposed method is lower than that of the other methods in all trials. The generated taxonomy tree structure and the low complexity of the generalization method help reduce the execution time. In the other set of trials, the number of data records increases by 5000 in each trial, up to 30000 records. Across these trials the proposed method delivers reasonable results compared with the other methods, Raymond Chi-Wing (alpha) [2] and Manolis Terrovitis (Local) [34]. The execution time also reflects the computational complexity of these existing methods. The information loss metric is used to appraise the performance of the proposed method and to compare it with the same existing methods that were involved in the execution time measurement. In this analysis, all trials use the same schema configuration as in the experimental study above. The following formula, due to X. Xiao [30], is used to measure the information loss ILoss.

$ILOSS (t_{g}) = \frac{| t_{g} | - 1}{| D_{A} |}$ (2.10)

Table 3

Metadata about adult database

Sno	Attributes	Distinct Values	Type
1	Age	74	Quasi identifier
2	Education-num	16	Quasi identifier
3	Hours per week	94	Quasi identifier
4	Capital-gain	116	Quasi identifier
5	Capital-loss	92	Quasi identifier
6	Country	41	Quasi identifier
7	Occupation	14	Sensitive attribute

Fig. 6

(a) Information loss, number of quasi-identifiers d = 3, number of records 5000 to 30000. (b) Information loss, number of quasi-identifiers d = 3, 4, 5, 6 and 7.

Fig. 7

(a) Execution time in minutes, number of quasi-identifiers d = 3, number of records 5000 to 30000. (b) Execution time in minutes, number of quasi-identifiers d = 3, 4, 5, 6 and 7.

where |t_g| is the number of domain values that are descendants of t_g, and

|D_A| is the number of domain values in the attribute A of t_g.

ILOSS(t_g) = 0 if t_g is an original data value in the table.

In words, ILOSS(t_g) measures the fraction of domain values generalized by t_g.

The loss of a generalized record r is given by

$ILoss (r) = \sum_{t_{g} \in r} (w_{i} \times ILoss (t_{g}))$ (3.1)

where w_i is a positive constant specifying the penalty weight of the attribute A_i of t_g. The overall loss of a generalized table T is given by

$ILoss (T) = \sum_{r \in T} ILoss (r)$ (3.2)
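The information loss metric can be sketched in Python as follows. This is a minimal sketch that reads the record loss as a weighted sum over the generalized values of a record and the table loss as a sum over its records. Each generalized cell is described by the hypothetical pair (number of domain values it covers, size of the attribute domain); the function names and the sample numbers are illustrative and are not results from the experiments.

```python
def iloss_value(covered: int, domain_size: int) -> float:
    """ILoss of one generalized value: (|t_g| - 1) / |D_A|."""
    if covered < 1 or domain_size < 1:
        raise ValueError("counts must be positive")
    return (covered - 1) / domain_size   # 0 when the original value is kept


def iloss_record(cells, weights) -> float:
    """Weighted sum of the losses of the generalized values in one record."""
    return sum(w * iloss_value(c, d) for (c, d), w in zip(cells, weights))


def iloss_table(records, weights) -> float:
    """Overall loss of a table: sum of the record losses."""
    return sum(iloss_record(cells, weights) for cells in records)


if __name__ == "__main__":
    # Hypothetical record: an Age cell covering 9 of 74 domain values and an
    # Education-num cell left ungeneralized (1 of 16 domain values).
    record = [(9, 74), (1, 16)]
    weights = [1.0, 1.0]
    print(iloss_record(record, weights))
    print(iloss_table([record, record], weights))
```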

3.1. Experimental results

The experiments evaluate information loss in two different trials. In the first trial, the number of records increases in each iteration from 5000 to 30000 in steps of 5000. Fig. 6a shows that the information loss rises gradually up to 15000 records and then drops slightly because of the frequency of the quasi-identifier attribute values. From 20000 records onward, the information loss increases steadily up to 30000 records. Under the first trial setup, the information loss of the proposed method is encouraging when compared with the existing methods. In the second trial, the number of quasi-identifiers increases in each run, from three to seven. In Fig. 6b, the metric value starts at around 0.02 and increases steadily. In the first trial, the number of quasi-identifiers is held constant, so the information loss depends on the number of records, which increases in each run. In the second trial, the number of quasi-identifier attributes increases in each run, and with it the frequency of the attribute values that contribute to the loss. We observe that the information loss is high when the number of quasi-identifiers increases, so the number of quasi-identifier attributes dominates the number of records in the evaluation of information loss. The results of the second trial show that the metric value increases steadily, with slightly more deviation than the other methods; these results are shown in Fig. 6b. Fig. 7a and 7b show the execution time of the proposed method, which was evaluated in the same two setups as the information loss. In Fig. 7a, the spread of the execution times of all methods is small, and the alpha method deviates slightly after the 25000-record configuration.
The execution time of the proposed method is slightly lower than that of the existing methods. In the second setup, the number of quasi-identifier attributes is increased by one from trial to trial. The execution times of all methods are very close to each other up to the second trial (four quasi-identifiers). In the last trial, the execution time of the proposed method is noticeably lower. Overall, the proposed method improves on the existing methods in both information loss and execution time.

4. Discussion and conclusion

The threshold value E plays the major role in the construction of the taxonomy tree; it drives the generalization process and increases data utility. If the quasi-identifier attribute values occur with high frequency and the number of interval sets is small, the strength of the privacy protection becomes weak. The threshold value is therefore a flexible parameter that controls the level of data utility. The publisher audits the entire dataset based on the frequency of the quasi-identifier attribute values and fixes the threshold value accordingly.

The proposed method attempts to minimize information loss and increase data utility based on fuzzy alpha-cut sets. The threshold value and the alpha-cut sets play a vital role in the entire process. The experimental results show that execution time is reduced and information loss is minimized significantly. However, some problems are not yet resolved. In fact, privacy gain and data utility are conflicting goals; from this point of view, this work helps to preserve privacy while improving data utility. Future work, which is in progress, includes strengthening privacy against various attack models and extending the method to the data publishing scenario known as multi-view publishing.

References

1. B.C.M. Fung, Wang and P.S. Yu, Top-down specialization for information and privacy preservation, in Proc. of the 21st IEEE International Conference on Data Engineering (ICDE), 2005, pp. 205–216.

2. Raymond Wong, Jiuyong Li, Ada Fu and Wang, (α, k)-anonymous data publishing, Journal of Intelligent Information Systems, 2009, pp. 209–234.

3. Sweeney, k-Anonymity: a model for protecting privacy, International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 2002, pp. 557–570.

4. R.J. Bayardo and Agrawal, Data privacy through optimal k-anonymization, in Proc. IEEE ICDE, Washington, DC, USA, 2005, pp. 217–228.

5. Cao, Carminati, Ferrari and K.-L. Tan, CASTLE: Continuously anonymizing data streams, IEEE Trans. Depend. Secure Comput. 8(3) (2011), 337–352.

6. Cao, Karras, Kalnis and K.-L. Tan, SABRE: A sensitive attribute bucketization and redistribution framework for t-closeness, J. VLDB 20(1) (2011), 59–81.

7. Dwork, Differential privacy, in Proc. ICALP, Venice, Italy, 2006, pp. 1–12.

8. Benjamin C.M. Fung, K.E. Wang, Ada Wai-Chee Fu and Philip S. Yu, Introduction to Privacy-Preserving Data Publishing: Concepts and Techniques, ISBN: 978-1-4200-9148-9, 2010.

9. Meng-bo Xie and Quan Qian, Fuzzy set based data publishing for privacy preservation, IEEE, SNPD 2016, 2016. P. Quiros, P. Alonso, et al., Protecting data: a fuzzy approach, International Journal of Computer Mathematics 92(9) (2015), 1989–2000.

10. V.K. Saxena and Shashank Pushkar, Fuzzy-based privacy preserving approach in centralized database environment, Advances in Computational Intelligence, International Conference on Computational Intelligence, pp. 299–307.

11. M.R. Fouad, Elbassioni and Bertino, Towards a differentially private data anonymization, Purdue Univ., West Lafayette, IN, USA, Tech. Rep. CERIAS 2012-1, 2012.

12. M.R. Fouad, Lebanon and Bertino, ARUBA: A risk-utility-based algorithm for data disclosure, in Proc. VLDB Workshop SDM, Auckland, New Zealand, 2008, pp. 32–49.

13. Diaz, Ranilla, L.J. Rodriguez-Muniz and Troiano, Fuzzy sets in data protection: Strategies and cardinalities, Log. J. IGPL 20(4) (2011), 657–666.

14. Ghinita, Karras, Kalnis and Mamoulis, Fast data anonymization with low information loss, in Proc. Int. Conf. VLDB, Vienna, Austria, 2007, pp. 758–769.

15. Troiano, L.J. Rodriguez-Muniz, Ranilla and Diaz, Interpretability of fuzzy association rules as means of discovering threats to privacy, Int. J. Comput. Maths. 89(3) (2012), 325–333.

16. Xu, Wang, Pei, Wang, Shi and A.W.C. Fu, Utility based anonymization using local recoding, in Proc. of the 12th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Philadelphia, PA, August, 2006.

17. Gupta, Ligett, McSherry, Roth and Talwar, Differentially private combinatorial optimization, in Proc. Annu. ACM-SIAM SODA, Philadelphia, PA, USA, 2010, pp. 1106–1125.

18. V.S. Iyengar, Transforming data to satisfy privacy constraints, in Proc. ACM SIGKDD, Edmonton, AB, Canada, 2002, pp. 279–288.

19. Li, W.H. Qardaji and Su, Provably private data anonymization: Or, k-anonymity meets differential privacy, CoRR, abs/1101.2604, 2011.

20. R.J. Barbaro and Agarwal, Data privacy through optimal k-anonymization, in Proc. of the 21st IEEE International Conference on Data Engineering (ICDE), 2006, pp. 217–228.

21. Lefevre, D.J. DeWitt and Ramakrishnan, Incognito: Efficient full-domain k-anonymity, in Proc. of ACM International Conference on Management of Data (SIGMOD), 2005, pp. 49–60.

22. Li and Li, t-Closeness: Privacy beyond k-anonymity and l-diversity, in Proc. IEEE ICDE, Istanbul, Turkey, 2007, pp. 106–115.

23. Liu, Kantarcioglu and Thuraisingham, The applicability of the perturbation based privacy preserving data mining for real-world data, Data Knowl. Eng. 65(1) (2008), 5–21.

24. Qian, Chuandong and Hong, Fuzzy-based methods for privacy preserving, Application Research of Computers 30(2) (2013), 518–520.

25. Machanavajjhala, Gehrke, Kifer and Venkitasubramaniam, l-Diversity: Privacy beyond k-anonymity, in Proc. 22nd ICDE, Atlanta, GA, USA, 2006, p. 24.

26. Diaz, Ranilla, L.J. Rodriguez-Muniz and Troiano, Identifying the risk of attribute disclosure by mining fuzzy rules, in Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Methods, Hüllermeier, Kruse and Hoffmann, eds., Springer, Berlin, 2010, pp. 455–464.

27. Mohammed, Chen, B.C. Fung and P.S. Yu, Differentially private data release for data mining, in Proc. 17th ACM SIGKDD, New York, NY, USA, 2011, pp. 493–501.

28. Samarati and Sweeney, Generalizing data to provide anonymity when disclosing information, in Proc. ACM SIGACT-SIGMOD-SIGART Symp. PODS, New York, NY, USA, 1998, p. 188.

29. Sweeney, Privacy-enhanced linking, SIGKDD Explor. 7(2) (2005), 72–75.

30. Xiao and Tao, Personalized privacy preservation, in Proc. ACM SIGMOD, Chicago, IL, USA, 2006, pp. 229–240.

31. Poovammal and Ponnavaiko, Preserving micro data release: categorical and numerical data, IEEE Selit, 2009.

32. G.R. Zhang, Privacy preserving using fuzzy sets, Computer Engineering Applications, 2010.

33. Tamir Tassa, Arnon Mazza and Aristides Gionis, k-Concealment: An alternative model of k-type anonymity, Transactions on Data Privacy 5 (2012), 189–222.

34. Manolis Terrovitis, Nikos Mamoulis and Panos Kalnis, Local and global recoding methods for anonymizing set-valued data, The VLDB Journal: The International Journal on Very Large Data Bases 20(1) (2011), 83–106.

35. Bélanger and R.E. Crossler, Privacy in the digital age: A review of information privacy research in information systems, MIS Quarterly 35(4) (2011), 1017–1041.

36. Berendt, Günther and Spiekermann, Privacy in E-commerce: Stated preferences vs. actual behavior, Communications of the ACM 48(4) (2005), 101–106.

37. Henner Gimpel, Dominikus Kleindienst and Niclas Nuske, The upside of data privacy – delighting customers by implementing data privacy measures, Electronic Markets, 2018, pp. 1–16.

38. Noura Metawaa, Kabir Hassana and Mohamed Elhoseny, Genetic algorithm based model for optimizing bank lending decisions, Expert Systems with Applications 80 (2017), 75–82.

39. S.K. Lakshmanaprabu, Shankar, Deepak Gupta, Ashish Khanna, Joel J.P.C. Rodrigues, Plácido Pinheiro and de Albuquerque Victor Hugo C., Ranking analysis for online customer reviews of products using opinion mining with clustering, Complexity, 2018, pp. 1–9.

40. S.K. Lakshmanaprabu, Shankar, Ashish Khanna, Deepak Gupta, Joel J.P.C. Rodrigues, Plácido Pinheiro and de Albuquerque Victor Hugo C., Effective features to classify big data using social internet of things, IEEE Access 6 (2018), 24196–24204.
