Abstract
As data collection increases, more and more sensitive data is being used to publish query results. This creates a significant risk of privacy disclosure. As a mathematically provable privacy theory, differential privacy (DP) provides a tool to resist background knowledge attacks. Fuzzy differential privacy (FDP) generalizes differential privacy by employing smaller sensitivity and supporting multiple similarity measures. Thus the output error can be reduced under FDP. Existing FDP mechanisms employ sliding window strategy, which perturb the true query value to an interval with this value as the midpoint to maintain similarity of outputs from neighboring datasets. It is still possible for an attacker to infer some sensitive information based on the difference between the left and right endpoints of the output range. To address this issue, this article present two solutions: fixed interval perturbation and infinite interval perturbation. These strategies perturb the true query values of two neighboring datasets to the same interval and provide fuzzy differential privacy protection for the dataset. We apply the proposed method to the privacy-preserving problem of bipartite graph subgraph counting and verify the effectiveness by experiments.
Introduction
Due to the development of the Internet, the dramatic growth in the number of various terminals, including smartphones, has made it necessary for various companies and organizations, as well as governments, to collect and analyze huge amounts of data. In this process, privacy protection regarding personal information has become a big issue. How to protect private data and prevent the leakage of sensitive data information has become the focus of research in the field of data security.

From left to right, perturbation strategies of outputs are bounded interval (sliding window), fixed interval and infinite interval.
Existing privacy protection technologies mainly include k-anonymous [35], l-diversity [24], t-closeness [22], ε-differential privacy (DP) [9] and (S, s)-fuzzy differential privacy (FDP) [18]. K-anonymity is usually achieved by masking or generalizing some fields in a data table. It cannot prevent attribute disclosure when there is little diversity. L-diversity is a privacy criterion which can defend against homogeneity attacks and some degree of background knowledge attacks. However, it requires that there exists at least l well-represented values for each sensitive attribute which restricts its use. T-closeness improves this shortcoming by defining that the distance between the two distributions should be no more than a threshold t. Differential privacy is a privacy protection mechanism, which defines privacy by adding random noise to the query results when sharing data and prevents attacks based on background knowledge. As differential privacy has strong mathematical guarantees, it attracts increasing attention [3, 37]. The application fields of DP include recommender systems, location-based systems and social networks [21, 42]. Many scholars have summarized its research progress [10–13, 44].
However, in differential privacy, the sensitivities of many queries are set very large for defending against differential attacks 1 In this case, the output errors will increase with increasing query sensitivities and the noise can even completely mask the true query value. As a generalization of DP, fuzzy differential privacy [18] is proposed to acquire a more flexible trade-off between the accuracy of outputs and the privacy-preserving level of data by employing smaller sensitivities and multiple similarity measures.
The authors in paper [18] supply triangular FDP mechanisms and curved triangular FDP mechanisms which perturb outputs on bounded sliding intervals (i.e., intervals with different midpoints D, D′ and the same radius), see Fig. 1(a). Our concerns focus on getting a more privacy-protective and comprehensive model in the FDP framework. However, bounded sliding FDP model has a weakness that can be attacked. Precisely, for neighboring data sets D and D′, if the output obtained by the attacker is located in the part of the red dot, it can be inferred that the result originates from D. If the output obtained by the attacker is located in the part of the blue dot, it can be inferred that the result originates from D′. It means that existing fuzzy differential privacy mechanisms have weaknesses in the protection of values near the endpoints of the interval. This motivates us to design more secure FDP mechanisms to output results on the same interval.

Diagram of FDP mechanism on fixed and infinite interval.
There are two feasible solutions, one for the same finite interval, named fixed interval, and one for an infinite interval. They are shown in Fig. 1(b) and Fig. 1(c), respectively. In this article, we propose novel FDP mechanisms supporting perturbations on fixed intervals and infinite intervals, which are illustrated in Fig. 1. We prove that the mechanisms and the corresponding fuzzy similarity measure are compatible, thus satisfying the conditions of FDP. The proposed FDP mechanisms are used to solve private subgraph counting problem in bipartite graphs and the effectiveness of the method is verified on real-world datasets.
Related work
Privacy-preserving data releasing has received increasing interest in various social network application fields. The authors in papers [5, 17] present automorphically equivalent and k-automorphism framework for privacy preserving network publication defending against multiple attacks (such as 1-neighbor-graph attack, degree attack). To address the privacy protection issues in bipartite graphs, the authors in papers [7, 8] present bipartite anonymity method based on classic anonymity techniques. The authors in paper [43] protect privacy by generalizing the original graph to super-nodes and super-edges. The authors in paper [36] use weighted bipartite graphs to model high-dimensional sparse data and achieve privacy preservation by clustering node attributes. The authors in paper [6] present a privacy-preserving bipartite graph matching framework for multimedia analysis and retrieval. The authors in paper [14] give a privacy-preserving solution for the bipartite ranking problem by employing homomorphic encryption and secure multi-party computation. The authors in paper [29] investigate privacy-preserving wireless communication methods using bipartite matching. The authors in paper [31] present the privacy preserving method of 2-party queries on bipartite graphs using private set intersection. Fundamental privacy limits are studied for bipartite networks under active attacks in [33]. The authors in paper [45] present differential privacy mechanisms for multi-agent systems with bipartite consensus over signed digraph. Differential privacy preservation for the subgraph counting (e.g. biclique counting) problem in bipartite graphs has not been investigated to date.
Preliminaries
Fuzzy set theory is a generalization of classical set theory. In classical set theory, elements either belong to a set or they do not. In fuzzy set theory, the degree to which an element belongs to a set takes values between 0 and 1. Since its establishment in 1965, fuzzy set theory [40] has a wide range of applications in management science, artificial intelligence, cybernetics, decision theory, and data analysis. In this section, we introduce several basic concepts in fuzzy sets, fuzzy differential privacy in order to facilitate a clear exposition of the proposed approach. In addition, some basic definitions of the subgraph counting problem in graph data query are presented for studying the privacy protection of bipartite graphs.
Fuzzy sets
All fuzzy sets on X is denoted by F (X). The equality and containment relationships of two fuzzy sets
1.
2.
3.
4.
5. For two nonempty fuzzy sets
Differential privacy
Dwork et al. [9] demonstrate that it is impossible to absolutely prevent the leakage of privacy, mainly due to the usefulness requirement. As long as data needs to be used, it will inevitably lead to information leakage.
Therefore, the remaining possibility is how to limit relative privacy leakage as much as possible, and this is where "difference" comes in: any query result should not be able to be used to infer, to a certain extent, whether an individual’s data is included in the dataset D. This is reflected in the definition of "neighboring datasets". A pair of neighboring datasets is one in which two datasets (D, D′) differ by only one record, denoted by d (D, D′)) =1. A differential attack is a type of attack that infers information about the original dataset by analyzing the query values corresponding to neighboring datasets. Differential privacy is a privacy-preserving approach to resist differential attacks defined as the degree to which neighboring datasets are indistinguishable. This methodology works by applying certain perturbations to the query results so that the output results corresponding to adjacent data sets are similar.
Fuzzy differential privacy
Fuzzy differential privacy (FDP) [18] maintains the similarity of the corresponding outputs of neighboring datasets by outputting the query results to fuzzy sets. FDP uses the degree of similarity of fuzzy sets to indicate the degree to which the query results are perturbed and also the level that the model protects privacy.
and s0 is called similarity level. Note: A pair of subset neighboring data sets refers to a pair of neighboring data sets that at least one of the two

Examples of (2, 2)-biclique and (2, 3)-biclique.
This section introduces an important type of query for graph data (also called network data)-subgraph counting. Subgraph counting is a basic statistical method in data querying, data mining and social networking, which refers to counting subgraphs with a given type in the input graph. Suppose the given graph dataset is G = (V, E) and g = (V0, E0) is a subgraph of G. The subgraph counting query f g (G) returns the number of all subgraphs in graph G that are isomorphic to g 2 In the case where the subgraph structure g is determined without ambiguity, f g (G) is also often abbreviated to f (G).
Bipartite graphs are a special type of graph whose vertices can be partitioned into two disjoint and independent sets such that each edge connects two vertices from different sets. They are commonly used to model the relationship between two different types of objects and are widely used in various fields, such as social networks and economics [2, 15].
Let G = (U, V, E) denote a bipartite graph which can be divided into two disjoint vertex sets U, V and that edges in E connect one vertex in U with one vertex in V, i.e., E = {(u, v) |u ∈ U, v ∈ V, U ∩ V = ∅}, where ∩ denotes the intersection of two sets and ∅ denotes the empty set. A biclique, which is also called a complete bipartite subgraph, is a special kind of bipartite graph where every vertex in U is connected to every vertex in V. We use (a, b)-biclique to denote bicliques with |U| = a, |V| = b, where | · | denotes the cardinality of a set. In Fig. 1(left part), we show a (2, 2)-biclique, also called a butterfly (rectangle), in which U = {u1, u2} , V = {v1, v2} and E = {(u1, v1) , (u1, v2) , (u2, v1) , (u2, v2)}. We also show a (2, 3)-biclique in the right part of Fig. 1.

Examples of (2, 2)-biclique and (2, 3)-biclique counting.
Due to existing fuzzy differential privacy mechanisms have weaknesses in the protection of values near the endpoints of the interval. We design FDP mechanisms to output results on fixed intervals and infinite intervals.
FDP on fixed intervals
Existing FDP mechanisms are designed for bounded sliding intervals which are compatible with fuzzy similarity measures. In many practical applications, query results have a relatively small fixed range such as querying the average age of specific populations in demographic data. In this case, outputs on sliding window will bring security weaknesses near interval endpoints, as shown in the left part of Fig. 1. However, existing FDP mechanisms with fuzzy similarity measures can not be directly used on fixed interval, due to compatibility. To solve this problem, we design a new fuzzy similarity measure S★ and construct fuzzy mapping
Suppose that the fixed output range is [a, b]. The fuzzy mapping
The following fuzzy similarity measure S★ is designed for the collection of fuzzy sets which produced by the fuzzy mapping
∀γ1, γ2 > 0, and
(1)
(2)
(3) If
(4) If
(5)
According to Definition 2.2, S★ is a fuzzy similarity measure on set
Note: If we use the same γ in outputting fuzzy sets, the similarity degree increases with distance |f (D1) - f (D2) | decreasing (see Fig. 3) and increases with γ increasing (see Fig. 4). These facts fit human intuition.

An example of |f (D1) - f (D2) | < |f (D1) - f (D3) | and

An example of γ1 < γ2 and
For ∀D0 ⊂ D,
For any pair of neighboring data set
Note that SGS
f
(D) is the semi-global sensitivity. By Definition 2.5, we have
Note: We can also construct other mechanisms to be compatible with S★ according to
FDP protects privacy by designing membership functions of fuzzy outputs according to fuzzy similarity measures and a user specified similarity level s0. In this section, we present a variety of mechanisms which support various types of perturbation intervals and strategies under S, as a typical example (See Definition 2.4). In line with DP framework, we fist design mechanisms on infinite intervals.
That means
In a similar way, we can prove the following theorems.

Membership functions of various fuzzy mapping corresponding to f (D) and f (D′) on infinite intervals. η = 3 in
Several preliminary theorems
Subgraph counting query is a fundamental statistic in data mining and social networks, which refers to counting the number of subgraphs with given shape in an input graph. In this section, we investigate subgraph counting under FDP and prove that the semi-global sensitivity of the query f on a graph dataset G equals to its local sensitivity, SGS f (G) = LS (G). It means that the FDP subgraph counting problem can be calculated easily with local sensitivity.
Obviously, f is monotonically increasing. ∀G0, G : G0 ⊂ G, we need to prove |f (G0) - f ((G0) ′) | ≤ |f (G) - f (G′) |, in which (G0) ′ and G′ are conjugate neighboring data sets of G0, G. The proof process is divided into two steps.
∀e
i
∈ E, G
i
= G - {e
i
},
∀e
λ ∈ E
c
(complement of edge set E), G
λ = G + {e
λ},
(1) For G i = G - {e i }, we have Δf i (G0) ≤ Δf i (G), (∀e i ∈ E - {e0});
By reduction to absurdity, we assume that the conclusion (1) is not valid. ∃e
i
0
∈ E (e
i
0
≠ e0) : Δf
i
0
(G0) > Δf
i
0
(G). That is,
(a).
(b).
From (a), we have g ∈ F (G0) , g ∉ F (G). That is contrary with F (G0) ⊂ F (G).
From (b),
g ∈ F (G i 0 ) ⇒ e i 0 ∉ g.
This is contradictory.
Thus, (1) holds.
(2) Δf λ (G0) ≤ Δf λ (G), (∀e λ ∈ E c ).
When f (G0) = f (G), we have
When f (G0) ≠ f (G), we assume that the conclusion (2) is not valid. ∃e
λ0 ∈ E
c
: Δf
λ0 (G0) > Δf
λ0 (G). That is
(a).
(b).
From (a), we have
From (b),
That is contrary with e λ0 ∈ E c .
Thus, (2) holds.
(3) For G i = G - {e i }, we have Δf i (G0) = Δf i (G), (e i = e0).
By the definition of conjugate neighboring data sets G
i
and
Thus, (3) holds.
By (1), (2) and (3), we have
Datasets properties with counting bicliques
Datasets properties with counting bicliques
Thus, the subgraph counting query f is differentially increasing.□
Note: In a quite similar way, we can prove that the conclusion in the above theorem still holds under node privacy. However, whether all monotone incremental queries are difference incremental queries is still an open question.
Recall that, we need to calculate semi-global sensitivities for queries which are not differentially increasing. Theorem 4.1 and Corollary 4.2 indicate that we can directly employ FDP mechanismswith local sensitivities of subgraph counting.
We present the schematic diagrams (Fig. 1) illustrating the operation process of FDP mechanisms in the Introduction section.
Experimental evaluation
In this section, we first introduce our experiment settings, including the datasets and the baselines. Then, we report the results of experiments which are conducted on the platform of Matlab R2012a. We use a machine which is equipped with an Intel(R) Core(TM) i5-4590 3.30 GHz processor and a 8GB RAM.
Experimental settings
Experimental parameters
Experimental parameters

(2, 2)-biclique counting.

(2, 3)-biclique counting.
To evaluate the performance of our algorithms, we compare them with two competitors, Laplace mechanism and ladder frameworks in DP. Laplace mechanism adds Laplace noise (GS/ε, DP) to the exact value of subgraph counts. Ladder frameworks are reformed from paper [41]. Besides, we employ a relative error curve adding Laplace noise (LS/ε, not DP) as a reference lower bound. We measure the accuracy of the algorithms by the median relative error, median |output1 - f (G) |/f (G) , |output2 - f (G) |/f (G) , ⋯ , |output N - f (G) |/f (G) ), in which f (G) represents the true value of biclique counting and N represents the execute times of our random algorithm. We set N = [50, 100, 200, 300, 400, 500]. The conversion formula of the similarity level s0 is ε = - ln s0. The specific parameters are included in Table 3.
Experimental results of (2, 2)-biclique counting are shown in Fig. 7. As can be seen, FDP achieves good accuracy on f⋈ over all three datasets. When the privacy budget is relatively large, e.g. ε = 1.6 = - ln s0, the median relative error is always less than 1%

(3, 3)-biclique counting.
Experimental results of (2, 3)-biclique counting are shown in Fig. 8. As can be seen, on the whole, FDP achieves good accuracy on f2⋈3 over all datasets when the privacy budget is not too small (say ε ≥ 0.2).
When the privacy budget is relatively small, e.g. ε = 0.05 = - ln s0, the error of FDP is one magnitude smaller than ladder for most datasets.
Experimental results of (3, 3)-biclique counting are shown in Fig. 9. It can be observed that the excellent performance of the algorithm (2, 3)-biclique counting is handed down.
From Figs. 7, 8 and 9, we can observe that the accuracy of FDP is several orders of magnitude higher than Laplace. FDP is more accurate than ladder on f⋈ and f2⋈3, especially for datasets which have large differences between degrees of vertices. With ε getting smaller, FDP shows greater advantage than ladder.
In this article, in order to enhance the privacy of output results building on previous work, we put forward two types of novel FDP mechanisms. One is for bounded intervals, named fixed intervals FDP mechanisms, with stronger consistency in output results. The other one is for unbounded intervals, named infinite intervals FDP mechanisms, with a wider range of perturbations. In addition, we obtain an essential property of subgraph counting query, SGS = LS. It means that the calculating of semi-global sensitivities of a query can be simplified by calculating local sensitivities. Our experiments on subgraph counting verify the validity of these novel FDP mechanisms and show the magnitude of the error for different levels of privacy protection. It can be seen that the error in the output results remains low at higher levels of privacy protection.
FDP is a competitive model for privacy protection. Different types of queries can make the privacy protection mechanism different in nature. To accommodate the characteristics of different types of data sets, our ongoing studies will focus on investigating more FDP mechanisms in a variety of scenarios and exploring more types of queries such as histogram query.
Footnotes
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant 61976168, Grant 61972309, and Grant 61902299, in part by S&T Program of Hebei under Grant 20310102D, in part by XD-Inspur DB Innovation Lab grant, in part by Key Scientific Research Program of Shaanxi Provincial Department of Education under Grant 20JY014,, in part by China Postdoctoral Science Foundation under Grant 2019TQ0239 and Grant 2019 M663636, and in part by scientific research project of Chaohu University (XLY-202105).
Broadly speaking, differential attacks are the study of how differences in information input(i.e. neighboring data sets which differ at one record) can affect the resultant difference at the output.
The isomorphism between a pair of graphs g1 and g2 means that there is an isomorphic mapping between them. Isomorphic subgraphs, in some order, have the same adjacency matrix.
Edge privacy refer to the differential privacy with a pair of neighboring datasets which differ at one edge.
