Sharing-habits based privacy control in social networks

Abstract

We study users’ behavior in online social networks (OSN) as a means to preserve privacy. People widely use OSN for a variety of objectives and fields. Each OSN has different characteristics, requirements, and vulnerabilities of the private data shared. Sharing-habits refers to users’ patterns of sharing information. Sharing-habits are implied by the communication between users and their peers. While social networks allow users to have some control over the dissemination of their information, most users are not aware that the private information they share might leak to users with whom they do not wish to share it. In this paper we address the growing need of social network users to share information with close friends while hiding it from others. We apply several well-known strategies from graph-flow theory to an OSN graph with sharing-habits insights, to control the information flow among OSN users. The goal of the method we present is to allow maximum information sharing while enforcing a user’s pre-defined privacy criteria. Our method is evaluated using real data from well-known social networks and the results are analyzed in terms of accuracy and run-time.

Keywords

Privacy control sharing-habits social networks

1. Introduction

Online social networks (OSN) are websites that enable users to build connections and relationships with each other. The OSN structure represents social relationships between its users. Social networks are widely used by their members for information sharing with the purpose of reaching as many friends as possible. Users should have control over the dissemination of their information; however, they are not fully aware of the possible consequences of their preferences when specifying access rules to their shared data. Most access rules are defined in terms of the degree of relationship required to access one’s data and are not refined enough to allow dynamic denial of content from certain peers of the community. It is the responsibility of OSN administrators to effectively enforce these rules to reduce the risk of information leakage. For example, consider a chocolatier who owns a small chocolate boutique. The chocolatier has established a customers club as a closed group in his Facebook account. The chocolatier introduces new tastes, sales opportunities, and coupons to his Facebook closed group. He invents a new taste for the upcoming Valentine’s Day, and makes a special offer to his customers club in advance. The chocolatier would like this information to be shared with as many people as possible, but to remain hidden from his competitors (adversaries). Currently the chocolatier has limited control over the information shared on his Facebook group: each member of the group can re-post the shared information, which might then reach one of his competitors.

We propose a model for access control that works with minimal user intervention. The model is based on users’ patterns of sharing information, denoted as Sharing-habits. To minimize the risk of information leakage, the social network is analyzed based on these habits, to determine the probability of information flow through network connections. In a graph representation of the network, where nodes are users and edges indicate relationships between users, the challenge is to select the set of edges that should be blocked to prevent leakage of the shared information to unwanted recipients. We review some methods for handling and preserving privacy in social networks, and present our new privacy preserving approach, based on sharing-habits data. Our model combines algorithms that use graph flow methods such as max-flow-min-cut and contract. Experimental results show the effectiveness of these algorithms in controlling the flow of information to allow sharing with friends while hiding from others. To the best of our knowledge, this paper is the first to provide a dynamic access control model that is based on users’ dynamic interactions and not on user profiles.

The rest of the paper is structured as follows: in the next section we review related work, in Section 3 we define the privacy assurance in OSN problem, and in Section 4 we present our method for dealing with this problem. We explain our evaluation method and primary results in Section 5 and conclude by summarizing our contribution and discussing directions for future work in Section 6.

2. Related work

There are various types of Online Social Networks, each with different properties. Privacy preservation can be viewed and handled from various aspects. In [12], Papacharissi et al. present several aspects regarding social networks: social network’s structure, evolution, and properties, including identities, communities, and the culture on social network sites. They present self-presentation and social connection in the digital age, behavioral norms, patterns and routines on social networks. The authors conclude that there are emerging patterns of networked sociability that combine newer social habits with old habits, including social routines of the past, and reflect social tendencies and tensions that take shape on networked planes of social activity. Social media platforms introduced a space where boundaries between private and public space have become fuzzy, which opens up new possibilities for identity formation.

Barabasi [1] examines the structure and evolution of social networks. Most networks have the “scale-free” property, which means that they do not depend on a specific node (user), and randomly removing nodes will not destroy the social network. Every network has a few hubs that hold the whole network together, combining communities that are relatively isolated groups of nodes that work independently. Networks are hierarchical: they are built of communities that are grouped together into bigger communities.

LaRose et al. [14] discuss social networking addiction and media habits. Some people develop an obsession with some social network sites, known as “Facebook addiction”. For many Internet users, social networking has become a media habit, as a form of automatism. People develop an automatic habit of media consumption; for some it has turned into a “bad” habit that might be termed compulsive, problematic, pathological, or addictive, and for others it has turned into a “good” habit. A new category of mental illness, called “Internet usage disorder”, has been proposed, including a subcategory of email/text messaging. Models of media behavior can be extended from an understanding of Internet habits.

Kim et al. [8] investigated the cultural differences in motivations for using social network sites between American and Korean college students. American and Korean students showed a similar pattern of daily use of social networks. A notable difference between the two groups was found in the number of connections included in their “friends” list. The number of connections characterizes the Sharing-habits: it shows whether the user tends to share data. A small or large amount of shared data indicates the user’s willingness to share data with friends.

Carmagnola et al. [18] present research, based on entity resolution, on the factors that facilitate user identification and information leakage in social networks. They conducted a study on the possible factors that make users vulnerable to identification and to personal information leakage, and on users’ perception of privacy related to the spreading of their public data. To find the risk factors, they studied the relations between user behavior (habits) on OSNs and the probability of user identification.

Kleinberg and Ligett [10] describe the social network as a graph where nodes represent users, and an edge between two nodes indicates that these two users are enemies that do not wish to share information. The problem of information sharing is described as a graph coloring problem. The authors analyze the stability of solutions for this problem, and the incentive of users to change the set of partners with whom they are willing to share information.

Tassa and Cohen [2] handle the information release problem, and present algorithms to compute an anonymization of the released data to a level of k-anonymity; the algorithms can be used in sequential and distributed environments, while maintaining high utility of the anonymized data.

Vatsalan et al. [19] survey ‘privacy-preserving record linkage’ (PPRL) techniques, which allow the linking of databases between organizations while preserving the privacy of these databases. They present a taxonomy of PPRL which characterizes the known PPRL techniques along 15 dimensions, and they highlight the shortcomings of current techniques and avenues for future research.

Jaehong and Ravi [13] present the ORIGIN CONTROL access control model where every piece of information is associated with its creator forever. As we show later, such a model can be used to control content dependent information flow within the social network.

Ranjbar and Maheswaran [11] describe the social network as a graph where nodes represent users, and an edge between two nodes indicates that these two users are friends that wish to share information. They present algorithms for defining communities of users, where the information is shared among users within the community. They also propose algorithms for defining the set of users that should be blocked in order to prevent the shared information from reaching adversaries and leaking outside the community. In OSN, communities are subsets of users connected to each other, which can be described by a connected graph where each user is a node and an edge connecting two nodes indicates a relationship between two users. A community is defined by Ranjbar and Maheswaran [11] from the viewpoint of an individual user. myCommunity is defined as the largest sub-graph of users who are likely to receive and hold the information without leaking it. In other words, myCommunity is the subset of an individual user’s friends with whom the user has intense and frequent interactions. It describes a grouping abstraction of the set of users that surrounds an individual, based on the communication patterns used for information sharing. Our study is based on the ideas described in their paper. However, [11] only shares information within the defined community and blocks all users that might leak information to adversaries, and as a result may block friendly users from all information. We relax this limitation and block only edges on the path to the adversaries, while minimizing the impact on the user-defined community.

3. The privacy assurance in OSN problem

In this section we define the general problem of privacy assurance in OSN and our proposed method that uses information from users’ sharing-habits.

Let $G = (V, E)$ be a directed graph that describes a social network, where V is the set of network’s users, and E is the set of directed and weighted edges representing the users’ information flow relationships. An edge between two users $(u_{i}, u_{j}) \in E$ exists only if $u_{i}$ shares information with $u_{j}$ .

We define the distance between two vertices in a graph, ${dist}_{G} (u_{i}, u_{j})$ as the length of the shortest path from $u_{i}$ to $u_{j}$ in G. An Ego is an individual focal node, that represents a specific user from which we consider the information flow. A network has as many egos as it has nodes. An ego-community is the set of nodes that consists of an ego and all nodes connected to it at some path length.

The δ-community of a user, represented by the ego vertex $u_{i}$ , is the sub-graph $G_{δ} (u_{i}) = (V_{δ} (u_{i}), E_{δ} (u_{i}))$ . Here $V_{δ} (u_{i})$ is the set of nodes that consists of the ego-node $u_{i}$ and all the nodes $u_{j}$ such that ${dist}_{G} (u_{i}, u_{j}) ⩽ δ$ , and $E_{δ} (u_{i})$ is the set of edges on the paths from the ego-node $u_{i}$ to the nodes within $V_{δ} (u_{i})$ . The size of the community is determined by the distance from the ego node and also by the density of the graph. Graph density is the ratio between the number of edges and the maximum possible number of directed edges between the nodes of the graph: $Density = \frac{| E |}{| V | \cdot (| V | - 1)}$ .
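As an illustration of these definitions, the following minimal sketch collects a δ-community with a breadth-first search and evaluates the density formula. The adjacency-dictionary representation and the function names `delta_community` and `density` are assumptions made for this example, not part of the original method.

```python
from collections import deque


def delta_community(adj, ego, delta):
    """Return {node: hop distance} for every node within `delta` hops of `ego`.

    `adj` maps each user to the list of users it shares information with
    (a hypothetical representation of the directed graph G).
    """
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        u = queue.popleft()
        if dist[u] == delta:
            continue  # do not expand beyond the community radius
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def density(num_nodes, num_edges):
    """Density = |E| / (|V| * (|V| - 1)) for a directed graph."""
    if num_nodes < 2:
        return 0.0
    return num_edges / (num_nodes * (num_nodes - 1))


# Toy graph: u0 -> a, b; a -> c; b -> c; c -> d
adj = {"u0": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"], "d": []}
community = delta_community(adj, "u0", 2)
```

With δ = 2 the node `d`, which is three hops away from `u0`, is left outside the community.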

Fig. 1.

$u_{i}$ ’s Community Graph.

Figure 1 describes an ego-community graph for the ego node $u_{i}$ . The dotted area surrounds $u_{i}$ ’s δ-community graph where $δ = 4$ , i.e., all acquaintances within distance $⩽ 4$ . The blue area surrounds all $u_{i}$ ’s friends within distance $⩽ 2$ denoted $u_{i}$ ’s β-community where $β = 2$ .

As shown by the figure the δ-community of friends is usually much larger than the β-community of close friends.

We use the following definitions as defined by Ranjbar and Maheswaran [11]: $p_{i}$ is the probability that user $u_{i}$ is willing to share the information with some of his friends. $\begin{matrix} (1) & p_{i} = \begin{cases} outflow / inflow & \text{if } outflow < inflow, \\ 1 & \text{if } outflow ⩾ inflow . \end{cases} \end{matrix}$

Outflow is the number of sharing interactions from $u_{i}$ to his friends.

Inflow is the number of sharing interactions from $u_{i}$ ’s friends to $u_{i}$ .

The likelihood of $u_{i}$ sharing information with $u_{j}$ along the edge $(u_{i}, u_{j})$ is represented by the weight on the edge, $w_{i, j}$ . This weight is derived from the relationship between $u_{i}$ and $u_{j}$ ; it is a fixed number indicating the willingness of $u_{i}$ to share information with $u_{j}$ . It may be set by the user and usually it does not change. The probability of flow between two neighbor users, $u_{i}$ and $u_{j}$ , is denoted as $p_{i, j}$ , and calculated by $p_{i, j} = p_{i} \times w_{i, j}$ . We assume that the user’s behavior is consistent: a user $u_{i}$ shares all the data with user $u_{j}$ with probability $p_{i, j}$ . This probability can change with time, but it does not depend on the content of the shared information.

For example, if in Fig. 6 the number of sharing interactions from user 4 to his friends is 23 (13 to user 3 and 10 to user 5), and the number of sharing interactions from 4’s friends to 4 is 30 (9 from user 7 and 21 from user $u_{0}$ ), then $p_{4}$ will be $23 / 30 = 0.77$ . If $w_{4, 5}$ is 0.8 and $w_{4, 3}$ is 0.92, the resulting $p_{4, 5}$ is 0.62 and $p_{4, 3}$ is 0.71.
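The arithmetic of this example can be reproduced with a short sketch of Eq. (1) and of $p_{i, j} = p_{i} \times w_{i, j}$ . The function names are hypothetical. Note that the quoted values 0.62 and 0.71 are obtained when $p_{4}$ is first rounded to 0.77, which is what the code below assumes.

```python
def sharing_probability(outflow, inflow):
    """Eq. (1): p_i = outflow / inflow if outflow < inflow, otherwise 1."""
    if outflow >= inflow:
        return 1.0
    return outflow / inflow  # inflow > outflow >= 0, so inflow is non-zero


def edge_probability(p_i, w_ij):
    """p_{i,j} = p_i * w_{i,j}."""
    return p_i * w_ij


# User 4: 13 + 10 outgoing interactions, 9 + 21 incoming interactions.
p4 = round(sharing_probability(13 + 10, 9 + 21), 2)  # rounded as in the text
p45 = round(edge_probability(p4, 0.8), 2)
p43 = round(edge_probability(p4, 0.92), 2)
print(p4, p45, p43)
```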

The Probability of Information Flow (PIF) is the maximum probability of information flow over all the paths between $u_{i}$ and $u_{j}$ . The probability flow of a single path between $u_{i}$ and $u_{j}$ , denoted as ${PATH}_{i, j}$ , is the flow of the edge with the minimum $p_{i, j}$ on that path. The PIF is the maximum of ${PATH}_{i, j}$ over all paths between $u_{i}$ and $u_{j}$ . A flow is a mapping $f : E \to ℜ^{+}$ , denoted by $f (u_{i}, u_{j})$ and computed by using the log of the edges’ probabilities on a path between $u_{i}$ and $u_{j}$ . The value of flow is defined by $| f | = \sum_{u_{i}, u_{j} \in V} \log p_{i, j}$ , where $u_{i}$ is the ego node. To prevent information flow from one user to another, we search for the minimal set of edges that, when removed from the community graph (or blocked), disables the flow. We denote this set of blocked edges as B. Once edges are removed, the PIF and therefore f should be recomputed.
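A minimal sketch of how the PIF could be computed, assuming the bottleneck definition stated above (maximum over paths of the minimum edge probability). It uses a Dijkstra-style search with a max-heap, which is valid because extending a path can never increase its value when probabilities lie in [0, 1]. The function name `pif` and the `combine` parameter are assumptions of this example; passing `operator.mul` gives a multiplicative variant that corresponds to summing the logs of the edge probabilities.

```python
import heapq
import operator


def pif(prob, source, combine=min):
    """Return {node: best path value from `source`}.

    `prob` maps a directed edge (u, v) to p_{u,v}.
    `combine=min` gives the bottleneck PIF; `combine=operator.mul`
    gives the product of the edge probabilities along the best path.
    """
    adj = {}
    for (u, v), p in prob.items():
        adj.setdefault(u, []).append((v, p))
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap through negated values
    while heap:
        neg_value, u = heapq.heappop(heap)
        value = -neg_value
        if value < best.get(u, 0.0):
            continue  # stale heap entry
        for v, p in adj.get(u, ()):
            candidate = combine(value, p)
            if candidate > best.get(v, 0.0):
                best[v] = candidate
                heapq.heappush(heap, (-candidate, v))
    return best


# Toy graph with two paths from s to t.
prob = {("s", "a"): 0.9, ("a", "t"): 0.5, ("s", "b"): 0.6, ("b", "t"): 0.7}
bottleneck = pif(prob, "s")
product = pif(prob, "s", combine=operator.mul)
```

In this toy graph the bottleneck variant prefers the path through `b` (minimum 0.6), while the multiplicative variant prefers the path through `a` (0.9 × 0.5 = 0.45).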

3.1. Problem goal

Our aim is to enable a user $u_{i}$ to share information with as many friends and acquaintances as possible, while preventing information leakage to adversaries within the user’s community. Ranjbar and Maheswaran [11] describe a method for sharing information within the source user $u_{i}$ defined community, while blocking users (friends and acquaintances) that might leak information to adversaries. We relax the limitation due to blocking friends, and instead of blocking all the information from the source user $u_{i}$ to the users that might leak the information, we block only edges on the path from $u_{i}$ to her adversaries. We use the following criteria to define and evaluate the resulting $u_{i}$ ego-community graph:

Minimum Friends Information Flow: the minimum information flow from $u_{i}$ to every user within his community must preserve a certain percentage of the original information flow to every user denoted by α.

Let $G_{δ} (u_{i}) = (V_{δ} (u_{i}), E_{δ} (u_{i}))$ be the δ-community of $u_{i}$ , and let $v \in V_{δ} (u_{i})$ . Then $\begin{matrix} (2) & f (u_{i}, v) ⩾ α \cdot f_{original} (u_{i}, v) \end{matrix}$

Close Friends Distance: Close friends are defined by their distance from $u_{i}$ . $G_{β} (u_{i}) = (V_{β} (u_{i}), E_{β} (u_{i}))$ is the β-community of $u_{i}$ , where $β < δ$ . This criterion reflects the requirement that all the users within $u_{i}$ ’s β-community must receive the entire information from $u_{i}$ , and cannot be blocked.

Let B be the set of blocked edges, then $\begin{matrix} (3) & B \subset \{ (u_{s}, u_{t}) \mid d_{G_{δ}} (u_{i}, u_{s}) ⩾ β, \; u_{s}, u_{t}, u_{i} \in V_{G_{δ}} (u_{i}) \} \end{matrix}$ We assume that there are no adversaries within $u_{i}$ ’s β-community, otherwise the above condition is never fulfilled.

Maximum Adversaries Information Flow: the maximum information flow from $u_{i}$ to each of his adversaries ( $u_{adv}$ ) cannot be more than a fraction γ of the original information flow to that adversary. $\begin{matrix} (4) & f (u_{i}, u_{adv}) ⩽ γ \cdot f_{original} (u_{i}, u_{adv}) \end{matrix}$

For example, the threshold parameters can be $α = 0.9$ , $β = 2$ , and $γ = 0.1$ . The problem goal is to remove the least number of edges such that the three conditions (2), (3), and (4) are satisfied. A detailed example for this process is given in Section 5.1.
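The three conditions (2), (3), and (4) can be checked mechanically once the flows before and after blocking are known. The sketch below is one possible reading, assuming that flows are given as dictionaries keyed by user and that hop distances from the ego are precomputed; the function name and all argument names are hypothetical.

```python
def satisfies_criteria(f_orig, f_new, adversaries, blocked, dist,
                       alpha, beta, gamma):
    """Check conditions (2)-(4) for a proposed set of blocked edges.

    f_orig / f_new: {user: flow from the ego} before / after blocking.
    blocked: iterable of directed edges (u_s, u_t).
    dist: {user: hop distance from the ego in the delta-community}.
    """
    adversaries = set(adversaries)
    friends = [v for v in f_orig if v not in adversaries]
    # (2) every non-adversary keeps at least alpha of the original flow
    min_flow_ok = all(f_new.get(v, 0.0) >= alpha * f_orig[v] for v in friends)
    # (3) blocked edges may only start at distance >= beta from the ego
    close_ok = all(dist[u_s] >= beta for (u_s, _u_t) in blocked)
    # (4) every adversary receives at most gamma of the original flow
    leak_ok = all(f_new.get(a, 0.0) <= gamma * f_orig[a] for a in adversaries)
    return min_flow_ok and close_ok and leak_ok


# Toy values: ego u, friends a and b, adversary x.
f_orig = {"a": 0.8, "b": 0.5, "x": 0.4}
f_new = {"a": 0.8, "b": 0.5, "x": 0.0}
dist = {"u": 0, "a": 1, "b": 2, "x": 3}
ok = satisfies_criteria(f_orig, f_new, {"x"}, [("b", "x")], dist,
                        alpha=0.9, beta=2, gamma=0.1)
too_close = satisfies_criteria(f_orig, f_new, {"x"}, [("a", "b")], dist,
                               alpha=0.9, beta=2, gamma=0.1)
```

Blocking `("b", "x")` is acceptable here, whereas blocking `("a", "b")` violates condition (3) because `a` is at distance 1 from the ego.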

3.2. Cuts in a graph

A cut in a graph is a set of edges between two subsets of the graph’s vertices, one containing $u_{i}$ and the other containing $u_{i}$ ’s adversaries, such that removing these edges prevents information flow from one subset to the other.

A naive algorithm for solving the problem would be an algorithm that finds any cut between the adversaries’ set and $u_{i}$ ’s community, and defines this cut as the blocked edges list. Algorithm 1 is a naive algorithm for blocked users.

The naive algorithm is not suitable for our problem, since it does not address the (1) Minimum Friends Information Flow and (2) Close Friends Distance criteria of our problem. However, it serves as a basic framework for our proposed solution.
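To make the notion of "any cut" concrete, the sketch below builds the trivial cut that consists of all edges entering an adversary, and verifies by a reachability search that it separates the ego from the adversaries. This is only an illustrative stand-in and is not claimed to reproduce Algorithm 1; the function names are hypothetical.

```python
def naive_cut(edges, adversaries):
    """Trivial cut: every edge from a non-adversary into an adversary."""
    adversaries = set(adversaries)
    return [(u, v) for u, v in edges
            if v in adversaries and u not in adversaries]


def reachable(edges, source, blocked=()):
    """Set of nodes reachable from `source` when `blocked` edges are removed."""
    blocked = set(blocked)
    adj = {}
    for u, v in edges:
        if (u, v) not in blocked:
            adj.setdefault(u, []).append(v)
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


edges = [("u", "a"), ("u", "b"), ("a", "x"), ("b", "x"), ("a", "b")]
cut = naive_cut(edges, {"x"})
still_reached = reachable(edges, "u", blocked=cut)
```

Such a cut ignores how much flow the remaining friends keep, which is why the criteria of Section 3.1 are needed.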

Algorithm 1

Naive algorithm for blocked users

3.3. Sharing-habits based privacy assurance in OSN: The simple problem

In the Appendix we analyze the complexity of the Sharing-habits based Privacy Assurance in OSN (SHPA) problem. We show that a “Minimum-Distance with Maximum-Flow and Minimum Leakage” problem, which is a simple subset of the “Sharing-habits based Privacy Assurance problem” (defined next), is NP-Complete, and therefore so is the general Sharing-habits based Privacy Assurance in OSN problem.

In Section 4 we present a practical approximating solution for the general problem.

3.3.1. Maximum-flow minimum-leakage problem definition

Let $G = (V + u_{i}, E)$ be a directed graph with an ego node $u_{i}$ , which is the source user, and edge capacities/lengths $\{ c_{e} : e \in E \}$ .

Given a set $T \subseteq V$ of users, let $f_{G} (u_{i}, T)$ denote the maximum $(u_{i}, T)$ -flow value under capacities $c_{e}$ .

$T \subseteq V$ is the set of $u_{i}$ ’s friends.

$S \subseteq V$ is the set of $u_{i}$ ’s adversaries.

A particular case of the Maximum-Flow Minimum-Leakage Problem is the Minimum-Distance Maximum-Flow Minimum-Leakage Problem, which is a simple subset of our general problem, where the adversaries are in the boundaries of $u_{i}$ ’s δ-community: ${dist}_{G} (u_{i}, u_{j}) ⩽ {dist}_{G} (u_{i}, s)$ for all $(u_{j}, s) \in T \times S$ .

In the Appendix we prove that the Minimum-Distance Maximum-Flow Minimum-Leakage problem is NP-complete, and hence so are the simple problem and the general problem.

In the next section we present several approximation algorithms for the general problem.

4. The sharing-habits based privacy assurance in OSN solution

As we show in the Appendix, the problem presented here is NP-Complete. We therefore propose a model for finding the set of edges that should be blocked to allow maximum information sharing with the community of the information source and minimum information leakage. In Section 5.2.2 we analyze the complexity of the proposed model, and show that it is $O (| V |^{2} + | V | \cdot | E | + | E |^{2})$ when using paralleled contract for finding the initial candidates-set for edges to be blocked, or $O (| V | \cdot | E |^{2} + | E |^{2})$ when using min-cut for finding the initial candidates-set for edges to be blocked.

Our model consists of two major steps: the first is the initialization step in which we create a multi-graph with a super-vertex $s_{1}$ containing $u_{i}$ ’s β-community, this step is described in Section 4.1. In the second step we present two methods for identifying candidate sets of edges to be blocked as described in Section 4.2.

Algorithm 2 is a skeleton that outlines these steps. It starts with the initialization step (lines 1–8); next it calls the procedure that finds candidates-sets of edges to be blocked (line 12) by using the min-cut algorithm, the contract algorithm, or both; and then it calls the procedure that examines the proposed set against the required privacy criteria (line 13). Figure 2 describes the main building blocks of the algorithm for defining the edges to be removed from $u_{i}$ ’s δ-community in order to prevent information leakage to $u_{i}$ ’s adversaries.

Algorithm 2

Construct blocked edges

Fig. 2.

Construct Blocked Edges main building blocks.

Next we describe in detail each one of these building blocks.

4.1. Initialization

The δ-community of a member $u_{i}$ consists of all users $u_{j}$ connected to $u_{i}$ with a path of distance $⩽ δ$ . The β parameter defines the size of the community of close friends. Therefore, a β-community of $u_{i}$ is a sub-graph contained in the δ-community where $β ⩽ δ$ , as demonstrated in Fig. 1. The privacy criteria presented in Section 3.1 require that the entire information shared by $u_{i}$ is shared with $u_{i}$ ’s close friends (2). To comply with this requirement, the Initialization step creates a multi-graph with one super-vertex $s_{1}$ containing $u_{i}$ and her close friends with distance $⩽ β$ . This step ensures that no edge on a path between $u_{i}$ and her close friends will be blocked, since they all belong to the same super-vertex $s_{1}$ (see Fig. 3).

Figure 3(a) describes a δ-community graph for $u_{0}$ , $δ = 3$ , with 10 members, 4 close friends with distances $= 1$ (blue vertices), 4 acquaintances (green vertices), and 2 adversaries (red vertices). Figure 3(b) describes the graph after initialization. The initialization process is depicted in steps 1–8 of the algorithm.
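The initialization step can be sketched as follows, assuming the community is given as a list of weighted directed edges together with the hop distance of every user from the ego. All vertices at distance $⩽ β$ are merged into one super-vertex, edges internal to it disappear, and parallel edges are kept so that the result is a multi-graph. The function name, the label `"s1"`, and the data layout are assumptions of this example.

```python
def build_super_vertex(edges, dist, beta, label="s1"):
    """Merge the ego and its close friends into one super-vertex.

    edges: list of (u, v, p) directed edges with probability p.
    dist:  {user: hop distance from the ego}.
    Returns (multi_edges, close), where each multi-edge is
    (u', v', p, original_edge) and `close` is the merged vertex set.
    """
    close = {v for v, d in dist.items() if d <= beta}

    def rename(v):
        return label if v in close else v

    multi_edges = []
    for u, v, p in edges:
        ru, rv = rename(u), rename(v)
        if ru != rv:  # edges inside the super-vertex can never be blocked
            multi_edges.append((ru, rv, p, (u, v)))
    return multi_edges, close


edges = [("u0", "1", 0.76), ("u0", "2", 0.62), ("1", "5", 0.57),
         ("2", "5", 0.40), ("5", "9", 0.95)]
dist = {"u0": 0, "1": 1, "2": 1, "5": 2, "9": 3}
multi_edges, close = build_super_vertex(edges, dist, beta=1)
```

Here the two edges from users `1` and `2` into user `5` become parallel edges leaving `s1`, and each one remembers the original edge it came from.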

4.2. Construct blocked edges candidates

A candidate-set of blocked edges is a cut between two sets of vertices. One set contains $u_{i}$ , $u_{i}$ ’s β-community, and some vertices from $u_{i}$ ’s δ-community. The other set contains the remaining part of $u_{i}$ ’s δ-community and $u_{i}$ ’s adversaries.

The candidate-set is evaluated against the privacy criteria we have defined in Section 3 and is described later in Section 4.3. We use the following two methods for finding the initial candidate-set of edges to block:

Min-Cut: based on the Ford–Fulkerson max-flow min-cut algorithm [4], we find the minimum cut between the super-vertex $s_{1}$ and each of $u_{i}$ ’s adversaries.

Contract: based on the contract algorithm of Karger et al. [17], we find any cut between the super-vertex $s_{1}$ and each of $u_{i}$ ’s adversaries.

4.2.1. Block edges by min-cut

Algorithm 3 implements the Sharing-habits privacy assurance based on the max-flow min-cut method [4], and then tests for privacy criteria compliance:

Find a minimum cut between super-vertex $s_{1}$ and $u_{i}$ ’s adversaries [4].

Check if the cut complies with the required privacy criteria as defined in Section 3.1, and select the final candidates-set. This process is described in Section 4.3.
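The first of these two steps can be illustrated with a compact max-flow min-cut routine. The sketch below uses the Edmonds–Karp variant (breadth-first augmenting paths) of the Ford–Fulkerson method and, as an assumption of this example, takes the edge probabilities as capacities and a single adversary as the sink. It is not claimed to be the authors' Algorithm 3.

```python
from collections import deque


def min_cut(capacity, source, sink, eps=1e-12):
    """Return (max_flow_value, cut_edges) for a directed capacity map."""
    residual, adj = {}, {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0.0) + c
        residual.setdefault((v, u), 0.0)  # reverse edge for flow cancellation
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def bfs():
        """Parents of all nodes reachable from source in the residual graph."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > eps:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0.0
    parent = bfs()
    while sink in parent:
        path, v = [], sink
        while parent[v] is not None:  # walk back from sink to source
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= push
            residual[(v, u)] += push
        flow += push
        parent = bfs()
    source_side = set(parent)  # nodes still reachable from the source
    cut = [(u, v) for (u, v) in capacity
           if u in source_side and v not in source_side]
    return flow, cut


capacity = {("s1", "a"): 0.9, ("s1", "b"): 0.6, ("a", "t"): 0.5,
            ("b", "t"): 0.7, ("a", "b"): 0.3}
flow, cut = min_cut(capacity, "s1", "t")
```

Several adversaries could be handled by calling the routine once per adversary, or by linking them to a hypothetical super-sink with very large capacities. The returned cut is only an initial candidates-set and still has to be tested against the privacy criteria.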

Algorithm 3

Block edges by Min-Cut

4.2.2. Block edges by contract

The minimum cut between the β-community of user $u_{i}$ , $G_{β} (u_{i})$ , and $u_{i}$ ’s adversaries, found by the BlockEdgesByMinCut algorithm, might not be the optimal solution for our problem, since the edges in this cut may not satisfy the privacy criteria. We therefore apply the contract algorithm to find a variety of other cuts that possibly comply with these criteria.

Algorithm 4 implements the Sharing-habits privacy assurance based on the contract method by Karger and Stein [6,17].

In each iteration, the contract algorithm finds a different cut between the super-vertex containing $G_{β} (u_{i})$ and the super-vertex containing $u_{i}$ ’s adversaries. The contract algorithm repeatedly contracts vertices into super-vertices until it gets two super-vertices connected by a set of edges that defines a cut between the two sets of vertices contained in each super-vertex.

Note:

The contract algorithm may be called many times until the resulting cut complies with the required privacy criteria, as defined in Section 3.

When repeated enough times, the contract algorithm finds the min-cut with high probability.

Algorithm 4 is composed of the following main steps:

Find a cut between super-vertex $G_{β} (u_{i})$ and $u_{i}$ ’s adversaries.

Check if the cut complies with the required privacy criteria as defined in Section 3.1 and select the final candidates-set. This process is described in 4.3.

Algorithm 4

Block edges by Contract

Algorithm 5

ContractFindCut

Algorithm 5 is called by Algorithm 4 to find a cut between two vertices by randomly selecting an edge and contracting the two vertices connected by the selected edge into one super-vertex.
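One run of such a constrained contraction can be sketched with a union-find structure. The sketch assumes that all adversaries are first merged into a single super-vertex, that edges are visited in a random order, and that an edge is skipped when contracting it would merge the super-vertex of the source with the super-vertex of the adversaries. It is written for a weakly connected community graph, and the function name and signature are hypothetical rather than taken from Algorithm 5.

```python
import random


def contract_find_cut(edges, source, adversaries, rng=None):
    """Return the edges going from the source side to the adversary side."""
    rng = rng or random.Random()
    adversaries = list(adversaries)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for adv in adversaries:  # one super-vertex for all adversaries
        union(adv, adversaries[0])

    order = list(edges)
    rng.shuffle(order)  # random edge selection
    for u, v in order:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # already inside one super-vertex
        if {ru, rv} == {find(source), find(adversaries[0])}:
            continue  # must not merge the source with the adversaries
        union(u, v)

    s, t = find(source), find(adversaries[0])
    return [(u, v) for u, v in edges if find(u) == s and find(v) == t]


edges = [("s1", "5"), ("s1", "6"), ("5", "9"), ("6", "9"),
         ("5", "7"), ("7", "10"), ("6", "8"), ("8", "10")]
cut = contract_find_cut(edges, "s1", ["9", "10"], random.Random(0))
```

Different random orders generally yield different cuts, which is what allows the contract method to propose several candidates-sets for the same community.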

Figures 4–5 describe a simple community graph and some steps of one run of the contract algorithm.

Fig. 3.

$u_{0}$ ’s δ-community graph: (a) $u_{0}$ ’s community (b) after initialization.

Fig. 4.

Contract: (a) Edge $(5, 10)$ was randomly selected, (b) Edge $(5, 2)$ cannot be selected, since the algorithm cannot contract a super-vertex containing $u_{0}$ with a super-vertex containing $u_{0}$ ’s adversary.

Fig. 5.

Contract: (a) Edge $(3, 7)$ is randomly selected (b) The obtained cut from one run of Contract algorithm.

4.3. Compute final candidates set

After selecting the initial candidates-set of edges to block, each method uses Algorithm 6 for selecting the final candidates-set of edges that should be removed from $u_{i}$ ’s δ-community graph. In the first step of the algorithm, we check whether, after removing the initial candidates-set of edges from $u_{i}$ ’s δ-community graph, the remaining δ-community graph for user $u_{i}$ complies with the required privacy criteria. If it does not comply, we try to remove edges from the initial blocked candidates-set and insert them back into $u_{i}$ ’s δ-community graph, until the remaining community graph complies with the required criteria, or until we have tested all the edges in the initial candidates-set without finding a set of edges to be blocked.

The contract algorithm may provide, at each run, different cuts with different sets of edges to be blocked. The optimal set of edges to be blocked may vary in terms of the number of blocked edges, the amount of information flow to acquaintances, and the amount of information leakage to adversaries. The order of selecting the edges to be blocked may provide different acceptable solutions to our problem. We propose three methods for selecting an edge, removing it from the initial candidates-set, and inserting it back into the δ-community graph:

Randomize: select an edge randomly.

Maximum PIF: select the edge with the maximum probability of information flow.

Minimum PIF: select the edge with the minimum probability of information flow.

The motivation for the second choice is to reduce maximum flow to adversaries. The motivation for the third choice is to increase flow to friends. It is hard to find a balanced set since the general problem is NP-complete.

Algorithm 6 implements the three methods and Algorithm 7 tests the criteria.
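The selection loop described above can be sketched as follows. This is one possible reading of the procedure, assuming that unblocking is cumulative and that the criteria test is supplied as a callable; the names `compute_final_candidates`, `complies`, and `strategy` are hypothetical. The example reuses the edge values listed in Table 2 together with a toy stand-in predicate, since the real test requires recomputing the flows.

```python
import random


def compute_final_candidates(initial, pif, complies, strategy="random",
                             rng=None):
    """Unblock edges one at a time until `complies(blocked)` holds.

    initial:  initial candidates-set (list of edges).
    pif:      {edge: probability of information flow on that edge}.
    complies: callable taking the current blocked list, returning bool.
    Returns the final blocked list, or None if no subset visited complies.
    """
    rng = rng or random.Random()
    blocked = list(initial)
    while True:
        if complies(blocked):
            return blocked
        if not blocked:
            return None  # every edge was tested without success
        if strategy == "max_pif":
            edge = max(blocked, key=lambda e: pif[e])
        elif strategy == "min_pif":
            edge = min(blocked, key=lambda e: pif[e])
        else:
            edge = rng.choice(blocked)
        blocked.remove(edge)  # insert the edge back into the community graph


# Edge values as listed in Table 2.
pif = {(5, 9): 0.95, (6, 9): 0.19, (8, 9): 0.8,
       (3, 8): 0.9, (3, 7): 0.62, (5, 10): 1.0}


def toy_complies(blocked):
    # Stand-in predicate for illustration only, not the real criteria.
    return len(blocked) <= 4


by_max = compute_final_candidates(list(pif), pif, toy_complies, "max_pif")
by_min = compute_final_candidates(list(pif), pif, toy_complies, "min_pif")
```

The two deterministic strategies end with different final sets on the same input, which illustrates why the order of selection matters.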

Algorithm 6

Compute final candidates-set

Algorithm 7

Compute the required criteria

5. Evaluation

In this section we describe the evaluation method we use for the proposed algorithm, and the results we obtained using real data [9]. We first demonstrate our methods and the difference between them using a toy community.

5.1. Demonstration using a synthetic community

We demonstrate our algorithms on a small graph representing a community based on the example given in [15], containing 11 vertices and 23 edges with the following community parameters: distance $δ = 3$ , close friends distance $β = 1$ , and 2 adversaries. The algorithms are tested with different probabilities of information flow from source user $U_{0}$ to the community members.

Figure 6 describes the synthetic community graph with high probability of information flow on the edges to adversaries. This situation simulates a collision, for which it is hard to find a cut other than the trivial cut of the edges to the adversaries. In these cases a cut will be found only for privacy criteria that allow very low levels of information flow to $u_{0}$ ’s community (set by α), and/or very high levels of leakage of information flow to $u_{0}$ ’s adversaries (set by γ).

In Fig. 6 $U_{0}$ is the source, $U_{0}$ has four close friends: $1, 2, 3, 4$ , four acquaintances: $5, 6, 7, 8$ , and two adversaries: $9, 10$ .

Fig. 6.

Synthetic community graph with collision.

Adversary 9 has three incoming edges $\{(6, 9), (5, 9), (8, 9)\}$ with probabilities $(0.19, 0.95, 0.8)$ respectively.

Adversary 10 also has three incoming edges $\{(5, 10), (7, 10), (8, 10)\}$ with probabilities $(1, 0.85, 0.95)$ respectively.

The maximum probability of information flow from $U_{0}$ to all other members in the graph is depicted in Table 1.

Table 1

PIF from $U_{0}$ to his community

User	1	2	3	4	5	6	7	8	9	10
MAX PIF	0.76	0.62	0.43	0.67	0.4332	0.4154	0.2949	0.4281	0.4115	0.4332

Next, using this example we show why the contract approach has better chance of finding a good set of edges that can be blocked while satisfying the privacy criteria.

Table 2

Candidates found by Min-Cut

Edge	PIF
$(5, 9)$	0.95
$(6, 9)$	0.19
$(8, 9)$	0.8
$(3, 8)$	0.9
$(3, 7)$	0.62
$(5, 10)$	1

Table 3

Candidates found by Contract

Edge	PIF
$(4, 5)$	0.62
$(1, 5)$	0.57
$(2, 6)$	0.67
$(3, 8)$	0.9
$(7, 10)$	0.85

Table 4

Candidates found by Contract

Edge	PIF
$(5, 9)$	0.95
$(6, 9)$	0.19
$(8, 9)$	0.8
$(7, 10)$	0.85
$(5, 10)$	1
$(8, 10)$	0.95

5.1.1. Block edges by min-cut method

The Minimum cut found by Min-Cut method is depicted in Table 2.

If we remove the initial candidates-set edges from $u_{0}$ ’s community graph, the probability of information flow to 7 and 8 will be 0, meaning no flow at all. In the final step of Algorithm 6, we try to unblock each edge from the initial candidates-set, to reach the required privacy criteria; in this example the only edge that improves the probability of information flow (PIF) to the community without increasing the information leakage to $u_{0}$ ’s adversaries is $(3, 7)$ , thus the final candidates-set is $\{(3, 7)\}$ . However, if we block information flow from $(3, 7)$ , the maximum information flow to 7 is obtained through 8, and since according to Table 1 the maximum flow to 8 is 0.428, the maximum flow to 7 is $0.428 \times 0.29 = 0.124$ . This is clearly a very low value of information flow to community members.

5.1.2. Block edges by contract method

Two cuts found by iterations of the contract method are depicted in Tables 3 and 4. If we remove the initial candidates-set edges depicted in Table 3, the probability of information flow to 5, 6, and 8 will be 0, meaning no flow at all, and this does not comply with the required privacy criteria. Therefore, following Algorithm 6, we unblock each edge from the initial candidates-set until we meet the required privacy criteria; the final candidates-set is empty, since each edge we unblock not only improves the information flow to $u_{0}$ ’s community, but also increases the information leakage to $u_{0}$ ’s adversaries. In this case, since users $5, 7$ , and 8 share heavily with the adversaries, the only reasonable cut must include edges to the adversaries.

When the edges to the adversaries have high probabilities, the max-flow-min-cut method might not select those edges, and might not find a solution that complies with the required privacy criteria, while the contract method might find the trivial cut that contains only the edges to the adversaries, as depicted in Table 4, and thus comply with the required privacy criteria.

5.2. Test on SNAP database

We evaluated our algorithms using real data from the Facebook, Twitter and Gplus networks, available from the Stanford Large Network Dataset Collection (SNAP) [9]. The SNAP library has been actively developed since 2004 and is organically growing as a result of Stanford research pursuits in analysis of large social and information networks. The website was launched in July 2009.

The first social network graph describes social circles from Facebook (anonymized) and consists of 4,039 nodes (users) and 88,234 edges. The Twitter graph is a directed graph describing social circles from Twitter; it consists of 81,306 nodes (users) and 1,768,149 edges. The Gplus graph is a directed graph describing social circles from Google+; it consists of 107,614 nodes (users) and 13,673,453 edges.

In the real world, a user’s willingness to share information with friends and acquaintances may be set periodically as described in Section 3, or by applying learning techniques to the user’s sharing habits. Since interactions between members are not reported in the SNAP datasets, we use the structure and relationships from this database, and assign random probabilities to the edges in the network graph as described next.

We define four types of users to express a user’s willingness to share information: high, medium, low, and very low. For each user in the graph we randomly assign a type. For each edge from a user node we randomly assign a probability that conforms to the user’s type according to the following ranges: high (0.75–1), medium (0.5–0.75), low (0.25–0.5), very low (0–0.25). The four types are distributed uniformly among all network users. In real life these probabilities can be learned from the traffic patterns and users’ sharing habits.
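The assignment procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `assign_probabilities`, the edge-list input and the `seed` parameter are hypothetical, and the probability is drawn uniformly within the range of the edge's source user.

```python
import random

# Probability ranges per sharing type, as listed in the text.
TYPE_RANGES = {
    "high": (0.75, 1.0),
    "medium": (0.5, 0.75),
    "low": (0.25, 0.5),
    "very low": (0.0, 0.25),
}

def assign_probabilities(edges, seed=None):
    """Assign a random sharing type to each user and a probability to each edge.

    edges is an iterable of directed (source, target) pairs. The probability of
    an edge is drawn from the range of its source user's type.
    """
    rng = random.Random(seed)
    edges = list(edges)
    # Collect every user once, preserving first-seen order.
    nodes = list(dict.fromkeys(n for edge in edges for n in edge))
    types = list(TYPE_RANGES)
    user_type = {n: rng.choice(types) for n in nodes}  # uniform over the four types
    prob = {}
    for u, v in edges:
        low, high = TYPE_RANGES[user_type[u]]
        prob[(u, v)] = rng.uniform(low, high)
    return user_type, prob

user_type, prob = assign_probabilities([(0, 1), (0, 2), (1, 2), (2, 0)], seed=7)
```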

In each run a random node is selected as the ego-node, the α, γ and δ parameters are set randomly and the β parameter is set to 1. A random number of adversaries (between 1% and 5% of community vertices) are selected from the δ-community of the ego-node. We used three methods for selecting the final candidate-set of edges to block: edges with max PIF (Probability of Information Flow), edges with Min PIF, and random edges. All three methods yield similar results, thus we only present the results from using the max PIF method.

5.2.1. Evaluation results

Figures 7–14 present several sub-communities derived from the Facebook database by randomly selecting a sharing user with $δ = 4$, and the results obtained by our algorithms. From these figures we can see that the databases are shaped as clusters of sub-communities connected together into bigger clusters. The users in these figures are colored according to their types: the ego-user is colored in steel blue, first degree friends in turquoise, the adversaries in coral, and community members in gray. The red lines depict the initial cut edges and the thick red lines depict the final cut edges. In the figure captions, $V$ denotes the number of vertices, $E$ the number of edges, and $D$ the density. Tables 5–6 summarize the results of four different evaluation runs, for different communities.

Figure 7 describes two communities derived from the Facebook database. The two communities have a number of vertices of the same order; however, we can see a clear division into clusters of vertices in community (a), which has a larger number of edges (connections) and consequently a higher density.

Fig. 7.

Facebook: (a) $V = 987$ , $E = 61831$ , $D = 0.06353$ (b) $V = 789$ , $E = 5205$ , $D = 0.00837$ .
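The density values reported in the captions are consistent with the usual directed-graph definition, $D = E / (V (V - 1))$. The short sketch below checks this against the two Facebook communities of Fig. 7; the formula is an assumption inferred from the reported numbers, and the function name `density` is illustrative.

```python
def density(num_vertices, num_edges):
    """Directed-graph density: edges divided by the number of ordered vertex pairs."""
    if num_vertices < 2:
        return 0.0
    return num_edges / (num_vertices * (num_vertices - 1))

print(round(density(987, 61831), 5))  # 0.06353, community (a)
print(round(density(789, 5205), 5))   # 0.00837, community (b)
```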

Fig. 8.

Initial Contract edges, sparse.

Figure 8 is a zoom into the sub-community depicted in Fig. 7(b), with two adversaries, 182, colored with green, and 209, colored with magenta. The ego-user is marked with red circle, and the red lines depict the initial candidates-set of edges to be blocked, found by the contract algorithm.

Fig. 9.

Final Contract edges, sparse.

In Fig. 9 the red lines represent the final candidates-set of edges to be blocked, found after computing the required privacy criteria and unblocking edges from the initial candidates-set of edges depicted in Fig. 8. We can see that the edges to be blocked, are edges on the path to adversary 209 (colored with magenta) only; there is no need to block edges in the paths to adversary 182 (colored with green) since it is connected to the sub-community with one edge from friend 2 to adversary 182 and the probability of sharing information along that edge is very small, 0.0732.

Fig. 10.

Twitter: (a) $V = 75$ , $E = 151$ , $D = 0.0272$ (b) $V = 58$ , $E = 120$ , $D = 0.0363$ .

Figures 10 and 11 describe communities derived from the Twitter database. The ego-user is colored with steel blue, first degree friends are colored with turquoise, the adversaries are colored with coral, and community members are colored with gray. We can see that the Twitter database is built from clusters of star-communities, generally connected by a small number of edges.

Fig. 11.

Twitter: (a) $V = 13$ , $E = 26$ , $D = 0.16666$ (b) $V = 11$ , $E = 9$ , $D = 0.08181$ .

Fig. 12.

Twitter: (a) Initial Contract edges (b) Final Contract edges.

Figure 12 is the same sub-community as Fig. 10(a), with one adversary colored coral (66). The ego-user is marked with red circle, and the red lines depict the initial and final candidates-set of edges to be blocked, found by the contract algorithm.

Figure 13 describes two sub-communities derived from the Google Plus database. We can see that the Google Plus database resembles the Facebook database and is built from clusters of sub-communities, generally connected by a small number of edges.

Fig. 13.

GPlus: (a) $V = 113$ , $E = 886$ , $D = 0.07$ (b) $V = 94$ , $E = 149$ , $D = 0.1704$ .

Fig. 14.

GPlus: (a) Initial Contract edges (b) Final Contract edges.

Figure 14 is the same sub-community presented in Fig. 13(a), with one adversary, 106, colored with coral. The ego-user is marked with red circle, and the red lines depict the initial and final candidates-set of edges to be blocked, found by the contract algorithm.

Tables 5–6 summarize the results of several different evaluation runs, for different communities.

Table 5 presents several runs with the different sub-communities derived from the SNAP database. The community size is derived from the sharing user (ego node), and the Friends column refers to the number of first degree friends of this user.

Table 5

The size of the tested communities, $δ = 4$

Run	Database	Density	δ	Friends	Adversaries	Vertices	Edges
1	Facebook	0.0087	4	15	2	334	968
2	Facebook	0.00226	4	26	3	1036	2428
3	Facebook	0.00308	4	40	10	1495	6886
4	Facebook	0.0889	4	29	2	206	3755
5	Facebook	0.06353	4	36	1	987	61831
6	Facebook	0.00837	4	68	2	789	5205
7	Facebook	0.00328	4	24	2	789	2038
8	Facebook	0.00938	4	24	1	1813	30821
9	Facebook	0.00436	4	7	1	345	518
10	Twitter	0.0272	4	1	1	75	151
11	Twitter	0.0363	4	3	1	58	120
12	Twitter	0.1	4	1	1	15	21
13	Twitter	0.125	4	3	1	9	9
14	Twitter	0.1666	4	3	1	13	26
15	Twitter	0.08181	4	2	1	11	9
16	Gplus	0.07	4	7	1	113	886
17	Gplus	0.1704	4	7	1	94	149
18	Gplus	0.03945	4	1	1	48	89
19	Gplus	0.0625	4	2	1	16	16

Table 6 presents the results obtained by the nineteen runs, with the close friends distance β set to 1.

Columns 2–3 present the threshold parameters used for each run. For each community graph we performed the algorithms with medium thresholds ( $α = 0.5, γ = 0.5$ ), and with random thresholds. Columns 5–6 and 7–8 present the initial and final set of edges to be blocked as found by min-cut and contract algorithm respectively. The remark indicates which kind of edges are the candidates for blocking. We can see that when the adversaries are close to the community’s boundary ( $δ = 4$ ), and no maximum path to a community member passes through an adversary vertex (e.g., runs 1, 2, 5, 6, 15, 17) the solution is trivial and the blocked edges are the edges from community members to the adversaries.

When the adversary is in the middle of the community’s boundaries, and there is a path with maximum information flow to a community member that passes through an adversary vertex (e.g., runs 8, 9, 10 and 14), it is highly likely that the maximum information flow will be reduced considerably, and therefore no good solution can be obtained for high levels of α. When the ego-user rarely shares data with community members, there is no need to block edges, or the solution is the trivial solution (e.g., runs 7, 9, 13, and 19).

While both algorithms are complete, in the non-trivial cases min-cut finds the best solution in terms of the flow ratio between the initial and final flow to community members (e.g., runs 16 and 18). Contract, on the other hand, may find a good solution that blocks adversaries to the extent required by the blocking criteria and even allows more sharing with friends (e.g., runs 5 and 8).

Table 6

Evaluation Runs Results

Run	α	γ	Density	MinCut Initial Edges	MinCut Final Edges	Contract Initial Edges	Contract Final Edges	Remark
1	0.5	0.5	0.0087	2	0	7	7	Blocked edges to adversary
1	0.783	0.5654	0.0087	2	2	7	7	Blocked edges to adversary
2	0.5	0.5	0.00226	2	2	2	2	Blocked edges to adversary
2	0.9587	0.5506	0.00226	2	2	2	2	Blocked edges to adversary
3	0.5	0.5	0.00308	29	29	5	0	MinCut blocked edges to adversary,
3	0.8385	0.1065	0.00308	29	29	5	0	MinCut blocked edges to adversary,
4	0.5	0.5	0.0889	2	0	10	6	Blocked mixed edges
4	0.7776	0.4436	0.0889	2	0	63	43	Blocked mixed edges
5	0.5	0.5	0.06353	39	0	2	0	No edges to be blocked
5	0.4292	0.0226	0.06353	39	35	2	2	Blocked edges to adversary
6	0.5	0.5	0.00837	2	1	4	1	Blocked edges to adversary
6	0.9251	0.4224	0.00837	2	1	4	1	Blocked edges to adversary
7	0.5	0.5	0.00328	2	0	18	0	No edges to be blocked, ego-user rarely shares data
7	0.2047	0.746	0.00328	2	0	18	0	No edges to be blocked, ego-user rarely shares data
8	0.5	0.5	0.00938	16	0	2	0	Blocked mixed edges, adversary is in the middle of sub-community, users frequently share data with the adversary
8	0.0764	0.679	0.00938	16	15	2	2	Blocked mixed edges, adversary is in the middle of sub-community, users frequently share data with the adversary
9	0.5	0.5	0.00436	1	0	1	0	No edges to be blocked, ego-user rarely shares data
9	0.7397	0.9392	0.00436	1	0	1	0	No edges to be blocked, ego-user rarely shares data
10	0.5	0.5	0.0272	1	1	15	9	MinCut blocked edges to adversary, Contract blocked edges to community members
10	0.797	0.1729	0.0272	1	1	4	3	MinCut blocked edges to adversary, Contract blocked edges to community members
11	0.5	0.5	0.0363	2	2	4	3	MinCut blocked edges to adversary, Contract blocked mixed edges
11	0.9601	0.3916	0.0363	2	2	7	0	MinCut blocked edges to adversary,
12	0.5	0.5	0.1	1	1	4	1	MinCut blocked edges to adversary, Contract blocked edges to community members users frequently share data with the adversary
12	0.8672	0.7831	0.1	1	0	4	0	No edges to be blocked, users frequently share data with the adversary high γ value, no need to block edges
13	0.848	0.2021	0.125	1	0	1	0	No edges to be blocked, users rarely share data with adversaries
13	0.1135	0.1099	0.125	1	1	1	1	Blocked edges to adversary, users rarely share data with adversaries
14	0.5	0.5	0.1666	3	0	4	0	No edges to be blocked, adversary is in the middle of sub-community, users frequently share data with the adversary
14	0.1814	0.7266	0.1666	3	0	2	2	Blocked edges to adversary
15	0.5	0.5	0.08181	1	1	1	1	Blocked edges to adversary
15	0.6182	0.2631	0.08181	1	1	1	1	Blocked edges to adversary
16	0.5	0.5	0.07	2	2	21	12	MinCut blocked edges to adversary, contract blocked edges to community members
16	0.7411	0.0402	0.07	2	2	104	26	MinCut blocked edges to adversary, contract blocked edges to community members
17	0.5	0.5	0.1704	1	1	1	1	Blocked edges to adversary
17	0.8077	0.4405	0.1704	1	1	1	1	Blocked edges to adversary
18	0.5	0.5	0.03945	3	3	3	0	MinCut blocked edges to adversary,
18	0.8614	0.151	0.03945	3	3	6	3	MinCut blocked edges to adversary, contract blocked edges to community members
19	0.5	0.5	0.0625	1	0	1	0	No edges to be blocked users rarely share data with the adversary
19	0.9827	0.0816	0.0625	1	1	1	1	Blocked edges to adversary

Executing the contract algorithm multiple times ensures that each time a different set of initial candidate-edges is selected, which may result in different final cuts. This way we avoid cases such as run 3, where contract results in no solution.

We evaluated our algorithms on various databases with random ego-nodes, community distance, and information sharing probabilities. When the density of the δ-community graph for the selected ego-node is high, we got better results using the contract method in terms of the number of blocked edges. When the min-cut method finds a solution, it blocks a larger number of edges (users) than the contract method (e.g., runs 3 and 5 using Facebook database, and run 8 using Twitter database). The results are discussed in terms of efficiency, i.e., CPU time, and quality, i.e., blocked flow to community members and leakage of flow to adversaries, and the ability to find a solution. On average, 54% of the initial candidate sets found by min-cut did not lead to a solution that complies with the required privacy criteria, while only 34% of the initial candidate sets found by contract did not lead to such a solution.

Figure 15 presents the CPU time results obtained by our algorithms for the first phase that builds the initial candidate set on sub-communities with different densities as described in Section 4.2. The sub-communities were derived from Facebook database by randomly selecting a sharing user with $δ = 4$ .

Fig. 15.

Facebook sub-community: (a) Initial Min-Cut CPU (b) Initial Contract CPU.

Although we did not optimize our algorithms with respect to CPU time and memory usage, both algorithms run very fast and can be used in real world networks. For example, using a graph with density 0.0635 (987 vertices and 61831 edges), the initial contract run time is 34.86 seconds, and the min-cut run time is 35.85 seconds. The run time of the final step, the compute criteria, varies according to the size of the initial cut: 0.045 seconds for a cut with 35 edges, and 19.79 seconds for a cut with 286 edges. The CPU time for both algorithms on sub-community graphs with low density is very low and almost the same. When running our algorithms on sub-communities with higher density, the min-cut algorithm CPU time grows faster than the contract algorithm CPU time. The contract CPU time depends not only on the density of the graph but also on the probability of flow on the edges of the graph, thus each run may provide a different CPU time, bounded by $O (| E | \log | E |)$. The contract algorithm can be optimized and parallelized to obtain better results as described in [17]. In some cases, as the graph grows denser, contract deteriorates faster than min-cut. However, in most cases the cuts found by contract lead to a solution while the cut found by min-cut does not.

Fig. 16.

%CPU of the two main parts of the algorithm.

Table 7

CPU % Per Phases

Vertices	Edges	Density	Cut Method	Initial Cut CPU %	Final Cut CPU %	Initial Set	Final Set
787	525	0.0008487	Min-Cut	64.01%	35.99%	2	0
787	525	0.0008487	Contract	70.34%	29.66%	2	0
968	61831	0.0660548	Min-Cut	57.71%	42.29%	39	0
968	61831	0.0660548	Contract	70.34%	29.66%	174	164
205	3749	0.0896461	Min-Cut	66.75%	33.25%	19	16
205	3749	0.0896461	Contract	99.57%	0.43%	4	4

Figure 16 and Table 7 demonstrate the relative CPU time of each of the two main parts of the algorithm: finding the initial candidate set and computing the criteria. The runs presented were configured with $α = 0.25$ and $γ = 0.25$; however, similar results were obtained from all other configurations. The major part of the CPU time is required for finding the candidate set of edges. Once a candidate set is found, the time required to compute the final cut that meets the criteria depends mainly on the size of the cut, if a solution exists. When no solution exists, even for a small candidate set the relative CPU time is large due to the exhaustive search for a solution.

5.2.2. Complexity

The algorithm we propose is composed of three major steps: the first is the initialization step that creates a multi-graph with a super-vertex $s_{1}$ containing $u_{i}$ ’s β-community, the second step finds the candidates-sets for blocked edges, and the last step evaluates the candidates sets of edges and constructs the final set of edges to be blocked.

In the initialization step, the BFS (Breadth-First Search) traversal algorithm is used starting at the ego-node. The time complexity is $O (| V | + | E |)$ , since in the worst case, every node and every edge will be explored. $O (| E |)$ may vary between $O (1)$ and $O (| V |^{2})$ , depending on how sparse the graph is.
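As an illustration of the initialization step, the sketch below uses a depth-limited BFS to collect every node within a given number of hops from the ego-node. The function name `hop_community` and the adjacency-dictionary layout are assumptions for this example; the same routine could be called with β for the close-friends set or with δ for the whole community.

```python
from collections import deque

def hop_community(graph, ego, max_hops):
    """Return {node: hop distance} for every node within max_hops of ego.

    graph maps a node to an iterable of its out-neighbors.
    """
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond the community boundary
        for v in graph.get(u, ()):
            if v not in dist:  # first visit gives the shortest hop distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Hypothetical chain 0 -> 1 -> 2 -> 3 with an extra edge 0 -> 2.
chain = {0: [1, 2], 1: [2], 2: [3], 3: []}
print(hop_community(chain, 0, 1))  # {0: 0, 1: 1, 2: 1}
```

Each node and edge inside the boundary is visited at most once, which matches the $O (| V | + | E |)$ bound stated above.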

For the second step we use two methods derived from flow problems, to find the initial candidates-set of edges to be blocked. The candidate set is actually a cut between super-vertex $s_{1}$ that contains $u_{i}$ and his close friends, and each of $u_{i}$ ’s adversaries. Min-Cut which is based on Ford–Fulkerson [4] Max-flow-min-cut algorithm, finds the cut with the minimal flow value, and Contract which is based on Karger et al. [17] contract algorithm, finds any cut.

We implemented the Edmonds-Karp [7] algorithm for finding the initial candidates set by Min-Cut, implying time complexity of $O (| E |^{2} | V |)$ .
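A compact sketch of Edmonds-Karp that also extracts the cut edges is given below. It is not the authors' implementation: it assumes edge capacities equal the sharing probabilities, a single source (standing in for the super-vertex $s_{1}$) and a single sink (one adversary), and the function name `edmonds_karp_min_cut` is illustrative.

```python
from collections import deque

def edmonds_karp_min_cut(capacity, source, sink):
    """Return (max flow value, list of min-cut edges) for a directed graph.

    capacity maps directed edges (u, v) to non-negative capacities.
    """
    residual = {source: {}, sink: {}}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0.0) + c
        residual[v].setdefault(u, 0.0)  # reverse edge for flow cancellation

    def bfs_parents():
        # BFS over edges with remaining capacity; yields shortest augmenting paths.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0.0
    parent = bfs_parents()
    while sink in parent:
        # Find the bottleneck capacity along the path found by BFS.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
        parent = bfs_parents()

    # Nodes still reachable from the source form the source side of the cut.
    reachable = set(parent)
    cut = [(u, v) for (u, v) in capacity if u in reachable and v not in reachable]
    return flow, cut

caps = {("s", "a"): 0.9, ("s", "b"): 0.5, ("a", "t"): 0.3,
        ("b", "t"): 0.8, ("a", "b"): 0.4}
flow, cut = edmonds_karp_min_cut(caps, "s", "t")
print(round(flow, 3), sorted(cut))  # 1.1 [('a', 't'), ('b', 't')]
```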

The contract algorithm is a randomized algorithm that repeatedly contracts vertices into super-vertices, until it obtains two super-vertices connected by a set of edges that defines a cut between the two sets of vertices contained in each super-vertex. The algorithm randomly selects an edge $(u, v)$ and merges the nodes u and v into one super-vertex, reducing the total number of nodes of the graph by one. All other edges connecting either u or v are re-attached to the merged node, producing a multi-graph. Each iteration of the contract algorithm finds a different cut between the super-vertex containing $u_{i}$ ’s β-community and the super-vertex containing $u_{i}$ ’s adversaries.

When using permutations to define the order for selecting edges for contraction, one iteration takes $O (| E | log | E |)$ [6].
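One iteration of the contraction procedure can be sketched with a random edge permutation and a union-find structure, as below. This is a simplified illustration rather than the paper's implementation: edges are picked uniformly at random (not weighted by sharing probability), the graph is treated as undirected during contraction, and the names `contract_cut`, `source_side` and `sink_side` are assumptions.

```python
import random

def contract_cut(edges, source_side, sink_side, seed=None):
    """Run one contraction pass and return the edges crossing the resulting cut.

    source_side is pre-merged into one super-vertex (e.g. the β-community) and
    sink_side into another (e.g. the adversaries); the two are never merged.
    """
    rng = random.Random(seed)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    source_side, sink_side = list(source_side), list(sink_side)
    s, t = source_side[0], sink_side[0]
    for x in source_side:
        union(x, s)
    for x in sink_side:
        union(x, t)

    order = list(edges)
    rng.shuffle(order)  # the random permutation fixes the contraction order
    for u, v in order:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge already inside a super-vertex
        if {ru, rv} == {find(s), find(t)}:
            continue  # contracting it would merge the two terminal super-vertices
        union(u, v)

    # Edges whose endpoints ended up in different super-vertices form the cut.
    return [(u, v) for u, v in edges if find(u) != find(v)]

path = [("s", "a"), ("a", "b"), ("b", "t")]
print(contract_cut(path, ["s"], ["t"], seed=0))
```

Repeating the call with different seeds yields different cuts, which is the property the evaluation relies on when a first candidate set does not lead to a solution.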

If the algorithm is repeated $O (| V |^{2} \log^{3} | V |)$ times, it finds the minimum cut in some iteration with high probability [17]. The algorithm is strongly polynomial, and can be parallelized to run in $O (| V |^{2})$ using $| V |^{2}$ processors. However, as we learn from the evaluation, most of the time we do not need the minimum cut solution, either because it does not comply with the privacy criteria or because contract provides a reasonable solution after a small number of iterations.

The final step of our algorithm attempts to remove edges from the candidates-set of edges to be blocked, as long as the remaining δ-community graph for user $u_{i}$ complies with the required privacy criteria. We use BFS to find the maximum flow from $u_{i}$ to each node in $u_{i}$ ’s δ-community, and check whether it complies with the required privacy criteria, using the original flow and the α and γ thresholds. If it does not comply with the required privacy criteria, we try to remove edges from the initial blocked candidates-set and insert them back into $u_{i}$ ’s δ-community graph, until the remaining community graph complies with the required criteria, or until we have tested all the edges in the initial candidate-set and could not find a set of edges to be blocked. In each iteration we insert one edge back into the graph, thus in the worst case we have $| E |$ iterations. The BFS complexity is $O (| V | + | E |)$, and in the worst case we have $| E |$ iterations, thus the total complexity is $O (| V | \cdot | E | + | E |^{2})$.

As we show in the Appendix, the problem presented is NP-complete. The overall complexity of our proposed model is $O (| V |^{2} + | V | \cdot | E | + | E |^{2})$ when using parallelized contract for finding the initial candidates-set of edges to be blocked, or $O (| V | \cdot | E |^{2} + | E |^{2})$ when using min-cut for finding the initial candidates-set of edges to be blocked.
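The unblocking loop of the final step might look like the sketch below. It rests on several assumptions that are not taken from this section: the criteria are modeled as "every community member keeps at least α times its original maximum flow" and "no adversary receives more than γ", candidates are tried in decreasing order of edge probability, and the maximum flow to a node is the largest product of probabilities along a path (computed Dijkstra-style rather than with plain BFS). All function names are hypothetical.

```python
import heapq

def max_pif(graph, source):
    """Highest product of edge probabilities from source to each reachable node."""
    best = {source: 1.0}
    heap = [(-1.0, source)]
    while heap:
        neg_p, u = heapq.heappop(heap)
        if -neg_p < best.get(u, 0.0):
            continue  # stale entry
        for v, w in graph.get(u, {}).items():
            q = -neg_p * w
            if q > best.get(v, 0.0):
                best[v] = q
                heapq.heappush(heap, (-q, v))
    return best

def without(graph, blocked):
    """Copy of graph with the blocked directed edges removed."""
    return {u: {v: w for v, w in nbrs.items() if (u, v) not in blocked}
            for u, nbrs in graph.items()}

def final_blocked_set(graph, ego, members, adversaries, candidates, alpha, gamma):
    """Return the final set of blocked edges, or None if the criteria cannot be met."""
    original = max_pif(graph, ego)

    def check(blocked):
        pif = max_pif(without(graph, blocked), ego)
        members_ok = all(pif.get(m, 0.0) >= alpha * original.get(m, 0.0)
                         for m in members)
        adversaries_ok = all(pif.get(a, 0.0) <= gamma for a in adversaries)
        return members_ok, adversaries_ok

    blocked = set(candidates)
    # Assumed ordering: try to unblock the most probable edges first.
    by_probability = sorted(candidates, key=lambda e: graph[e[0]][e[1]], reverse=True)
    for edge in by_probability:
        if all(check(blocked)):
            return blocked
        trial = blocked - {edge}
        if check(trial)[1]:  # keep the edge unblocked only if leakage stays within gamma
            blocked = trial
    return blocked if all(check(blocked)) else None

g = {0: {1: 0.9}, 1: {2: 0.9, 9: 0.8}}
print(final_blocked_set(g, 0, {1, 2}, {9}, [(1, 9), (1, 2)], alpha=0.5, gamma=0.1))
# {(1, 9)}
```

Each pass recomputes the flows from scratch, so the loop performs at most one flow computation pair per candidate edge, in line with the $| E |$ iterations counted above.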

Table 8 summarizes the overall complexity of our algorithms.

Table 8

Algorithm’s complexity

Method	Initialization	Find Candidates	Evaluate and Find Final	Overall Complexity
Min-Cut	$O (| V | + | E |)$	$O (| E |^{2} | V |)$	$O (| V | \cdot | E | + | E |^{2})$	$O (| V | \cdot | E |^{2} + | E |^{2})$
Contract – one iteration	$O (| V | + | E |)$	$O (| E | \log | E |)$	$O (| V | \cdot | E | + | E |^{2})$	$O (| E | \log | E | + | V | \cdot | E | + | E |^{2})$
Contract – parallelized using $| V |^{2}$ processors	$O (| V | + | E |)$	$O (| V |^{2})$	$O (| V | \cdot | E | + | E |^{2})$	$O (| V |^{2} + | V | \cdot | E | + | E |^{2})$

6. Conclusions

The problem of uncontrolled information flow in social networks is a true concern for one’s privacy. In this paper we address the need to follow the social trend of information sharing while enabling the owner to prevent their information from flowing to undesired recipients. The goal of the suggested method is to find the minimal set of edges that should be excluded from one’s community graph to allow sharing of information while blocking adversaries. To reduce the side effect of limiting legitimate information flow, we minimize this impact according to the flow probability.

One of the main purposes of our evaluation was to compare the solutions obtained by the different algorithms. Based on our experiments we can conclude that, except for cases where adversaries are connected with very few edges, the solutions acquired by Min-Cut are inferior to those obtained with the Contract algorithm. Furthermore, Contract is usually more efficient even if run for several iterations.

Our algorithms can be used within the ORIGIN CONTROL access control model [13]. In this model every piece of information is associated with its creator forever. The set of cut edges found by our algorithms is stored for each user when the data is released, and can be checked when the origin controlled information is accessed. This way, whenever this information is accessed by a certain user, the administrator can check whether the edge between them was cut for the originator user, and thus prevent the information from passing through that edge to that user.

Optimizations. In the last part of our algorithm we check whether, after removing all the edges of the initial candidates-set from $u_{i}$ ’s δ-community graph, the remaining δ-community graph of user $u_{i}$ complies with the required privacy criteria. If not, we try to insert the removed edges back into $u_{i}$ ’s δ-community graph, until the remaining community graph complies with the required criteria, or until we have tested all the edges in the initial candidate-set and could not find a set of edges to be blocked. In each iteration we insert one edge back into the graph, and compute the maximum flow from $u_{i}$ to all members in $u_{i}$ ’s δ-community. In the worst case we have $| E |$ iterations, each of which computes the maximum flow from $u_{i}$ to all members in $u_{i}$ ’s δ-community. Instead of computing the shortest path each time, we can examine methods for building accumulative paths to calculate the best path from $u_{i}$ to all members in $u_{i}$ ’s δ-community, to improve the average computation time.

Other extensions. In this work we used two approaches to identify the set of edges to be blocked, the max-flow-min-cut method, and the contract method that finds any cut between two sets in a graph.

In future work these algorithms can be extended in several ways. One approach is to use k-shortest-paths as the source for edges to be blocked. In this method one can start by finding the k shortest paths from the ego-node (source) to the adversary (sink), and set the edges on the first shortest path as candidates for blocking. If the required privacy criteria are not achievable, the second shortest path is set as the initial candidates set, and so on. Another approach is to set the combination of all k shortest paths from the ego-node to the adversaries as the initial candidates set of edges for blocking. Node centrality can also be used to select edges from central nodes to adversaries as the initial candidates set.

Another challenge is to automatically identify $u_{i}$ ’s adversaries, by fields of interest, joined groups, posts or pre-defined characteristics.

Finally, this model of controlling flow needs to be integrated not only within the ORIGIN CONTROL access control model [13], but with other models of privacy in social networks such as the models described in [20].

Appendix

The Sharing-habits based Privacy Assurance in OSN problem – proof

Here we present the proof for the claim that the Sharing-habits based Privacy Assurance in OSN problem is NP-complete. We start by showing that the Maximum-Flow Minimum-Leakage Problem is Hitting-Set-hard.

A p-approximation algorithm for a minimization problem runs in polynomial time and produces a feasible solution of value at most p times the value of an optimal solution; if such an algorithm exists then we will say that the problem admits approximation ratio p. If such an algorithm is unlikely to exist (e.g., if existence of such an algorithm implies $P = NP$) then we will say that the problem has approximation threshold p.

Given a subset A of nodes of a graph J, let $Γ_{J} (A)$ denote the set of neighbors of A in J. The following known problem is NP-hard, and moreover, is known to have a logarithmic approximation threshold.

Garey and Johnson showed in [5] that the Hitting-Set problem is NP-hard by showing that the hitting-set problem is equivalent to the vertex-cover problem.

The Vertex-cover problem. Consider a bipartite graph, with vertices on the left representing sets, vertices on the right representing the universe of elements, and edges representing the inclusion of elements in sets. The objective is to find a minimum cardinality subset of left-vertices which covers all right-vertices.

In the hitting-set problem, the objective is to cover the left-vertices using a minimum subset of the right vertices; by interchanging the two sets of vertices we convert the bipartite graph problem into the hitting-set problem.

Let $n = | V |$. We prove that the Minimum-Distance Maximum-Flow Minimum-Leakage Problem is Hitting-Set-hard to approximate, and in particular obtain the result shown by Feige [3].

Given an instance of Hitting-Set with $| A | = | B |$ , we construct an instance of Maximum-Flow Minimum-Leakage Problem as follows (see Fig. 17):

Let $E^{'} = E_{J} \cup E_{t}$ be the set of edges of capacity 1. Note that for any subgraph H of the obtained Maximum-Flow Minimum-Leakage Problem instance we have:

Feige [3] showed that Hitting-Set cannot be approximated within $(1 - o (1)) ln | B |$ unless NP has quasi-polynomial time algorithms, while Raz and Safra [16] established a lower bound of $C \cdot ln | B |$ , where C is a constant, under the weaker assumption that $P \neq NP$ . This is so even when $| A | = | B |$ .

References

1. A.L. Barabasi, Introduction and Keynote to a Networked Self, Routledge, Taylor and Francis Group, New York, NY, 2011, chapter Introduction, page 14.

2. D.J. Cohen and Tassa, Anonymization of centralized and distributed social networks by sequential clustering, Proceedings of the IEEE Transactions on Knowledge & Data Engineering 25(2) (2013), 311–324.

3. Feige, A threshold of $\ln n$ for approximating set cover, Journal of the ACM 45(4) (1998), 634–652. doi:10.1145/285055.285059.

4. D.R. Fulkerson and L.R. Ford, Maximal flow through a network, Canadian Journal of Mathematics 8 (1956), 399–404. doi:10.4153/CJM-1956-045-5.

5. D.S. Johnson and M.R. Garey, Computers and Intractability, W.H. Freeman & Co., New York, NY, 1990, chapter 3, page 64.

6. D.R. Karger, Global min-cuts in RNC, and other ramifications of a simple min-cut algorithm, in: Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, ACM, New York, 1993, pp. 21–30.

7. R.M. Karp and Edmonds, Theoretical improvements in algorithmic efficiency for network flow problems, Journal of the ACM 19(2) (1972), 248–264. doi:10.1145/321694.321699.

8. Kim, Sohn and S.M. Choi, Cultural difference in motivations for using social network sites: A comparative study of American and Korean college students, Computers in Human Behavior 27 (2011), 365–372. doi:10.1016/j.chb.2010.08.015.

9. Krevl and Leskovec, SNAP Datasets: Stanford large network dataset collection, 2014, http://snap.stanford.edu/data.

10. Ligett and Kleinberg, Information-sharing in social networks, Games and Economic Behavior 82 (2013), 702–716. doi:10.1016/j.geb.2013.10.002.

11. Maheswaran and Ranjbar, Using community structure to control information sharing in online social networks, Computer Communications 41 (2014), 11–21. doi:10.1016/j.comcom.2014.01.002.

12. Papacharissi, A Networked Self, Routledge, Taylor and Francis Group, New York, NY, 2011, p. 337, chapter 14.

13. Park and Sandhu, Originator control in usage control, in: Proceedings of the IEEE 3rd International Workshop on Policies for Distributed Systems and Networks, Washington DC, 2002, pp. 60–66. doi:10.1109/POLICY.2002.1011294.

14. Peng, LaRose and Junghyun, Social Networking Addictive, Compulsive, Problematic, or Just Another Media Habit? Routledge, Taylor and Francis Group, New York, NY, 2011, chapter Introduction, page 24.

15. R.L. Rivest, T.H. Cormen and C.E. Leiserson, Introduction to Algorithms, MIT Press, Cambridge, Massachusetts, 2009, p. 581, chapter 27, 1990.

16. Safra and Raz, A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP, in: Proc. 29th ACM Symp. on Theory of Computing, El Paso, 1997, pp. 475–484.

17. Stein and D.R. Karger, A new approach to the minimum cut problem, Journal of the ACM 43(4) (1996), 601–604. doi:10.1145/234533.234534.

18. Torre, Carmagnola and Osborne, Escaping the big brother: An empirical study on factors influencing identification and information leakage on the web, Journal of Information Science 40(2) (2014), 180–197. doi:10.1177/0165551513509564.

19. V.S. Verykios, Vatsalan and Christen, A taxonomy of privacy-preserving record linkage techniques, Information Systems 38 (2013), 946–969. doi:10.1016/j.is.2012.11.005.

20. Yolum and Kokciyan, Priguard: A semantic approach to detect privacy violations in online social networks, IEEE Trans. Knowl. Data Eng. 28(10) (2016), 2724–2737. doi:10.1109/TKDE.2016.2583425.