Application of clustering cooperative differential privacy in spatial crowdsourcing task allocation

Abstract

A framework for spatial crowdsourcing task allocation based on centralized differential privacy is proposed to address the leakage of workers’ location privacy. First, by combining two stages of differential privacy noise addition with cluster matching, a spatial crowdsourcing worker dataset with strong differential privacy protection is obtained. Second, the dynamic problem of spatial crowdsourcing task allocation is transformed into a static combinatorial optimization problem by dividing space and time into spatiotemporal units and applying the “delayed matching” strategy. Finally, an improved discrete glowworm swarm optimization algorithm is used to compute the task allocation. Experiments show that, compared with the direct differential privacy noise-adding assignment method and the discrete glowworm swarm optimization assignment method, the proposed method achieves better task assignment results: the total travel distance is reduced by 12.42% and 3.56%, respectively, and the task assignment success rate is increased by 11.75% and 3.34%, respectively.

Keywords

Differential privacy, k-means clustering, spatial crowdsourcing task allocation, glowworm swarm optimization algorithm

1 Introduction

With the rapid development of the Internet and intelligent terminal devices, many Location Based Services (LBS) and applications have emerged [1]. These services usually require users to continuously provide their dynamic location and related information, which raises the risk of privacy exposure and poses serious security threats. For example, Strava publishes a heat map of user activities, from which web users can mine base locations, personal addresses, etc. [2]. HMM models can identify most of the anonymous users in the Gowalla dataset [3]. The literature [4] documents that 15 out of 30 LBS applications [5] leak the user’s location. How to keep LBS applications operating efficiently while improving privacy protection is an important research direction for the IT and e-commerce industries. As one of the main categories of LBS applications, spatial crowdsourcing is a newly developed type of crowdsourcing in which users publish customized tasks according to their needs. The platform matches the user’s needs with the workers’ skills and distance, and a worker is dispatched to the specified location to complete the task, as in Meituan takeout and Didi taxi. At present, task allocation in spatial crowdsourcing is mainly performed on the platform server: users need to upload their location and related information when posting tasks, and workers need to upload their real-time location continuously in order to receive orders, while the security of spatial crowdsourcing platform servers is generally not well guaranteed.

The most common technique for location privacy preservation is K-anonymity, which makes K objects indistinguishable through generalization. Although K-anonymity provides a privacy-preserving method, it lacks a rigorous mathematical proof. Its protection depends on the adversary’s background knowledge, and increasing K enlarges the cloaking area, which significantly prolongs processing on the server [6, 7]. According to the literature [8], a selection scheme based on maximum-minimum false locations makes it difficult for an attacker to filter out the false locations, thereby providing privacy protection. Literature [9] protects privacy by blurring the time period and the region in which a worker operates, enlarging the number of regions and the range of periods that the worker submits after completing a task; this widens the spatiotemporal range, and the resulting location is too fuzzy to be used for assigning tasks. The above methods assume an adversary without background information about the user, and they fail or become less effective when the adversary has such information. Differential Privacy (DP), proposed by Dwork [10], rests on a rigorous mathematical foundation: random noise is added to the original query results, so an attacker cannot recognize the impact of adding or removing a single record even with some background knowledge. Usually, the original data are first pre-processed (for example, converted or compressed) before noise is added and the result is released, which improves data availability. Common pre-processing methods include clustering, hierarchical trees, and the Fourier transform, among which clustering performs well in both accuracy and time complexity [11].
In addition to being one of the most widely used clustering algorithms, the k-means algorithm clusters quickly and is straightforward to implement. Literature [12] proposed a differential privacy k-means method, but the availability of its clustering results is not robust. Literature [13] proposed another differential privacy k-means method based on initial centers, using k-means clustering to support differential privacy, and demonstrated its effectiveness experimentally. Literature [14] proposed a differential privacy k-means method that eliminates anomalies, then selects initial centroids according to the data distribution and adds noise, but it has not been shown to be suitable for privacy protection of large-scale datasets. Literature [15] divides tasks by region. Building on the methods in [15], literature [16] improved the traditional method by obtaining initial centroids after randomly dividing the dataset into multiple subsets under differential privacy protection; its shortcoming is that the number of subsets is difficult to determine and affects the stability of the results. In addition, literature [17] introduces PPTA, which relies only on lightweight cryptography (such as additive secret sharing and secure shuffle); it is usable but has low computational efficiency. Based on the above analysis, combining k-means with DP can improve the performance of noise addition in applications.

Task allocation of spatial crowdsourcing (TSC) is the problem of finding the optimal combination of workers and users within the corresponding spatiotemporal range [18, 19]. In recent years, swarm intelligence optimization algorithms have been commonly used to compute optimal or near-optimal solutions for combinatorial problems such as scheduling and the TSP. Common algorithms include the particle swarm algorithm [20, 21], the artificial fish swarm algorithm [22], and the Glowworm Swarm Optimization (GSO) algorithm [23, 24]. The particle swarm algorithm has high search efficiency but easily falls into local optima when searching higher-dimensional data. The artificial fish swarm algorithm is robust and not very sensitive to its initial parameters, but it converges slowly. GSO is fast and has good search capability in multi-dimensional space. Domestic and foreign scholars have improved the algorithm through initialization, variable step size, discretization, modified movement strategies, and other methods [24–28]. In this paper, we design an effective encoding for the practical application of spatial crowdsourcing task assignment and use the improved glowworm swarm optimization algorithm (IGSO) to compute the task assignment efficiently.

Only a few studies have examined how to enhance privacy protection in spatial crowdsourcing task assignment. The uneven distribution of spatial crowdsourcing spatiotemporal data affects the task assignment results. In particular, when noise is added directly to the location information of workers and users before allocation, the usability of the task assignment results is low. This paper addresses these deficiencies in three main aspects.

(1) When spatial crowdsourcing tasks are assigned, both workers and tasks appear in real time at arbitrary locations in space, and it is difficult to achieve high allocation efficiency if allocation is also performed in real time. The “delayed matching” [29] strategy improves allocation efficiency and benefit. By matching tasks with online workers within a spatiotemporal unit, the dynamic problem of spatial crowdsourcing task allocation is transformed into a static combinatorial optimization problem that can be allocated more effectively. On this basis, an application framework for spatial crowdsourcing task assignment based on centralized differential privacy, TDPKC (Two-stage differential privacy based on k-means clustering algorithm), is designed. It is proved theoretically that the framework satisfies differential privacy.

(2) To fully exploit the distribution of worker and task data within a spatiotemporal unit, TDPKC combines two-stage differential privacy noise addition with cluster matching. This keeps the task assignment results usable on the noised spatial crowdsourcing worker dataset while retaining differential privacy protection.

(3) The IGSO algorithm is used to compute the assignment results for spatial crowdsourcing tasks. The resulting method, IGSOTSC, is consistently effective in experimental validation on both small and larger datasets. Compared with the direct differential privacy noise-adding method and the allocation method based on the discrete glowworm swarm optimization algorithm, it improves both the task allocation success rate and the travel cost under the same differential privacy budget.

2 Task assignment framework for clustered collaborative differential privacy TDPKC

Differential privacy mainly includes two types: local differential privacy and centralized differential privacy. The centralized model assumes that a trusted server exists. Because a worker’s location changes in real time and the spatial crowdsourcing server constantly queries it to assign tasks, the worker’s dynamic trajectory must be uploaded in real time, which poses a significant risk to the worker’s privacy. Users need to upload their real location when using the app to post tasks and obtain the corresponding services, and they can protect their real-time location by turning off location sharing after receiving the service. Since the user and the worker are at the same location once they meet, protecting the location privacy of workers also protects that of users. Therefore, TDPKC is valuable for protecting the location privacy of both workers and users.

2.1 Differential privacy protection based on two phases

As shown in Fig. 1, the specific working steps in this framework are as follows.

Step 1: After obtaining a set of tasks, the spatial crowdsourcing server sends a request to the trusted server, using the spatiotemporal unit as the basic unit of the request.

Step 2: The trusted server extracts the worker dataset for the corresponding spatiotemporal unit, clusters it with the k-means method, and adds differential privacy noise to generate the clustering results (the set of cluster numbers K₁, the centroid set C₁, and the cluster sizes N₁).

Step 3: The spatial crowdsourcing server simultaneously performs clustering analysis on the task set, matches its clustering results (K₂, C₂) with those of the workers, selects the best-matching parameters (K₁best, Cbest), and uploads them to the trusted server.

Step 4: Based on the clustering results corresponding to these parameters, the trusted server adds differential privacy noise, generates a snapshot, and transmits the result back to the spatial crowdsourcing server.

Step 5: Using the improved glowworm swarm optimization, the spatial crowdsourcing server matches the task dataset and the worker dataset of the same spatiotemporal unit to generate a task assignment scheme.

Fig. 1

Encryption process of task assignment.

The trusted server passes data to the spatial crowdsourcing server twice for each task assignment. The first transfer is the clustering result of the set of workers in the spatiotemporal unit, and the second is the noise-added location data of the workers after their clusters have been matched with the set of tasks. According to the sequence combinatoriality of differential privacy (Definition 3), differential privacy protection must be applied in both step 2 and step 4 to ensure compliance, and the total privacy budget is the sum of the two budgets.

2.2 Matching of clustering results

When spatial crowdsourcing tasks are allocated, k-means clustering with k clusters is performed on the location data. For the worker location data, the set of centroids obtained by clustering is C₁, and the numbers of workers in the corresponding clusters are N₁. For the task location data, the set of centroids obtained by clustering is C₂, and the numbers of tasks in the corresponding clusters are N₂. TC measures how well the two clustering results match, as in Equation (1), which considers both the distance between centroids and the weight of the elements in the corresponding clusters. A smaller TC value indicates a higher degree of matching, and vice versa.

$TC = \sum_{i = 1}^{k} dis (c_{1 i}, c_{2 i}) \frac{| n_{1 i} - n_{2 i} | + 1}{N}$ (1)

s.t. $\left\{ \begin{matrix} N_{1} + N_{2} = \sum_{i = 1}^{k} (n_{1 i} + n_{2 i}) \\ dis (c_{1 i}, c_{2 i}) = \sqrt{{(x_{c_{1 i}} - x_{c_{2 i}})}^{2} + {(y_{c_{1 i}} - y_{c_{2 i}})}^{2}} \\ c_{1 i} = (x_{c_{1 i}}, y_{c_{1 i}}); c_{2 i} = (x_{c_{2 i}}, y_{c_{2 i}}) \\ C_{1} = \{ c_{11}, c_{12}, \ldots, c_{1 k} \}; N_{1} = \{ n_{11}, n_{12}, \ldots, n_{1 k} \} \\ C_{2} = \{ c_{21}, c_{22}, \ldots, c_{2 k} \}; N_{2} = \{ n_{21}, n_{22}, \ldots, n_{2 k} \} \end{matrix} \right.$ (2)
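The matching measure of Equation (1) can be sketched in a few lines of Python. This is only an illustration under two assumptions that the text does not state explicitly: the normalizer N is taken to be the total number of workers and tasks, and the i-th worker cluster is paired with the i-th task cluster. The function name `matching_score` is hypothetical.

```python
import math

def matching_score(c1, n1, c2, n2):
    """TC of Equation (1) for index-paired worker and task clusters.

    c1, c2: lists of (x, y) centroids; n1, n2: cluster sizes.
    Assumption: N is the total number of workers plus tasks.
    """
    if not (len(c1) == len(n1) == len(c2) == len(n2)):
        raise ValueError("all inputs must describe the same number of clusters")
    total = sum(n1) + sum(n2)
    if total <= 0:
        raise ValueError("clusters must contain at least one element")
    tc = 0.0
    for (x1, y1), (x2, y2), a, b in zip(c1, c2, n1, n2):
        dist = math.hypot(x1 - x2, y1 - y2)      # dis(c_1i, c_2i)
        tc += dist * (abs(a - b) + 1) / total    # size-weighted centroid gap
    return tc
```

For example, `matching_score([(0, 0), (3, 4)], [2, 3], [(0, 0), (0, 0)], [2, 1])` evaluates to 5 · 3 / 8 = 1.875, while identical centroid sets give 0 regardless of cluster sizes.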

2.3 Theoretical proof

(1) The basic properties of differential privacy

Differential privacy is specified in Definition 1 [30]; its main properties and mechanisms include global sensitivity [31], sequence combinatoriality [31], parallel combinatoriality [32], the Laplace noise mechanism [33], and the exponential noise mechanism [34]. The properties relevant to this paper are described as follows.

Definition 1. ɛ-differential privacy (ɛ-DP). Given two adjacent data sets T and T′ and an algorithm A, the algorithm A satisfies ɛ-differential privacy if any output O of A on T and T′ satisfies Equation (3). $Pr (A (T) = O) ⩽ Pr (A (T^{'}) = O) e^{ɛ}$ (3) where the parameter ɛ denotes the differential privacy budget: the larger the ɛ, the less noise and the weaker the privacy protection, and vice versa.

Definition 2. Global sensitivity. For a function f : T → R^d, the global sensitivity of f is $Δ f = \max_{T, T^{'}} {∥ f (T) - f (T^{'}) ∥}_{1}$, where the maximum is taken over all pairs of adjacent data sets T and T′. The global sensitivity is independent of the particular data set T and depends only on the query function f.

Definition 3. Sequence combinatoriality. If the randomized algorithms {A₁, A₂, . . . , A_n} are applied separately to the same data set T, and each A_i satisfies ɛ_i-DP, then the sequence combination of {A₁, A₂, . . . , A_n} on T satisfies ɛ-DP with $ɛ = \sum_{i = 1}^{n} ɛ_{i}$.

The Laplace noise mechanism is one of the noise mechanisms for differential privacy: it adds random noise drawn from the Laplace distribution to the returned result. The probability density function of the Laplace distribution is given in Equation (4). $f (x | μ, b) = \frac{1}{2 b} e^{- \frac{| x - μ |}{b}}$ (4)

In differential privacy, one usually sets μ = 0 and b = Δf/ɛ, in which case the Laplace density is written as Equation (5). $Lap (Δ f / ɛ) = \frac{ɛ}{2 Δ f} e^{- \frac{ɛ | x |}{Δ f}}$ (5)

Theorem 1. For any function f : T → R, if the output of randomized algorithm A satisfies Equation (6), then algorithm A is said to satisfy ɛ - DP. $A (T) = f (T) + Lap (Δ f / ɛ)$ (6)

Lap (Δf/ɛ) is the added Laplace noise, whose scale is proportional to the global sensitivity and inversely proportional to ɛ.
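As a brief worked illustration of how the budget controls the noise, take a hypothetical sensitivity Δf = 1 (an assumed value chosen only for the arithmetic, not one reported in this paper) and use the standard fact that a Laplace distribution with scale b has standard deviation √2·b:

```latex
\begin{aligned}
b &= \frac{\Delta f}{\varepsilon}, \qquad \sigma = \sqrt{2}\, b \\
\varepsilon = 0.1  &: \quad b = 10,  \quad \sigma \approx 14.1 \\
\varepsilon = 0.05 &: \quad b = 20,  \quad \sigma \approx 28.3 \\
\varepsilon = 0.01 &: \quad b = 100, \quad \sigma \approx 141.4
\end{aligned}
```

Halving the budget doubles the noise scale, which is why splitting a fixed total budget between two stages trades the accuracy of one stage against the other.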

(2) TDPKC complies with differential privacy

Theorem 2. The TDPKC framework satisfies ɛ-differential privacy when noise is added to the spatial crowdsourcing worker dataset LW to obtain LW′.

Let the query result on the spatial crowdsourcing worker dataset be $f (L W_{1}) = (l_{1}^{1}, \ldots, l_{1}^{k})$, and let the result after differential privacy noise addition be $f (L {W_{1}}^{'}) = (l_{1}^{1^{'}}, \ldots, l_{1}^{k^{'}}) = (l_{1}^{1} + Δ l_{1}^{1}, \ldots, l_{1}^{k} + Δ l_{1}^{k})$; then $Δ f = \max_{L W_{1}, L {W_{1}}^{'}} (\sum_{i = 1}^{k} | l_{1}^{i} - l_{1}^{i^{'}} |) = \max_{L W_{1}, L {W_{1}}^{'}} (\sum_{i = 1}^{k} | Δ l_{1}^{i} |)$. Let the output sequence be S = (y₁, . . . , y_k), the query algorithm be A, and the privacy budget be ɛ₁. The privacy budget for obtaining LC′ by differential privacy noise addition on the cluster centroid dataset LC in step 2 of Fig. 1 is ɛ₂, and ɛ₁ + ɛ₂ = ɛ.

Proof. $\begin{matrix} \frac{Pr [A (L W_{1}) \in S]}{Pr [A (L {W_{1}}^{'}) \in S]} = \frac{\prod_{i = 1}^{k} \frac{ɛ_{1}}{2 Δ f} e^{- \frac{ɛ_{1}}{Δ f} | y_{i} |}}{\prod_{i = 1}^{k} \frac{ɛ_{1}}{2 Δ f} e^{- \frac{ɛ_{1}}{Δ f} | Δ l_{1}^{i} - y_{i} |}} \\ = \prod_{i = 1}^{k} e^{- \frac{ɛ_{1}}{Δ f} (| y_{i} | - | Δ l_{1}^{i} - y_{i} |)} = e^{\frac{ɛ_{1}}{Δ f} \sum_{i = 1}^{k} (| Δ l_{1}^{i} - y_{i} | - | y_{i} |)} \end{matrix}$

By the triangle inequality, $| Δ l_{1}^{i} - y_{i} | - | y_{i} | ⩽ | (Δ l_{1}^{i} - y_{i}) + y_{i} | = | Δ l_{1}^{i} |$. Therefore, $\sum_{i = 1}^{k} (| Δ l_{1}^{i} - y_{i} | - | y_{i} |) ⩽ \sum_{i = 1}^{k} | Δ l_{1}^{i} | ⩽ \max_{L W_{1}, L {W_{1}}^{'}} (\sum_{i = 1}^{k} | Δ l_{1}^{i} |) = Δ f$, and so $\frac{Pr [A (L W_{1}) \in S]}{Pr [A (L {W_{1}}^{'}) \in S]} ⩽ e^{\frac{ɛ_{1}}{Δ f} Δ f} = e^{ɛ_{1}}$.

By the same reasoning, $\frac{Pr [A (L C_{1}) \in S]}{Pr [A (L {C_{1}}^{'}) \in S]} ⩽ e^{ɛ_{2}}$.

Because $ɛ_{1} + ɛ_{2} = ɛ$, the sequence combinatoriality of DP gives $\frac{Pr [A (LW) \in S]}{Pr [A ({LW}^{'}) \in S]} ⩽ e^{ɛ_{1} + ɛ_{2}} = e^{ɛ}$.

This completes the proof.

3 Task allocation model IGSOTSC

3.1 Model definition

The study scenario assumes that the model meets the following preconditions.

(1) Workers can receive task allocations only if their online status and location are uploaded to the central server in real time; (2) workers cannot receive new tasks while they are completing a task; (3) a posted task is automatically canceled when it reaches its deadline, and the user must re-post it as a new task.

The model-related concepts are defined as follows.

Definition 4. Spatiotemporal unit: the study area is divided equally into N squares of area R. Each square within a certain time interval T forms a spatiotemporal unit Q, i.e., Q = R × T.

Definition 5. Spatial crowdsourcing task [34]: a spatial crowdsourcing task is a tuple u = ⟨l_u, t_u, s_u, p_u⟩, where l_u is the initial location of task u, t_u is the timestamp at which the user posted the task, s_u indicates whether task u is currently posted, and p_u is the maximum waiting time acceptable to the user.

Definition 6. Spatial crowdsourcing worker [34]: a spatial crowdsourcing worker is a tuple w = ⟨l_w, t_w, s_w⟩, where l_w is the current location of worker w, t_w is the timestamp at which that location was recorded, and s_w is the current status of the worker (online or not, completing a task or not, etc.).

3.2 Evaluation indicators

Space crowdsourcing task assignments are evaluated based on total travel distance, total waiting time, and success rate [34]. The total travel distance and waiting time are equivalent when the effect of the worker’s speed is not considered. In this paper, total travel distance and the assignment success rate are chosen as evaluation metrics without considering the factor of worker’s speed. The total travel distance measures the total efficiency of the assignment. The smaller the total travel distance, the greater the total allocation efficiency, and the larger the total travel cost, the lower the allocation efficiency. The allocation success rate is usually related to the user waiting time and the expected return of the task. If the posted task reaches the deadline and still no corresponding worker is scheduled, it will lead to order failure.

Definition 7. Total travel distance: the sum of the travel distances between all workers and their corresponding tasks in one spatial crowdsourcing task assignment. L_UW denotes the total travel distance, where n is the number of worker-user combinations formed in the task assignment. $L_{UW} = \sum_{i = 1}^{n} dis (u_{i}, w_{i})$ (7)

Definition 8. Assignment success rate: denoted by SUC_UW, the ratio of the number M_suc of orders successfully accepted to the total number M of orders posted by users in one spatial crowdsourcing assignment, when the spatial crowdsourcing server matches workers and tasks. $SU C_{UW} = \frac{M_{suc}}{M}$ (8)
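The two evaluation indicators are simple to compute once an assignment is known. The following sketch assumes that dis is the Euclidean distance between 2-D points, as in Equation (2); the function names are illustrative and do not come from the paper.

```python
import math

def total_travel_distance(pairs):
    """L_UW of Equation (7): sum of distances over matched (task, worker) pairs.

    pairs: iterable of ((x_u, y_u), (x_w, y_w)) location pairs.
    """
    return sum(math.dist(u, w) for u, w in pairs)

def success_rate(m_suc, m):
    """SUC_UW of Equation (8): successfully accepted orders over all orders."""
    if m <= 0:
        raise ValueError("m must be positive")
    if not 0 <= m_suc <= m:
        raise ValueError("m_suc must lie between 0 and m")
    return m_suc / m
```

For instance, two matched pairs at distances 5 and 2 give a total travel distance of 7, and 3 accepted orders out of 4 give a success rate of 0.75.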

3.3 Glowworm swarm optimization and improvement

(1) Glowworm swarm optimization GSO

As the fluorescein of a glowworm corresponds to the objective function value, the brighter the glowworm, the stronger its attraction, and the attraction decreases with distance. GSO consists of four stages: fluorescein update, selection of moving direction, position update, and decision domain update. The specific steps are given in Equations (9)–(13).

$l_{i} (t) = (1 - ρ) l_{i} (t - 1) + γ J (x_{i} (t))$ (9)

$N_{i} (t) = \{ j : ∥ x_{j} (t) - x_{i} (t) ∥ < r_{d}^{i} (t); l_{i} (t) < l_{j} (t) \}$ (10)

$ρ_{ij} (t) = \frac{l_{j} (t) - l_{i} (t)}{\sum_{k \in N_{i} (t)} (l_{k} (t) - l_{i} (t))}$ (11)

$x_{i} (t + 1) = x_{i} (t) + s (\frac{x_{j} (t) - x_{i} (t)}{∥ x_{j} (t) - x_{i} (t) ∥})$ (12)

$r_{d}^{i} (t + 1) = \min \{ r_{s}, \max \{ 0, r_{d}^{i} (t) + β (n_{t} - | N_{i} (t) |) \} \}$ (13)

where l_i (t) is the fluorescein value of glowworm i at the t-th iteration, ρ is the fluorescein volatility coefficient, γ is the fluorescein enhancement coefficient, and J (x_i (t)) is the objective function value at the t-th iteration; N_i (t) is the set of glowworms in the decision domain of glowworm x_i that are brighter than x_i at the t-th iteration; ρ_ij (t) is the probability that glowworm x_i moves toward the brighter glowworm x_j; s is the step size; r_d^i (t) is the decision domain radius of glowworm i; β is the update coefficient of the decision domain radius; n_t is the threshold on the number of glowworms in the neighborhood; and r_s is the perception radius.
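One synchronous iteration of Equations (9)–(13) can be sketched as follows for continuous positions. This is a minimal illustration, not the authors' Matlab implementation: the function name `gso_step` and all default parameter values are assumptions chosen for the example, and J is treated as a quantity to be maximized (for a minimization objective one would pass, for example, its negative).

```python
import math
import random

def gso_step(xs, lucis, radii, J, rho=0.4, gamma=0.6, s=0.1,
             beta=0.08, n_t=5, r_s=3.0, rng=random):
    """Apply Eqs. (9)-(13) once to every glowworm; returns new state lists."""
    n = len(xs)
    # Eq. (9): fluorescein decays and is reinforced by the objective value.
    lucis = [(1 - rho) * l + gamma * J(x) for l, x in zip(lucis, xs)]
    new_xs, new_radii = [], []
    for i in range(n):
        # Eq. (10): brighter glowworms inside the decision domain of i.
        nbrs = [j for j in range(n)
                if j != i
                and math.dist(xs[j], xs[i]) < radii[i]
                and lucis[i] < lucis[j]]
        x_new = tuple(xs[i])
        if nbrs:
            # Eq. (11): selection probability proportional to the brightness gap.
            weights = [lucis[j] - lucis[i] for j in nbrs]
            j = rng.choices(nbrs, weights=weights, k=1)[0]
            d = math.dist(xs[j], xs[i])
            if d > 0:
                # Eq. (12): move a step of length s toward glowworm j.
                x_new = tuple(a + s * (b - a) / d for a, b in zip(xs[i], xs[j]))
        new_xs.append(x_new)
        # Eq. (13): adapt the decision radius to the neighbor count.
        new_radii.append(min(r_s, max(0.0, radii[i] + beta * (n_t - len(nbrs)))))
    return new_xs, lucis, new_radii
```

With two glowworms at (0, 0) and (1, 0), equal initial fluorescein 5, and J(x) = x[0], the first glowworm becomes dimmer than the second and moves to (0.1, 0), while the brighter one stays in place.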

(2) Discrete glowworm swarm optimization DGSO

Based on the coding rules, an initial solution is generated at random: the subscripts of the dimensions indicate the worker numbers, the values on the dimensions indicate the user numbers, and the value b at the a-th position represents the combination of user b and worker a. The distance between glowworm i and glowworm j is the Hamming distance: the sum of the per-dimension distances is the actual distance between the two glowworms, as detailed in Equations (14) and (15), where g ∈ [1, m]. $distance (i, j)_{g} = \begin{cases} 0, & i_{g} = j_{g} \\ 1, & i_{g} \neq j_{g} \end{cases}$ (14) $distance (i, j) = \sum_{g = 1}^{m} | distance {(i, j)}_{g} |$ (15)
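Equations (14) and (15) amount to counting the positions at which two encodings disagree. A short sketch (the function name is illustrative):

```python
def hamming_distance(sol_i, sol_j):
    """Eqs. (14)-(15): number of dimensions on which two encodings differ."""
    if len(sol_i) != len(sol_j):
        raise ValueError("encodings must have the same length")
    return sum(1 for a, b in zip(sol_i, sol_j) if a != b)
```

For example, the encodings `[5, 3, 2, 4, 6, 7, 8, 1, 9]` and `[5, 3, 2, 8, 7, 6, 4, 1, 9]` differ in the four middle positions, so their distance is 4.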

(3) Improved glowworm swarm optimization IGSO

In the iteration of the discrete glowworm swarm optimization (DGSO), five new mobile strategies (5NMS) are introduced with mutation factor p to improve the optimization efficiency, yielding IGSO. These five moving strategies ensure that the solution after moving is still feasible. They are inner inversion [31] (0 < rand ⩽ 0.2), outer inversion (0.2 < rand ⩽ 0.4), left inversion (0.4 < rand ⩽ 0.6), right inversion (0.6 < rand ⩽ 0.8), and queue jumping (0.8 < rand ⩽ 1). See Table 1 for details.

Table 1

Glowworm’s mobile strategy

Strategy	Dimension values of glowworm i
Glowworm i	5	3	2	4	6	7	8	1	9
Inner	5	3	2	8	7	6	4	1	9
Outer	2	3	5	4	6	7	8	9	1
Left	2	3	5	4	6	7	8	1	9
Right	5	3	2	4	6	7	8	9	1
Jumping	5	3	2	6	7	1	9	4	8
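The five strategies can be sketched in Python as list operations around two cut positions. The text does not define the operators formally, so the definitions below are a reading inferred from the rows of Table 1 (with 0-based cut positions i = 3 and j = 6 they reproduce every row); the function names and the cut-point parameters are assumptions.

```python
def inner_inversion(sol, i, j):
    """Reverse the middle segment sol[i..j]."""
    return sol[:i] + sol[i:j + 1][::-1] + sol[j + 1:]

def outer_inversion(sol, i, j):
    """Reverse the part before i and the part after j, each in place."""
    return sol[:i][::-1] + sol[i:j + 1] + sol[j + 1:][::-1]

def left_inversion(sol, i, j):
    """Reverse only the part before i."""
    return sol[:i][::-1] + sol[i:]

def right_inversion(sol, i, j):
    """Reverse only the part after j."""
    return sol[:j + 1] + sol[j + 1:][::-1]

def queue_jumping(sol, i, j):
    """Move the elements at positions i and j to the end of the queue."""
    rest = [v for pos, v in enumerate(sol) if pos not in (i, j)]
    return rest + [sol[i], sol[j]]

STRATEGIES = [inner_inversion, outer_inversion, left_inversion,
              right_inversion, queue_jumping]

def apply_5nms(sol, rand, i, j):
    """Choose a strategy from rand in (0, 1] using the 0.2-wide bands."""
    if not 0 < rand <= 1:
        raise ValueError("rand must lie in (0, 1]")
    for bound, strategy in zip((0.2, 0.4, 0.6, 0.8, 1.0), STRATEGIES):
        if rand <= bound:
            return strategy(sol, i, j)
```

Each operator only rearranges existing values, so a feasible encoding (every task used at most once) stays feasible after the move, which is the property the text requires of the strategies.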

3.4 Task assignment and IGSO algorithm solving

(1) Specific steps

If there are m workers and n tasks in a spatiotemporal unit Q (assume m ⩽ n), there are $C_{n}^{m} * m!$ possible assignment combinations, and finding the best one is an NP-hard problem [25]. This paper uses the improved GSO algorithm, IGSO, to solve it [30], with space complexity $O (C_{n}^{m} * m)$. The specific steps are shown in Fig. 2.
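To give a sense of how quickly this search space grows, the count can be evaluated directly. The worker and task numbers used below are hypothetical and chosen only for illustration.

```python
import math

def assignment_count(m, n):
    """Number of ways to assign m workers to distinct tasks among n (m <= n)."""
    if not 0 <= m <= n:
        raise ValueError("require 0 <= m <= n")
    return math.comb(n, m) * math.factorial(m)
```

With 3 workers and 5 tasks there are 10 · 6 = 60 combinations, while 10 workers and 12 tasks already give 66 · 3,628,800 = 239,500,800, which is why exhaustive enumeration is impractical and a heuristic search is used instead.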

Fig. 2

Specific steps of task assignment.

Step 1: Pre-process spatial crowdsourcing data to form the initial dataset for spatial crowdsourcing task assignment by cluster matching, differential privacy noise addition, and fusion of datasets.

Step 2: Initialize the parameters of the IGSO algorithm, the encodings, and the corresponding task allocation.

Step 3: Continuously update the glowworm fluorescein, move each glowworm by roulette selection or by the inversion and other moving strategies under adaptive factor adjustment, and calculate the corresponding objective function.

Step 4: Check and repair infeasible solutions.

Step 5: Compare each solution with the bulletin board and update the bulletin board if the solution is better.

Step 6: Output the task assignment scheme and decode it when the number of iterations is reached.

(2) Objective function setting

The objective function is shown in Equation (16). The total travel distance L_UW should be as small as possible and the assignment success rate SUC_UW as large as possible, so the objective function F is minimized. The coefficients c₁ and c₂ bring the two quantities to a uniform scale. I and J_I denote the set of tasks and the corresponding set of workers assigned to those tasks by the spatial crowdsourcing server. $F (I, J_{I}) = \frac{c_{1} * L_{UW}}{c_{2} * SU C_{UW}}$ (16)
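Equation (16) can be evaluated as below. The default coefficient values and the handling of a zero success rate (returning infinity so that such a scheme is never preferred) are assumptions made for this sketch; the paper does not specify them.

```python
import math

def objective(l_uw, suc_uw, c1=1.0, c2=1.0):
    """F of Equation (16): scaled travel distance over scaled success rate."""
    if c2 <= 0:
        raise ValueError("c2 must be positive")
    if suc_uw <= 0:
        # Assumption: an assignment with no successful order is the worst case.
        return math.inf
    return (c1 * l_uw) / (c2 * suc_uw)
```

Because the success rate sits in the denominator, two schemes with the same total travel distance are ranked by success rate: `objective(100, 1.0)` is 100, whereas `objective(100, 0.5)` is 200.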

3.5 Algorithm pseudo-code description

Algorithm 1 TDPKC-based noise addition for worker datasets

Input: parameter k for k-means clustering, worker location dataset LW, privacy budgets ɛ₁, ɛ₂

Output: published perturbed traces DT′

1) for j = 1 to k

2) $c_{1}^{'} (j) = c_{1} (j) + Lap (Δ f / ɛ_{2})$

3) end for

4) for j = 1 to m

5) LW′ (j) = LW (j) + Lap (Δf/ɛ₁)

6) end for
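A runnable sketch of the two noise-adding loops of Algorithm 1 is given below, using only the Python standard library. It assumes the centroids have already been computed by k-means, that the sensitivity Δf is supplied by the caller, and that noise is drawn independently for each coordinate; these choices, and the names `laplace_noise`, `add_noise`, and `tdpkc_noise`, are illustrative assumptions rather than details confirmed by the paper.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def add_noise(points, sensitivity, epsilon, rng):
    """Add Lap(sensitivity / epsilon) noise to each coordinate of each point."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    return [(x + laplace_noise(scale, rng), y + laplace_noise(scale, rng))
            for x, y in points]

def tdpkc_noise(centroids, workers, sensitivity, eps1, eps2, seed=None):
    """Perturb centroids with budget eps2 and worker locations with eps1."""
    rng = random.Random(seed)
    noisy_centroids = add_noise(centroids, sensitivity, eps2, rng)  # steps 1-3
    noisy_workers = add_noise(workers, sensitivity, eps1, rng)      # steps 4-6
    return noisy_centroids, noisy_workers
```

A call such as `tdpkc_noise([(50, 50)], [(10, 20), (80, 90)], 1.0, 0.05, 0.05, seed=1)` returns one perturbed centroid and two perturbed worker locations; fixing the seed makes the draw reproducible for testing, which a production release would not want.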

Algorithm 2 IGSO-based spatial crowdsourcing task assignment algorithm IGSOTSC

Input: the noised worker location dataset T′, the task location dataset, the initial encoding se

Output: published task assignment scheme DT′

1) while t ⩽ t_max do

2) for i = 1 to m

3) if rand < p

4) glowworm (se_i) → roulette move

5) else if p < rand ⩽ 1

6) glowworm (se_i) → 5NMS

7) end if

8) if f (se_i) is better than f (se_max)

9) se_max = se_i; update bulletin board

10) end if

11) end for

12) end while

// Decode se_max and publish the task assignment scheme DT′
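The control flow of Algorithm 2 can be sketched as the following simplified loop. Several parts are stand-ins labeled as assumptions: the number of workers equals the number of tasks, the cost is the total Euclidean travel distance only, the "roulette move" is modeled as a swap that copies one assignment from a better glowworm chosen by roulette, the decision-domain radius is ignored, and a single random segment inversion stands in for the full 5NMS set. None of the function names come from the paper.

```python
import math
import random

def travel_cost(assign, workers, tasks):
    """Total distance when worker a serves task assign[a]."""
    return sum(math.dist(workers[a], tasks[b]) for a, b in enumerate(assign))

def move_toward(sol, target, rng):
    """Assumed roulette move: adopt one of target's assignments via a swap."""
    diff = [g for g in range(len(sol)) if sol[g] != target[g]]
    if not diff:
        return sol[:]
    g = rng.choice(diff)
    new = sol[:]
    h = new.index(target[g])          # position currently holding that task
    new[g], new[h] = new[h], new[g]
    return new

def invert_segment(sol, rng):
    """Stand-in for 5NMS: reverse a random segment of the encoding."""
    i, j = sorted(rng.sample(range(len(sol)), 2))
    return sol[:i] + sol[i:j + 1][::-1] + sol[j + 1:]

def igso_assign(workers, tasks, pop_size=20, t_max=200, p=0.5, seed=0):
    """Return (best encoding, its cost) after t_max iterations."""
    if len(workers) != len(tasks) or len(workers) < 2:
        raise ValueError("sketch assumes equal counts of at least 2")
    rng = random.Random(seed)
    n = len(workers)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    costs = [travel_cost(s, workers, tasks) for s in pop]
    best_cost = min(costs)
    best = pop[costs.index(best_cost)][:]          # bulletin board
    for _ in range(t_max):
        for i in range(pop_size):
            better = [j for j in range(pop_size) if costs[j] < costs[i]]
            if better and rng.random() < p:
                weights = [costs[i] - costs[j] for j in better]
                j = rng.choices(better, weights=weights, k=1)[0]
                pop[i] = move_toward(pop[i], pop[j], rng)
            else:
                pop[i] = invert_segment(pop[i], rng)
            costs[i] = travel_cost(pop[i], workers, tasks)
            if costs[i] < best_cost:               # update bulletin board
                best, best_cost = pop[i][:], costs[i]
    return best, best_cost
```

Both moves only permute the encoding, so every candidate remains a valid one-to-one assignment and no repair step is needed in this simplified setting; the bulletin board guarantees that the returned scheme is the best one visited.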

4 Experiment and analysis

4.1 Experimental environment and data description

The experiments were run in Matlab R2016a on a PC with Windows 7, 8 GB of RAM, and an Intel(R) Core(TM) i5-4460 CPU (3.2 GHz). The parameter selection and justification for the IGSO and DGSO algorithms are specified in the literature [26].

The location data were downloaded from the Dataju website (http://dataju.cn) [35], and the 3000 geolocation coordinates required for the experiment were obtained by removing duplicate location information and scaling the coordinates to the interval [0, 200]. Other data were generated by experiment or synthesis; the specific data and their value ranges are shown in Table 2. e denotes the privacy budget, and k represents the number of clusters. The small-scale dataset SD contains six sub-datasets of increasing scale, sd1–sd6, and the larger-scale dataset BD contains six sub-datasets, bd1–bd6. Each dataset contains the spatial locations l_u and l_w of users and workers at the corresponding times.

Four methods, OR-DGSOTSC, DP-DGSOTSC, TDPKC-DGSOTSC, and TDPKC-IGSOTSC, are compared experimentally for task assignment performance. OR-DGSOTSC, DP-DGSOTSC, and TDPKC-DGSOTSC perform spatial crowdsourcing task assignment using DGSO on the original dataset, on the dataset with direct differential privacy noise addition, and on the dataset with two-stage differential privacy noise addition based on k-means clustering, respectively. TDPKC-IGSOTSC uses IGSO to assign spatial crowdsourcing tasks on the dataset with two-stage differential privacy noise addition based on k-means clustering.

4.2 Experiments on small-scale data sets

(1) Total travel distance analysis

Fig. 3 shows the average total travel distance over 20 experiments on SD. On the six sub-datasets of SD, the total travel distance obtained by TDPKC-DGSOTSC is always smaller than that of DP-DGSOTSC, decreasing by 4.62%, 9.87%, 6.14%, 4.20%, 6.26%, and 4.08%, respectively, indicating that DGSOTSC in TDPKC mode effectively reduces the total travel distance. On each sub-dataset, the total travel distance calculated by all three methods keeps increasing as the differential privacy budget decreases and the noise increases. Although the data size increases from sd1 to sd6, the reduction in total travel distance is not significantly correlated with dataset size, because the spatial distribution of the worker and user datasets is random. The spatial distribution affects both the workers’ travel distance and the quality of cluster matching, so the performance of TDPKC-DGSOTSC varies across the SD sub-datasets. Still, the average decrease is 5.86% compared with DP-DGSOTSC, so TDPKC-DGSOTSC obtains a better task assignment scheme.

Fig. 3

Total travel distance for different methods.

In TDPKC mode, the total travel distance obtained by IGSOTSC is always smaller than that of DGSOTSC, decreasing by 3.36%, 3.35%, 3.61%, 3.84%, 4.49%, and 2.72%, respectively, with an average decrease of 3.56%, indicating that IGSO searches for optimal solutions better than DGSO and improves task assignment performance. TDPKC-IGSOTSC decreases the total travel distance by 12.42% on average compared with DP-DGSOTSC. Combined with the above analysis, TDPKC-IGSOTSC obtains a better task assignment solution while complying with differential privacy.

(2) Analysis of distribution success rate

The average task assignment success rates obtained from 20 experiments on SD are shown in Fig. 4. On the six sub-datasets of SD, the success rate of TDPKC-DGSOTSC is consistently greater than that of DP-DGSOTSC, increasing by 6.57%, 15.95%, 10.06%, 4.62%, 6.67%, and 6.59%, respectively, indicating that DGSOTSC in TDPKC mode effectively improves the allocation success rate. On each sub-dataset, the success rate calculated by the three methods decreases as the differential privacy budget decreases and the noise increases. The datasets sd1–sd6 increase in size, but the task assignment success rate does not show a significant linear relationship with data size because the spatial distribution of the datasets is random. Compared with DP-DGSOTSC, the average improvement is 8.41%, so TDPKC-DGSOTSC obtains a better task assignment scheme. In TDPKC mode, IGSOTSC obtains a higher assignment success rate than DGSOTSC, increasing by 1.70%, 5.27%, 4.16%, 2.91%, 3.21%, and 2.82%, respectively. The average improvement is 3.34%, which indicates that IGSO searches the solution space better than DGSO and performs better on task assignment. TDPKC-IGSOTSC improves on DP-DGSOTSC by 11.75% on average. Combined with the above analysis, TDPKC-IGSOTSC obtains a better task assignment scheme while complying with differential privacy.

Fig. 4

Assignment success rate of different methods.

(3) Parameter analysis

Table 2

Description of experimental data

Parameter	Value range	Data source
ɛ	(0.1, 0.08, 0.06, 0.04, 0.02, 0.01)	experiment
k	(1, 2, 3, 4, 5)	experiment
SD	(60, 80, 100, 120, 140, 160)	synthesis
BD	(500, 1000, 1500, 2000, 2500, 3000)	synthesis
l_u	[0, 200]	mapping
l_w	[0, 200]	mapping

The allocation of the privacy budget ɛ and the number of clusters k are important parameters that affect the TDPKC-IGSOTSC method. The sd3 dataset was selected for 20 experiments, and the average results are detailed in Tables 3 and 4. Table 3 shows the average results of the two methods when the total privacy budget ɛ = 0.1 and the number of clusters k = 3. In TDPKC mode, the total travel distance of IGSOTSC is always smaller than that of DGSOTSC, indicating that IGSO improves the performance of searching the solution space. As the privacy budget allocated to the first stage decreases, the matching rate of the 20 clusters decreases from 100% to 35%, and the total travel distance first decreases and then increases. The initial decrease occurs because, although the clustering matching rate falls, the larger privacy budget allocated to the second stage improves the accuracy of each spatial crowdsourcing task assignment. The experiments yielded the best task assignment solution when the first-stage and second-stage privacy budgets were ɛ₁ = ɛ₂ = 0.05.
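The budget trade-off described above can be checked against the values in Table 3. The sketch below is illustrative only. The names `table3`, `TOTAL_BUDGET`, and `laplace_scale` are introduced here, and the unit sensitivity is an assumption. The Laplace mechanism is used as the standard example of calibrating noise to a budget; this section does not state which mechanism each stage applies.

```python
# (eps1, eps2) -> (TDPKC-DGSOTSC, TDPKC-IGSOTSC) total travel distance in 10^4,
# copied from Table 3.
table3 = {
    (0.09, 0.01): (1.538, 1.501),
    (0.07, 0.03): (1.431, 1.392),
    (0.05, 0.05): (1.151, 1.112),
    (0.03, 0.07): (1.221, 1.118),
    (0.01, 0.09): (1.315, 1.277),
}
TOTAL_BUDGET = 0.1


def laplace_scale(sensitivity, epsilon):
    """Noise scale b = sensitivity / epsilon of the standard Laplace mechanism.

    Assumption: used here only to illustrate that a smaller budget means more noise.
    """
    return sensitivity / epsilon


# The two stage budgets add up to the total budget in every row.
budgets_sum_to_total = all(
    abs(eps1 + eps2 - TOTAL_BUDGET) < 1e-9 for eps1, eps2 in table3
)

# Split with the shortest TDPKC-IGSOTSC distance.
best_split = min(table3, key=lambda split: table3[split][1])

# IGSOTSC is shorter than DGSOTSC for every split.
igso_always_shorter = all(igso < dgso for dgso, igso in table3.values())

print(best_split, budgets_sum_to_total, igso_always_shorter)
for eps1, eps2 in table3:
    # Hypothetical unit sensitivity; only the relative sizes are meaningful.
    print(eps1, eps2, laplace_scale(1.0, eps1), laplace_scale(1.0, eps2))
```

With a fixed total budget, shrinking ɛ₁ inflates the first-stage noise scale while shrinking the second-stage one, which is consistent with the falling matching rate and the non-monotonic travel distance reported in Table 3.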

Table 3

Total travel distance corresponding to different privacy budgets

ɛ₁, ɛ₂	TDPKC-DGSOTSC (10⁴)	TDPKC-IGSOTSC (10⁴)	Clustering matching rate	Time consumed (s)
0.09,0.01	1.538	1.501	100%	14.1
0.07,0.03	1.431	1.392	99.5%	13.2
0.05,0.05	1.151	1.112	90%	12.5
0.03,0.07	1.221	1.118	65%	11.9
0.01,0.09	1.315	1.277	35%	11.4

Table 4

Total travel distance corresponding to different K

K	DP-DGSOTSC (10⁴)	TDPKC-DGSOTSC (10⁴)	TDPKC-IGSOTSC (10⁴)	Time consumed (s)
1	1.202	1.459	1.415	5.3
2	1.202	1.231	1.211	8.4
3	1.202	1.151	1.112	12.5
4	1.202	1.191	1.153	16.3
5	1.202	1.265	1.242	19.1

Table 4 shows the average results of the 20 experiments for different numbers of clusters K when the total privacy budget is ɛ = 0.1 and ɛ₁ = ɛ₂ = 0.05. As K increases, the total travel distance obtained by both TDPKC methods first decreases and then increases. When the task assignment is computed with DGSOTSC, the total travel distance of TDPKC is smaller than that of DP for K = 3 and K = 4, and larger for K = 1, 2, and 5. TDPKC-DGSOTSC therefore does not always outperform DP-DGSOTSC; whether it does depends on the parameter K.

When K is too small, the benefits of clustering and dimensionality reduction cannot be fully exploited. When K is too large, workers and tasks are split into too many clusters, and each cluster contains too few workers and tasks. Both cases reduce the effectiveness of the first-stage cluster matching and of the second-stage spatial crowdsourcing task assignment. In addition, the computation time grows as K increases. In summary, the TDPKC-DGSOTSC results are optimal when K = 3. In TDPKC mode, the total travel distance of IGSOTSC is always smaller than that of DGSOTSC for every value of K, indicating that IGSO improves the performance of searching the solution space.
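The dependence on K can be read off Table 4 programmatically. This is a minimal sketch using only the tabulated averages; the variable names (`table4`, `DP_BASELINE`, and so on) are hypothetical and introduced here for illustration.

```python
# DP-DGSOTSC does not use clustering, so its distance is the same for every K.
DP_BASELINE = 1.202

# K -> (TDPKC-DGSOTSC distance, TDPKC-IGSOTSC distance, time in seconds),
# distances in units of 10^4, copied from Table 4.
table4 = {
    1: (1.459, 1.415, 5.3),
    2: (1.231, 1.211, 8.4),
    3: (1.151, 1.112, 12.5),
    4: (1.191, 1.153, 16.3),
    5: (1.265, 1.242, 19.1),
}

# Values of K for which TDPKC-DGSOTSC beats the DP-DGSOTSC baseline.
ks_beating_dp = [k for k, (dgso, _, _) in table4.items() if dgso < DP_BASELINE]

# K with the shortest TDPKC-DGSOTSC distance.
best_k = min(table4, key=lambda k: table4[k][0])

# Running time grows strictly with K.
times = [table4[k][2] for k in sorted(table4)]
time_increases = all(a < b for a, b in zip(times, times[1:]))

# IGSOTSC is shorter than DGSOTSC for every K.
igso_always_shorter = all(igso < dgso for dgso, igso, _ in table4.values())

print(ks_beating_dp, best_k, time_increases, igso_always_shorter)
```

In practice this kind of sweep is how K would be tuned: the distance curve is U-shaped while the running time rises steadily, so the smallest K near the minimum is the natural choice.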

4.3 Experiments on larger data sets

When conducting experiments on larger datasets, each dataset is still decomposed into several sub-datasets according to the specifics of spatiotemporal units, and the results are obtained based on cumulative calculations.

Table 5 reports the average results of 20 experiments. On the bd1 to bd6 datasets, the total travel distance calculated by TDPKC-DGSOTSC and TDPKC-IGSOTSC always lies between that of OR-DGSOTSC and that of DP-DGSOTSC, indicating that the TDPKC-DGSOTSC and TDPKC-IGSOTSC methods remain effective on larger datasets. Additionally, the TDPKC-IGSOTSC results are always better than the TDPKC-DGSOTSC results, which further verifies that IGSO finds the optimal solution more effectively than DGSO.

Table 5
Total travel distance for different methods

Data set	OR-DGSOTSC (10⁵)	DP-DGSOTSC (10⁵)	TDPKC-DGSOTSC (10⁵)	TDPKC-IGSOTSC (10⁵)
bd1	0.378	0.622	0.596	0.572
bd2	0.736	1.413	1.171	1.153
bd3	1.375	1.823	1.747	1.719
bd4	1.756	2.458	2.333	2.294
bd5	2.191	3.335	2.908	2.857
bd6	2.515	4.008	3.494	3.436
Average	1.492	2.276	2.041	2.012
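The ordering claimed for Table 5 can be verified row by row. The sketch below is an illustrative check, not part of the original experiments; `table5`, `ordering_holds`, and `reduction_vs_dp` are names introduced here.

```python
# dataset -> (OR-DGSOTSC, DP-DGSOTSC, TDPKC-DGSOTSC, TDPKC-IGSOTSC),
# total travel distance in units of 10^5, copied from Table 5.
table5 = {
    "bd1": (0.378, 0.622, 0.596, 0.572),
    "bd2": (0.736, 1.413, 1.171, 1.153),
    "bd3": (1.375, 1.823, 1.747, 1.719),
    "bd4": (1.756, 2.458, 2.333, 2.294),
    "bd5": (2.191, 3.335, 2.908, 2.857),
    "bd6": (2.515, 4.008, 3.494, 3.436),
}

# Expected ordering on every dataset (smaller distance is better):
# no privacy (OR) < TDPKC-IGSOTSC < TDPKC-DGSOTSC < direct noise (DP).
ordering_holds = all(
    original < igso < dgso < dp for original, dp, dgso, igso in table5.values()
)

# Percentage reduction of TDPKC-IGSOTSC relative to DP-DGSOTSC per dataset.
reduction_vs_dp = {
    name: 100.0 * (dp - igso) / dp for name, (_, dp, _, igso) in table5.items()
}

print(ordering_holds)
for name, pct in reduction_vs_dp.items():
    print(f"{name}: {pct:.2f}% shorter than DP-DGSOTSC")
```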

For the success rate experiments on larger datasets, each dataset is likewise decomposed into several sub-datasets according to the spatiotemporal units, and the result is the average over the sub-datasets. The values reported in Table 6 are the averages of 20 experiments.

Table 6

Success rate of different methods

Data set	OR-DGSOTSC	DP-DGSOTSC	TDPKC-DGSOTSC	TDPKC-IGSOTSC
bd1	90.90%	85.40%	88.80%	89.90%
bd2	92.10%	87.20%	89.90%	90.70%
bd3	92.60%	88.50%	90.10%	91.20%
bd4	91.40%	87.10%	92.40%	92.90%
bd5	90.50%	86.70%	87.90%	89.30%
bd6	90.10%	84.80%	88.10%	89.60%
Average	91.27%	86.62%	89.53%	90.60%

The experimental results on the bd1 to bd6 datasets show that the task assignment success rates calculated by TDPKC-DGSOTSC and TDPKC-IGSOTSC lie between those of DP-DGSOTSC and OR-DGSOTSC on five of the six datasets and on average; on bd4 the reported TDPKC success rates exceed that of OR-DGSOTSC. This indicates that the TDPKC-DGSOTSC and TDPKC-IGSOTSC methods remain effective on larger datasets. The results of TDPKC-IGSOTSC always outperform those of TDPKC-DGSOTSC, which further verifies that IGSO finds the optimal solution more effectively than DGSO.
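A quick consistency check of Table 6 shows where the success rates fall relative to the two reference methods. This is a sketch over the tabulated values only; `table6`, `outside_bracket`, and `igso_beats_dgso` are hypothetical names.

```python
# dataset -> (OR-DGSOTSC, DP-DGSOTSC, TDPKC-DGSOTSC, TDPKC-IGSOTSC),
# task assignment success rate in percent, copied from Table 6.
table6 = {
    "bd1": (90.90, 85.40, 88.80, 89.90),
    "bd2": (92.10, 87.20, 89.90, 90.70),
    "bd3": (92.60, 88.50, 90.10, 91.20),
    "bd4": (91.40, 87.10, 92.40, 92.90),
    "bd5": (90.50, 86.70, 87.90, 89.30),
    "bd6": (90.10, 84.80, 88.10, 89.60),
}

# Datasets where a TDPKC rate is NOT strictly between DP (lower reference)
# and OR (upper reference); higher success rate is better.
outside_bracket = [
    name
    for name, (original, dp, dgso, igso) in table6.items()
    if not (dp < dgso < original and dp < igso < original)
]

# TDPKC-IGSOTSC achieves a higher success rate than TDPKC-DGSOTSC everywhere.
igso_beats_dgso = all(igso > dgso for _, _, dgso, igso in table6.values())

print(outside_bracket, igso_beats_dgso)
```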

5 Conclusion

In this paper, we develop a framework based on centralized differential privacy to address the leakage of worker location privacy in spatial crowdsourcing task allocation. The framework is theoretically consistent with differential privacy and is effective in practice. By dividing spatiotemporal units and applying a “delayed matching” strategy, the dynamic problem of spatial crowdsourcing task assignment is transformed into a static combinatorial optimization problem. We combine two-stage differential privacy noise addition with cluster matching to obtain a dataset of spatial crowdsourcing workers that retains high usability while being protected by differential privacy. Using an improved glowworm swarm optimization algorithm, we compute the spatial crowdsourcing task assignment results after noise addition with different methods. Experiments on several datasets show that the proposed method obtains better task assignment results than both the direct differential privacy noise addition method and the discrete glowworm swarm optimization assignment method. A limitation is that the method relies on two-stage centralized differential privacy noise addition, which reduces efficiency when the number of tasks is large. In future work, we will continue to improve the performance of the glowworm swarm optimization algorithm and study task allocation based on local differential privacy and personalized privacy budgets.

Acknowledgments

This work was supported by the Anhui Provincial Natural Science Foundation under Grant No. 1908085QG298, the National Natural Science Foundation of China under Grant No. 71521001, and the Open Research Fund Program of the Key Laboratory of Process Optimization and Intelligent Decision-making (Hefei University of Technology), Ministry of Education.
