Practical multi-party private set intersection cardinality and intersection-sum protocols under arbitrary collusion

Abstract

Private set intersection cardinality (PSI-CA) and private intersection-sum with cardinality (PSI-CA-sum) are two primitives that enable data owners to learn the intersection cardinality of their data sets, with the difference that PSI-CA-sum additionally outputs the sum of the associated integer values of all the data that belongs to the intersection (i.e., intersection-sum). However, to the best of our knowledge, all existing multi-party PSI-CA (MPSI-CA) protocols are either limited by high computational cost or face security challenges under arbitrary collusion. As for multi-party PSI-CA-sum (MPSI-CA-sum), this notion has not even been formalized so far, let alone securely constructed.

In this paper, we first present an efficient MPSI-CA protocol with two non-colluding parties. This protocol significantly decreases the number of parties involved in expensive interactive procedures, leading to a significant enhancement in runtime efficiency. Our numeric results demonstrate that the running time of this protocol is merely one-quarter of the time required by our proposed MPSI-CA protocol that is secure against arbitrary collusion. Therefore, in scenarios where performance is a priority, this protocol stands out as an excellent choice.

Second, we construct the first MPSI-CA protocol that achieves simultaneous practicality and security against arbitrary collusion. We also implement it to verify its practicality (previous results under arbitrary collusion present only a theoretical performance analysis, without a real implementation). Numeric results show that by shifting the costly operations to an offline phase, the online computation can be completed in just 12.805 seconds, even in the dishonest majority setting, where 15 parties each hold a set of size $2^{16}$ .

Third, we formalize the concept of MPSI-CA-sum and present the first realization that ensures simultaneous practicality and security against arbitrary collusion. The computational complexity of this protocol is roughly twice that of our MPSI-CA protocol.

Besides the main results, we introduce the concepts and efficient constructions of two novel building blocks: multi-party secret-shared shuffle and multi-party oblivious zero-sum check, which may be of independent interest.

Keywords

Multi-party PSI-CA; multi-party PSI-CA-sum; secure multi-party computation; privacy

1. Introduction

1.1. Motivation

In today’s increasingly electronic world, easy access to enormous amounts of data offers huge potential for serving people in many ways. However, data leakage poses a great threat to the security of all participants. Therefore, it is urgent to strike a balance between efficiency and data privacy in collaborative computing. With the increasing dependence on data availability and the emphasis on data privacy, privacy-preserving techniques and applications are springing up at an unprecedented rate. Among them, an interesting problem is how to securely obtain the intersection of data sets from multiple parties without revealing any additional information beyond the intersection. The technique used to solve this problem is called private set intersection (PSI).

Although prior works have yielded some effective PSI schemes, only a few of them have proposed practical solutions for a variant of PSI, namely private set intersection cardinality (PSI-CA). PSI-CA is a cryptographic primitive that enables multiple parties to obtain the intersection cardinality of their private sets without leaking other information beyond the intersection cardinality. PSI-CA can be applied to real-world applications, such as measuring advertisement conversion rate [22]. In this problem setting, a user makes a purchase at the merchant’s store after seeing the advertisement for it. The advertiser and merchant each maintain a private list of users’ identifications: the advertiser knows who has seen the advertisement, and the merchant knows the identities of the users who have made a purchase. The advertisement conversion rate refers to the number of targeted users who have completed one transaction after seeing the advertisement, which can be easily determined using a PSI-CA protocol.

However, PSI-CA is not sufficient for applications where each data item is associated with an integer value (e.g., a payload). For example, consider a scenario in which a single user can contribute multiple purchases. In this case, it is necessary to analyze the total value of purchases from all targeted users, as this reflects the influence of the advertisement [22]. To address this new problem, a variant of PSI-CA has been proposed, known as private intersection-sum with cardinality (PSI-CA-sum) [22]. PSI-CA-sum is designed to output not only the intersection cardinality but also the sum of the associated payloads for all elements in the set intersection (i.e., the intersection-sum).

We have also identified a potential application for PSI-CA-sum beyond measuring advertisement conversion rate. Each voter, denoted as P_i, can cast a vote for any candidate $s \in \{0, 1\}^{*}$ of his preference, and assign a score to the chosen candidate s (where s represents the candidate’s ID). If voter P_i does not vote for a particular candidate s, there is no need for the voter to assign a score to s. The voting results of voter P_i are represented as a set $S_{i} = \{(s_{i, 1}, v_{i} (s_{i, 1})), \dots, (s_{i, m}, v_{i} (s_{i, m}))\}$ of size m, where $s_{i, k}$ is the ID of the chosen candidate and $v_{i} (s_{i, k})$ is the score assigned to candidate $s_{i, k}$ , for $k \in [m]$ . Given the sets S_i for $i \in [n]$ of n voters, the set of common candidates supported by all voters is denoted as the set intersection $IS$ . The total score for a common candidate s is represented as $\sum_{i = 1}^{n} v_{i} (s)$ . In order to compute the average score of common candidates, we need two pieces of information: the intersection cardinality $| IS |$ and the total sum of scores assigned to the common candidates ${Sum}_{IS}$ (i.e., ${Sum}_{IS} = \sum_{i = 1}^{n} \sum_{x \in IS} v_{i} (x)$ ). The average score of a common candidate can be calculated as ${Sum}_{IS} / | IS |$ . In this context, PSI-CA-sum can be used to securely obtain the average score without revealing any additional sensitive information.
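To make the target output of this voting example concrete, the following minimal Python sketch computes the intersection cardinality, the intersection-sum and the average score in the clear. It only illustrates what an MPSI-CA-sum protocol is expected to output and offers no privacy; the function name `intersection_sum` and the sample ballots are hypothetical.

```python
def intersection_sum(votes):
    """votes: one dict per voter, mapping candidate ID -> assigned score.

    Returns (|IS|, Sum_IS) computed in the clear, i.e. the values that a
    secure protocol would reveal without exposing the ballots themselves.
    """
    common = set(votes[0])
    for ballot in votes[1:]:
        common &= set(ballot)          # candidates supported by all voters
    total = sum(ballot[s] for ballot in votes for s in common)
    return len(common), total


# Hypothetical ballots of three voters.
votes = [
    {"alice": 5, "bob": 3, "carol": 1},
    {"alice": 4, "carol": 2, "dave": 5},
    {"alice": 3, "carol": 4, "erin": 2},
]
cardinality, total = intersection_sum(votes)
average = total / cardinality if cardinality else None
print(cardinality, total, average)     # 2 19 9.5
```

Here the common candidates are "alice" and "carol", so the average score of a common candidate is 19 / 2 = 9.5.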

However, most existing PSI-CA protocols only work in the two-party setting, while the results of multi-party PSI-CA (MPSI-CA) are either limited by massive computational overhead, or vulnerable to arbitrary collusion (i.e., the adversary can corrupt any proper subset of all parties [32]). Meanwhile, to the best of our knowledge, there has been no work for multi-party PSI-CA-sum (MPSI-CA-sum), not to mention secure constructions for it. Therefore, in this paper, we propose three protocols to address the above problems.

We begin with a semi-honest secure MPSI-CA protocol involving two non-colluding parties, which is especially efficient due to the use of lightweight primitives and an optimized framework. This protocol significantly reduces the number of parties involved in expensive interactive procedures, thereby achieving a substantial improvement in runtime efficiency. In certain application scenarios where performance is a priority, this protocol stands out as an excellent choice. On the basis of it, we present the first MPSI-CA protocol that achieves simultaneous practicality and security under arbitrary collusion. This protocol primarily relies on lightweight cryptographic primitives and works more efficiently than previous benchmark schemes when dealing with large sets. Additionally, we formalize the concept of MPSI-CA-sum and propose the first practical MPSI-CA-sum protocol under arbitrary collusion. Numerical results and theoretical analysis of complexities demonstrate the performance advantages of our protocols.

1.2. Related works

PSI-CA. Prior works have yielded a number of PSI protocols by adopting various cryptographic primitives [2–4,9,14,16–20,24,26,32,36,37,40,41,43,44]. As a variant of PSI, the constructions of PSI-CA protocols share some similarities with PSI protocols but tend to be more costly. This is because PSI-CA requires preserving more private information than PSI (PSI-CA is only permitted to disclose the intersection cardinality of private sets to participants). To conceal the relationship between intersection elements and their corresponding outputs, most PSI-CA protocols typically employ a step called shuffle. The purpose of this step is to rearrange the order of elements using a random permutation unknown to the receiver.

Existing constructions of two-party PSI-CA protocols adopt a wide range of techniques. Here, we provide a brief overview of some representative works in the two-party setting. Freedman et al. [14] constructed a PSI-CA protocol based on oblivious polynomial evaluation (OPE) and homomorphic encryption (HE). They further optimized this protocol in their subsequent work [13], ensuring security against malicious adversaries under the random oracle model. Debnath et al. [6] devised a maliciously secure fair two-party PSI-CA protocol in the standard model based on the Decisional Diffie-Hellman (DDH) assumption, but it requires a semi-honest arbitrator. Egert et al. [11] introduced a two-party protocol that leverages Bloom filters and the ElGamal encryption scheme. This protocol estimates the intersection cardinality by substituting the number of common zero-bits in the Bloom filters into a formula obtained in [1]. Lv et al. [27] employed a commutative encryption scheme to provide a two-party PSI-CA protocol with low communication cost in the unbalanced data setting. However, these schemes rely heavily on a large number of computationally expensive modular exponentiations, causing inefficiency when computational resources are constrained. To accelerate the computation, a viable alternative is to use lightweight primitives such as oblivious transfer (OT) and its extensions as building blocks of the PSI-CA protocol. Duong et al. [10] presented “Catalic”, a delegated two-party PSI-CA protocol that enables the receiver to delegate PSI-CA computation to c untrusted cloud servers. By distributing the shares of elements to multiple cloud servers and employing an OT-based primitive called oblivious distributed key pseudorandom function (Odk-PRF), the computational burden on the receiver is significantly reduced. Garimella et al. [15] provided solutions for several private set operations, including two-party PSI-CA. They utilize techniques such as oblivious switching networks and oblivious programmable pseudorandom functions (OPPRF). Since OPPRF operations can be accelerated using OT-extension [23], the number of public key operations in this protocol is independent of the set size, leading to substantial performance enhancements.

Although there have been some effective two-party PSI-CA schemes, only a limited number of works can deal with the multi-party setting [3,7,24,42]. The existing constructions of MPSI-CA protocols can be classified into three categories: public-key-based MPSI-CA, circuit-based MPSI-CA, and MPSI-CA protocols based on OT and its extensions, such as OPPRF.

Previous MPSI-CA schemes that are secure against arbitrary collusion typically follow the public-key-based paradigm. The computational complexities of these schemes are determined by the number of expensive public key operations involved. Kissner et al. [24] proposed the first MPSI-CA protocol in the semi-honest model based on OPE and HE. This protocol requires multiple rounds of randomizing the encrypted polynomials and obliviously evaluating the encrypted function outputs on each element. The overall computational complexity of [24] is $O (n^{2} m_{max}^{2})$ , where n is the number of parties and $m_{max}$ is the maximum set size. Therefore, it is too costly to implement this scheme in real-world applications with massive data and large number of participants.

Debnath et al. [7] presented an MPSI-CA protocol based on inverse Bloom filter (IBF) and HE. In their protocol, each client P_i (for $i \in [2, n]$ ) encodes its set into an IBF ( ${IBF}_{i}$ ) of length $O (k m_{max})$ , where k is the number of hash functions. The encrypted IBFs $\{{IBF}_{i}\}_{i \in [2, n]}$ are sent to leader P₁. Subsequently, P₁ computes the ciphertext of $\sum_{i \in [2, n]} \sum_{j \in [k]} {IBF}_{i} [h_{j} (x)]$ for each element x using homomorphic properties. P₁ then broadcasts the resulting $m_{1}$ ciphertexts to all clients for multi-party shuffling, where $m_{1}$ represents the size of the set held by P₁. Each client independently performs re-randomization and shuffling on the received ciphertexts using their own permutation, ensuring that no corrupted coalition can deduce the composite permutation. After performing group decryption on the shuffled ciphertexts, leader P₁ outputs the cardinality of set intersection. The overall computational complexity of the protocol, as stated in [7], is $O (k n m_{max})$ . It is worth noting that the value of k should be set according to the security parameter in order to control the false positive rate of IBF. The performance of the protocol remains unsatisfactory as the computational complexity still depends on both the number of participants n and the maximum set size $m_{max}$ .

The two protocols mentioned above, namely [7,24], have been proven secure under arbitrary collusion. Despite their strong privacy-preserving properties, the computational overhead of these protocols makes them impractical for resource-limited devices dealing with large data sets. To address this challenge, two practical schemes have been proposed. Chandran et al. [3] introduced a circuit-based generic multi-party computation protocol that can be adapted to achieve MPSI-CA by adjusting the circuit design. However, it is important to note that this protocol has only been proven secure in the honest majority model. Additionally, a concurrent work of [42] presented server-aided and server-less OPPRF-based MPSI-CA protocols that rely on the presence of specific non-colluding parties, deviating from the well-known “threshold security”. While leveraging specific non-colluding parties can enhance performance, it is generally believed that achieving “threshold security” aligns more closely with the requirements of most real-life applications.

Therefore, how to design and implement a practical semi-honest secure MPSI-CA scheme under arbitrary collusion is still worth studying.

PSI-CA-sum. To the best of our knowledge, there is no existing result for PSI-CA-sum in the multi-party setting. Therefore, we will only provide an overview of some known results for the two-party PSI-CA-sum. Motivated by the business problem of measuring advertisement conversion rate, Ion et al. [22] introduced the functionality of two-party PSI-CA-sum, which addresses the requirement to handle additional payloads alongside the existing PSI-CA capabilities. Ion et al. [22] introduced the first two-party PSI-CA-sum protocol by applying the classic Diffie-Hellman paradigm to this novel application. This protocol was enhanced and refined in [21], where the authors introduced two novel constructions that leverage modern techniques such as random OT. Nonetheless, it is important to highlight that these constructions still rely on costly HE as a building block for aggregating the intersection-sum. Miao et al. [28] proposed a PSI-CA-sum protocol based on shuffled distributed oblivious pseudorandom function (DOPRF), Pedersen commitments, ElGamal encryptions and Camenisch-Shoup encryptions. Niu et al. [33] introduced a privacy-preserving statistical computing protocol for the private set intersection to complete the relevant statistical computations of the intersection of two private sets, which can also be used to compute the intersection-sum. Garimella et al. [15] put forward a lightweight two-party PSI-CA protocol that manages to avoid the need for HE. This was achieved through the application of oblivious switching network and OT techniques.

1.3. Our contributions

In this paper, we begin by introducing an efficient MPSI-CA protocol with two non-colluding parties. Building upon this protocol, we further develop an MPSI-CA protocol and an MPSI-CA-sum protocol, both of which achieve simultaneous practicality and security under arbitrary collusion. Our contributions can be summarized as follows.

MPSI-CA with two non-colluding parties. We first derive an efficient MPSI-CA protocol in the semi-honest model by assuming that two specific parties are non-colluding. It admits the following properties and advantages.

By leveraging the element sharing technique, the framework of the protocol is significantly optimized, enabling only two parties to engage in expensive interactive procedures.

The computational efficiency of the protocol is enhanced through the utilization of the element sharing technique and lightweight primitives, eliminating the need for public key operations apart from a set of base OTs.

Numeric results demonstrate that computing the intersection cardinality for 15 parties with $2^{18}$ elements only takes 25.8 seconds. Additionally, the running time of this protocol is merely one-quarter of the time required by our proposed MPSI-CA protocol that is secure against arbitrary collusion. In scenarios where performance is a priority, this protocol can be regarded as an excellent choice.

MPSI-CA under arbitrary collusion. This protocol effectively eliminates the need for assuming non-colluding parties, and its efficiency can be confirmed through our numeric results.

To the best of our knowledge, this is the first practical realization of MPSI-CA under arbitrary collusion. We have conducted an implementation to verify its practicality (while the previous results under arbitrary collusion only present theoretical analysis of performance without real implementation).

Notably, clients among all participants experience the most substantial performance improvement compared to existing schemes with the same level of security.

In our implementation, a substantial portion of the expensive operations can be shifted to the offline phase, leading to a significant reduction in the running time of the online execution. Numeric results show that even in the dishonest majority setting, involving 15 parties each holding a set of size $2^{16}$ , the online computation can be completed in just 12.805 seconds, which is approximately one-fourth of the original running time.

Table 1 compares our proposed MPSI-CA protocols with current MPSI-CA schemes in terms of security and computational complexity. On one hand, when compared to existing practical schemes [3,42], our MPSI-CA protocol under arbitrary collusion (Protocol 5.1) is more secure, as [3,42] are not resistant to arbitrary collusion (our protocol is also practical, although its performance is not directly comparable with that of the schemes in [3,42] because they run under different frameworks). On the other hand, when compared to the existing schemes secure against arbitrary collusion [7,24], Protocol 5.1 is much more practical, as it adopts symmetric key operations and a set of base OTs to reduce the number of expensive public key operations.

Table 1
Comparison between MPSI-CA schemes.

MPSI-CA schemes	Techniques	Security model
[3]	OT + Symmetric key operations	Honest majority
Server-aided [42]	OT + Symmetric key operations	Two specific parties are non-colluding
Server-less [42]	OT + Symmetric key operations	Three specific parties are non-colluding
[24]	HE	Arbitrary collusion
[7]	HE	Arbitrary collusion
Our Protocol 4.2	OT + Symmetric key operations	Two specific parties are non-colluding
Our Protocol 5.1	OT + Symmetric key operations	Arbitrary collusion

Computational complexities of MPSI-CA schemes under arbitrary collusion (number of public key operations)
MPSI-CA schemes	Primary leader	Secondary leader	Client	Total
[24]	/	/	$O (n m_{max}^{2})$	$O (n^{2} m_{max}^{2})$
[7]	$O (m_{1})$	/	$O (k m_{max})$	$O (k n m_{max})$
Our Protocol 5.1	$O (t κ)$	$O (t κ)$	/	$O (t^{2} κ)$

MPSI-CA-sum under arbitrary collusion. We formalize the notion of MPSI-CA-sum and propose the first MPSI-CA-sum protocol that achieves simultaneous practicality and security against arbitrary collusion. Our protocol eliminates the need for expensive HE in the aggregation of intersection-sum, leading to faster computation and lower communication cost.

Additional Contributions. In addition to the main contributions, we also introduce the concepts and efficient constructions of two new building blocks of our MPSI-CA and MPSI-CA-sum protocols: multi-party secret-shared shuffle and multi-party oblivious zero-sum check.

Multi-party secret-shared shuffle is a primitive that enables multiple parties to collectively shuffle the sum of their input data using an unknown permutation π and obtain additive secret shares of the resulting shuffled data. It is an advancement over the multi-party Permute + Share [29] as it effectively conceals π even under arbitrary collusion. Additionally, our construction is practical as most of its costly operations can be moved to the offline phase, thus reducing the computational burden during online execution.

Multi-party oblivious zero-sum check is a primitive used to securely determine whether the sum of multiple parties’ inputs equals zero without revealing anything else. Our construction of the oblivious zero-sum check protocol utilizes Beaver triples to reduce online computational overhead.
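As an informal illustration of the first building block, the sketch below plays the role of a trusted party for a multi-party secret-shared shuffle: it adds the input vectors, applies a random permutation, and hands out fresh additive shares of the result. It describes only the intended input/output behaviour, not our construction; the modulus `MOD`, the helper names and the fact that the permutation is returned (purely so that the example can be checked) are assumptions of this sketch.

```python
import secrets

MOD = 2**61 - 1  # assumed modulus for additive sharing


def share(value, parties):
    """Split value into `parties` random additive shares modulo MOD."""
    parts = [secrets.randbelow(MOD) for _ in range(parties - 1)]
    parts.append((value - sum(parts)) % MOD)
    return parts


def ideal_secret_shared_shuffle(input_vectors):
    """Trusted-party view: sum the inputs, permute them, re-share the result."""
    parties = len(input_vectors)
    m = len(input_vectors[0])
    summed = [sum(col) % MOD for col in zip(*input_vectors)]
    perm = list(range(m))
    secrets.SystemRandom().shuffle(perm)      # permutation hidden from all parties
    shuffled = [summed[perm[k]] for k in range(m)]
    columns = [share(v, parties) for v in shuffled]
    outputs = [list(row) for row in zip(*columns)]   # one share vector per party
    return outputs, perm                      # perm is returned only for checking


inputs = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
outputs, perm = ideal_secret_shared_shuffle(inputs)
reconstructed = [sum(col) % MOD for col in zip(*outputs)]
print(sorted(reconstructed))   # [111, 222, 333, 444]
```

Each party's output vector is uniformly random on its own; only the sum of all output vectors reveals the shuffled data.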

1.4. High-level description

In this part, we present a high-level overview of our MPSI-CA and MPSI-CA-sum protocols. These protocols involve n parties, comprising T leaders $L_{1}, \dots, L_{T}$ and $n - T$ clients $P_{1}, \dots, P_{n - T}$ . To distinguish among the leaders, leader L₁ is called the primary leader, and the remaining leaders are referred to as secondary leaders. Each party possesses a private set of size m. For $i \in [T]$ , the data set of the i-th leader L_i is denoted as $X_{i}$ ; for $j \in [n - T]$ , the set of the j-th client P_j is represented as $S_{j}$ .

The overall framework of our MPSI-CA protocol with two non-colluding parties is illustrated in Figure 1. In this protocol, it suffices to set the number of leaders (denoted as T) to 2, and designate these two non-colluding parties (denoted as L₁ and L₂) as the leaders. The whole process is divided into two main phases: element sharing and two-party PSI-CA. In the element sharing phase, clients share their PRF-encoded data sets with the leaders, allowing the leaders to hold both their own data sets and those of the clients. This reduces the original n-party MPSI-CA problem to a two-party PSI-CA with only two leaders. By adopting this approach, the framework of our protocol can be optimized since only L₁ and L₂ are required to engage in the following interactive procedures: OPPRF, two-party Permute + Share and oblivious zero-sum check. In the first step of the two-party PSI-CA, L₁ invokes OPPRFs with L₂ on each element $x_{1, k} \in X_{1}$ (for $k \in [m]$ ). Let $t_{k}$ denote the sum of the outputs of L₁ and L₂ on $x_{1, k}$ . According to the properties of OPPRF and element sharing, if $x_{1, k}$ belongs to the n-party set intersection, then $t_{k} = 0$ . Subsequently, they utilize the two-party Permute + Share to obtain random additive shares of the shuffled set $\{t_{π (k)}\}_{k \in [m]}$ , where the permutation π is selected by L₂. Finally, they can calculate the number of elements that satisfy $t_{π (k)} = 0$ by employing a two-party oblivious zero-sum check protocol.

Figure 1.

The overview of our efficient and non-colluding MPSI-CA protocol.

Figure 2.

The overview of our MPSI-CA and MPSI-CA-sum protocols under arbitrary collusion.

After that, we extend the aforementioned MPSI-CA protocol with two non-colluding parties to a secure MPSI-CA protocol under arbitrary collusion. As depicted in Figure 2, this extension involves the construction of two new primitives: multi-party secret-shared shuffle and multi-party oblivious zero-sum check. In this protocol, parties first go through the element sharing stage, and for security purposes, the number of leaders needs to be increased to $T = t + 1$ , where t denotes the corruption threshold (with t being up to $n - 1$ ). Next, the primary leader L₁ invokes OPPRFs on each element $x_{1, k} \in X_{1}$ (for $k \in [m]$ ) with all secondary leaders L_i (for $i \in [2, T]$ ). If $x_{1, k}$ is an intersection element, then the sum of all leaders’ outputs on $x_{1, k}$ equals 0 (i.e., $t_{k} = 0$ ). Following this, the T-party secret-shared shuffle is performed, where each leader L_i (for $i \in [2, T]$ ) obtains a random additive share of the shuffled set $\{t_{π (k)}\}_{k \in [m]}$ , and the permutation π remains unknown to any party. Finally, the leaders engage in a multi-party oblivious zero-sum check protocol to securely determine the number of elements that satisfy the condition $γ_{π (k)} t_{π (k)} = 0$ , where the random value $γ_{π (k)}$ is unknown to any leader. If $γ_{π (k)} t_{π (k)} = 0$ , indicating an intersection element, the primary leader L₁ increments the intersection cardinality $| IS |$ by one. Otherwise, the value of $t_{π (k)}$ will not be disclosed.
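The masked test $γ_{π (k)} t_{π (k)} = 0$ can be illustrated with the standard Beaver-triple multiplication on additive shares. The following toy Python sketch runs all parties in one process and lets a hypothetical dealer sample the triple and the mask γ, so it is not secure and is not the construction given later; it only shows why opening the product γ·t reveals whether t is zero and nothing more. The prime modulus `P` and all function names are assumptions.

```python
import secrets

P = 2**61 - 1  # assumed prime field modulus (a Mersenne prime)


def share(value, parties):
    """Additively share value among `parties` parties modulo P."""
    parts = [secrets.randbelow(P) for _ in range(parties - 1)]
    parts.append((value - sum(parts)) % P)
    return parts


def reveal(shares):
    return sum(shares) % P


def beaver_multiply(x_sh, y_sh, triple):
    """Return shares of x*y given shares of x, y and a triple (a, b, c = a*b)."""
    a_sh, b_sh, c_sh = triple
    d = reveal([(x - a) % P for x, a in zip(x_sh, a_sh)])   # opened x - a
    e = reveal([(y - b) % P for y, b in zip(y_sh, b_sh)])   # opened y - b
    z_sh = [(c + d * b + e * a) % P for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % P   # the public term d*e is added once
    return z_sh


def masked_zero_check(t_sh):
    """Open gamma*t for a random nonzero gamma; the result is 0 iff t is 0."""
    parties = len(t_sh)
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    triple = (share(a, parties), share(b, parties), share(a * b % P, parties))
    gamma_sh = share(1 + secrets.randbelow(P - 1), parties)  # dealer's toy mask
    return reveal(beaver_multiply(gamma_sh, t_sh, triple)) == 0


print(masked_zero_check(share(0, 4)))      # True: shares of an intersection element
print(masked_zero_check(share(12345, 4)))  # False: a nonzero t stays hidden behind gamma
```

Because P is prime and γ is nonzero, γ·t is zero exactly when t is zero, and otherwise it is a uniformly random nonzero value that carries no information about t.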

Our MPSI-CA-sum protocol extends the aforementioned MPSI-CA protocol under arbitrary collusion by introducing a key difference: leaders are now responsible for handling both payloads and elements. In this protocol, parties perform element sharing, payload sharing, OPPRF and secret-shared shuffle on both elements and their respective payloads. Following the execution of the oblivious zero-sum check on elements, the primary leader L₁ acquires a binary vector $\vec{e}$ , which indicates the shuffled indices of elements belonging to the set intersection. For these specific elements, L₁ invokes OTs with all other leaders using the choice string $\vec{e}$ to aggregate the sum of their payloads associated with the intersection elements.

1.5. Organization

Section 2 introduces the preliminaries, providing essential background information. In Section 3, we present the notions and constructions of two new building blocks: multi-party secret-shared shuffle and multi-party oblivious zero-sum check. These building blocks are crucial for developing our secure MPSI-CA protocol under arbitrary collusion. In Section 4, we put forward a comprehensive description of our efficient MPSI-CA scheme with two non-colluding parties, which serves as the foundation for the subsequent protocols. Following that, in Sections 5 and 6, we propose the practical MPSI-CA and MPSI-CA-sum protocols under arbitrary collusion, respectively. We focus on implementing and analyzing the performance of our MPSI-CA protocols in Section 7. Finally, we conclude our work in Section 8.

2. Preliminaries

Notations. We use κ and λ to represent the computational and statistical security parameters. $[x]$ denotes the set of integers ranging from 1 to x. $\sum_{i = 1}^{T}$ and $\sum_{i \in [T]}$ are two equivalent notations. If a set $\{x_{1}, x_{2}, \dots, x_{m}\}$ is arranged in order, it can be represented as a vector $\vec{x} = (x_{1}, x_{2}, \dots, x_{m})$ . Consequently, $\vec{x} + \vec{y}$ denotes the element-wise addition of two vectors $\vec{x}$ and $\vec{y}$ , resulting in a new vector $(x_{1} + y_{1}, \dots, x_{m} + y_{m})$ . Given a permutation π and a vector $\vec{x}$ , we can rearrange the elements in $\vec{x}$ using the permutation π to obtain a new vector $π (\vec{x}) = (x_{π (1)}, x_{π (2)}, \dots, x_{π (m)})$ . This operation is commonly referred to as shuffling. The symbol $IS$ represents the set intersection, and $| IS |$ denotes the intersection cardinality.
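The vector notation can be checked on a small example. In the Python sketch below, the helper names `add` and `shuffle` and the sample values are hypothetical, and a permutation π is stored as the list (π(1), ..., π(m)) with 1-based entries to match the notation above.

```python
def add(x, y):
    """Element-wise addition of two vectors of equal length."""
    return [a + b for a, b in zip(x, y)]


def shuffle(pi, x):
    """Return pi(x) = (x_{pi(1)}, ..., x_{pi(m)}); pi holds 1-based indices."""
    return [x[p - 1] for p in pi]


x = [10, 20, 30, 40]
y = [1, 2, 3, 4]
pi = [3, 1, 4, 2]

print(add(x, y))        # [11, 22, 33, 44]
print(shuffle(pi, x))   # [30, 10, 40, 20]
# Shuffling commutes with element-wise addition.
print(shuffle(pi, add(x, y)) == add(shuffle(pi, x), shuffle(pi, y)))  # True
```

The last line reflects the fact that shuffling is linear, which is what allows a shuffle to be applied to additive shares share by share.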

Security Definitions. The parties corrupted by a semi-honest adversary $A$ will faithfully follow the protocol, while attempting to learn about other parties’ inputs. Moreover, these corrupted parties will collude with each other. When referring to “non-colluding parties”, we mean that at most one of those parties can be corrupted by $A$ . On the other hand, “arbitrary collusion” suggests that $A$ may corrupt any proper subset of all parties, which is the most challenging case. The coalition of corrupted parties is denoted as $C$ . Let Π be a protocol and f be a deterministic functionality. We define the following distributions of random variables and use the real-ideal simulation paradigm to formally define the semi-honest security of Π. In this paper, we prove the security of all the protocols based on Definition 1 [12].

${Real}_{Π} (κ, C; x_{1}, \dots, x_{n})$ : Each party P_i runs the protocol honestly using private input $x_{i}$ and security parameter κ. Output $\{V_{i} \mid i \in C\}, (y_{1}, \dots, y_{n})$ , where $V_{i}$ and $y_{i}$ denote the final view and output of party P_i.

${Ideal}_{f, Sim} (κ, C; x_{1}, \dots, x_{n})$ : Compute $(y_{1}, \dots, y_{n}) \leftarrow f (x_{1}, \dots, x_{n})$ . Output $Sim (C, \{(x_{i}, y_{i}) \mid i \in C\}), (y_{1}, \dots, y_{n})$ , where Sim is a probabilistic polynomial time (PPT) simulator.

Definition 1 ([12])

We say that protocol Π securely computes f in the presence of a semi-honest adversary if there exists a PPT simulator Sim such that, for every coalition $C$ and all inputs $x_{1}, \dots, x_{n}$ , the distributions ${Real}_{Π} (κ, C; x_{1}, \dots, x_{n})$ and ${Ideal}_{f, Sim} (κ, C; x_{1}, \dots, x_{n})$ are computationally indistinguishable in κ.

Simple Hashing. The simple hashing scheme employs γ hash functions $h_{1}, \dots, h_{γ}$ to map n items into b bins, denoted as $B_{1}, \dots, B_{b}$ . An element x will be allocated to bins $B_{h_{1} (x)}, B_{h_{2} (x)}, \dots, B_{h_{γ} (x)}$ , regardless of whether other elements have already been stored in those bins. According to the following balls-into-bins inequality [30], we can examine the maximum bin size ρ that ensures no bin will contain more than ρ items except with probability $2^{- λ}$ when hashing n items into b bins.

$$ \Pr [\exists \text{ bin with} \geqslant ρ \text{ items}] \leqslant b \left[ \sum_{i = ρ}^{n} \binom{n}{i} \cdot \left( \frac{1}{b} \right)^{i} \cdot \left( 1 - \frac{1}{b} \right)^{n - i} \right] \qquad (1) $$
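Inequality (1) can be evaluated numerically to choose ρ. The following Python sketch uses exact rational arithmetic so that tail probabilities far below floating-point precision are not rounded to zero; the function names and the toy parameters (64 items, 8 bins, λ = 20) are hypothetical, and for realistic set sizes a log-domain evaluation would be preferable for speed.

```python
from fractions import Fraction
from math import comb


def overflow_bound(n, b, rho):
    """Right-hand side of (1): b * Pr[Binomial(n, 1/b) >= rho]."""
    p = Fraction(1, b)
    tail = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(rho, n + 1))
    return b * tail


def max_bin_size(n, b, lam):
    """Smallest rho whose bound in (1) is at most 2^-lam."""
    target = Fraction(1, 2**lam)
    rho = 1
    while overflow_bound(n, b, rho) > target:
        rho += 1   # terminates: the bound is 0 once rho exceeds n
    return rho


rho = max_bin_size(n=64, b=8, lam=20)
print(rho, float(overflow_bound(64, 8, rho)))
```

The search is a simple linear scan because the bound decreases monotonically in ρ.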

Cuckoo Hashing. In the cuckoo hashing scheme, first proposed by Pagh and Rodler in [34], γ hash functions $h_{1}, \dots, h_{γ}$ are used to map n items into b bins and a stash. Unlike the simple hashing technique, cuckoo hashing guarantees that there is at most one element in each bin. The scheme avoids collisions by relocating those elements that collide as follows: An element x can be mapped to any empty bin among $B_{h_{1} (x)}, B_{h_{2} (x)}, \dots, B_{h_{γ} (x)}$ . If all these bins are occupied by other elements, a bin $B_{h_{r} (x)}$ is randomly selected from these γ bins, and the prior item y in $B_{h_{r} (x)}$ , where $h_{r} (x) = h_{k} (y)$ , is evicted to a new bin $B_{h_{i} (y)}$ using hash function $h_{i}, i \in [γ] \setminus \{k\}$ . This procedure is repeated until no more evictions are necessary, or a threshold number of evictions have been performed. In the latter case, the last item will be inserted into a stash. Empirical analysis in [38] shows that, when the number of hash functions is 3, the stash size can be reduced to 0 by setting $b = 1.28 m$ while achieving a hashing failure probability of $2^{- 40}$ .
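The eviction procedure can be sketched in a few lines of Python. This is a simplified illustration, not the implementation used in our experiments: the hash functions are derived from SHA-256 with an index prefix, the eviction threshold `max_evictions` and the fixed seed are hypothetical choices, and an evicted item is simply prevented from returning to the bin it was just removed from.

```python
import hashlib
import random


def bin_index(item, j, b):
    """Assumed hash function h_j: SHA-256 of 'j|item', reduced modulo b."""
    digest = hashlib.sha256(f"{j}|{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % b


def cuckoo_insert_all(items, b, gamma=3, max_evictions=500, seed=0):
    """Place each item in one of its gamma candidate bins, or in the stash."""
    rng = random.Random(seed)
    bins, stash = [None] * b, []
    for item in items:
        current, avoid = item, None
        for _ in range(max_evictions):
            candidates = [bin_index(current, j, b) for j in range(gamma)]
            empty = [c for c in candidates if bins[c] is None]
            if empty:
                bins[empty[0]] = current
                current = None
                break
            # All candidate bins are full: evict a random occupant, avoiding
            # the bin the current item was itself just evicted from.
            choices = [c for c in candidates if c != avoid] or candidates
            victim = rng.choice(choices)
            bins[victim], current = current, bins[victim]
            avoid = victim
        if current is not None:
            stash.append(current)   # eviction threshold reached
    return bins, stash


items = [f"item-{i}" for i in range(100)]
bins, stash = cuckoo_insert_all(items, b=128)
print(sum(x is not None for x in bins), len(stash))
```

Every stored item sits in one of its γ candidate bins, so a lookup needs to probe at most γ bins plus the stash.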

1-out-of-2 Oblivious Transfer ( $F_{ot}$ ). $F_{ot}$ is a two-party primitive in which the sender $P_{0}$ inputs a pair of strings $(x_{0}, x_{1})$ and the receiver P₁ inputs a choice bit b. P₁ obtains nothing other than $x_{b}$ , while $P_{0}$ learns nothing about b. There are many classic constructions for OT, such as [31] and [39], all of which rely on expensive public-key operations. To improve performance, Ishai et al. [23] introduced the technique of OT-extension, which allows many OTs to be carried out using symmetric-key operations and a small number of base OTs.

Functionality 1 (1-out-of-2 Oblivious Transfer $F_{ot}$ )

Behaviour: On input $b \in \{0, 1\}$ from the receiver, and $(x_{0}, x_{1})$ from the sender:

Give $x_{b}$ to the receiver.

Oblivious Key-Value Store (OKVS). The definitions of key-value store (KVS) and OKVS were given in [16]. An OKVS is a generalized data structure that stores the mapping from keys to their values. It can be instantiated with a polynomial, a garbled Bloom filter (GBF) [9], a garbled cuckoo table (GCT) [16,35,41], and so on.

Definition 2 ([16])

A KVS is parameterized by a set $K$ of keys and a set $V$ of values, and consists of two algorithms:

Encode takes as input a set of $(k_{i}, v_{i})$ key-value pairs and outputs an object S (or, with statistically small probability, an error indicator ⊥);

Decode takes as input the object S, a key k and outputs a value v.

A KVS is correct if, for all $A \subseteq K \times V$ with distinct keys:

\begin{aligned} (k, v) \in A \text{ and } ⊥ \neq S \leftarrow Encode (A) ⟹ Decode (S, k) = v \end{aligned}

A KVS is an OKVS if, for any two sets $K^{0}$ , $K^{1}$ of m distinct keys, the output of $R (K^{1})$ is computationally indistinguishable from that of $R (K^{0})$ , where:

$R (K = (k_{1}, \dots, k_{m}))$ :

For $i \in [m]$ : choose uniform $v_{i} \leftarrow V$ ;

Return $Encode (\{(k_{1}, v_{1}), \dots, (k_{m}, v_{m})\})$ .

The obliviousness of OKVS guarantees that, if the values encoded in the OKVS are random, then the advantage of successfully guessing the corresponding key is negligible for any PPT adversary.
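To make the Encode/Decode interface concrete, the following Python sketch shows the polynomial instantiation mentioned above: Encode interpolates the unique polynomial of degree less than m through the key-value pairs, and Decode evaluates it. The prime modulus `P` and the function names are illustrative assumptions; keys and values are taken to be integers modulo `P`. The quadratic-time Lagrange interpolation is chosen for brevity and is not meant to reflect the performance of practical OKVS constructions.

```python
import random

P = 2**61 - 1  # prime modulus; an illustrative choice, not taken from the paper


def encode(pairs):
    """Return the coefficients (lowest degree first) of the polynomial through all pairs."""
    keys = [k % P for k, _ in pairs]
    if len(set(keys)) != len(keys):
        return None  # plays the role of the error indicator
    coeffs = [0] * len(pairs)
    for j, (kj, vj) in enumerate(pairs):
        basis, denom = [1], 1
        for i, (ki, _) in enumerate(pairs):
            if i != j:
                shifted = [0] + basis                       # X * basis
                scaled = [ki * c % P for c in basis] + [0]  # ki * basis
                basis = [(s - t) % P for s, t in zip(shifted, scaled)]
                denom = denom * (kj - ki) % P
        scale = vj * pow(denom, -1, P) % P  # modular inverse (Python 3.8+)
        coeffs = [(c + scale * bc) % P for c, bc in zip(coeffs, basis)]
    return coeffs


def decode(coeffs, key):
    """Evaluate the stored polynomial at key using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * key + c) % P
    return acc


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = [(k, rng.randrange(P)) for k in (3, 17, 42, 99)]
    S = encode(pairs)
    print(all(decode(S, k) == v for k, v in pairs))
```

When the values are uniformly random, the coefficient vector is itself uniformly distributed regardless of the keys, which is the intuition behind the obliviousness property for this instantiation.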

Oblivious Programmable Pseudorandom Function (OPPRF, $F_{opprf}^{F, m, u}$ ). The formal definition of OPPRF was first given in [26], which also provided a semi-honest secure realization for it. As shown in Functionality 2, OPPRF is a two-party primitive that is run between a sender and a receiver. The functionality $F_{opprf}^{F, m, u}$ takes as input the queries ${q_{1}, \dots, q_{u}}$ from the receiver P₁ and a programmed set $P = {⟨ x_{i}, y_{i} ⟩}_{i \in [m]}$ from the sender P₂. It consists of two algorithms:

$KeyGen (1^{κ}, P) \to (k, hint)$ : Given the security parameter κ and a set of points $P = \{⟨ x_{1}, y_{1} ⟩, \dots, ⟨ x_{m}, y_{m} ⟩\}$ with distinct $x_{i}$ -values, where $x_{i}, y_{i} \in \{0, 1\}^{κ}$ , generate a PRF key k and (public) auxiliary information $hint$ that stores the information of the set $P$ .

$F (k, h i n t, q_{l}) \to y$ : Evaluate the PRF on input $q_{l}$ and output $y \in {0, 1}^{κ}$ .

Functionality 2 (OPPRF $F_{opprf}^{F, m, u}$ )

Parameters: A pseudorandom function (PRF) F; upper bound m on the number of points to be programmed, and bound u on the number of queries.

Behaviour: On input $P$ from the sender P₂ and u queries $(q_{1}, \dots, q_{u})$ from the receiver P₁, where $P = {⟨ x_{1}, y_{1} ⟩, \dots, ⟨ x_{m}, y_{m} ⟩}$ is a set of points:

Run $KeyGen (1^{κ}, P) \to (k, hint)$ and give $(k, hint)$ to the sender P₂, where k is the PRF key and $hint$ is (public) auxiliary information that stores the information of the set $P$ .

Give $(h i n t, F (k, h i n t, q_{1}), \dots, F (k, h i n t, q_{u}))$ to the receiver P₁.

For each query $q_{j}$ ( $j \in [u]$ ), the functionality $F_{opprf}^{F, m, u}$ returns an output to the receiver that satisfies the following property: if $q_{j}$ equals some key $x_{i}$ in $P$ , then the OPPRF output on $q_{j}$ equals $y_{i}$ ; otherwise, the output is pseudorandom. In other words, the receiver’s OPPRF outputs are fixed in advance at the programmed points.
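The programmed behaviour can be illustrated with a small Python sketch. It shows one simple way to obtain it, where the hint is a polynomial that corrects the PRF output at the programmed points; this is an assumption made for illustration and is not claimed to be the realization of [26]. The sketch also omits the oblivious evaluation: here the evaluator is handed the key directly, whereas in an actual OPPRF the receiver never learns k. HMAC-SHA256 stands in for the PRF F, and all names and the modulus `P` are hypothetical.

```python
import hashlib
import hmac

P = 2**61 - 1  # prime modulus; an illustrative choice


def prf(key: bytes, x: int) -> int:
    """Toy PRF F(k, x): HMAC-SHA256 reduced modulo P."""
    mac = hmac.new(key, str(x).encode(), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % P


def interpolate(points):
    """Coefficients (lowest degree first) of the polynomial through all points mod P."""
    coeffs = [0] * len(points)
    for j, (xj, yj) in enumerate(points):
        basis, denom = [1], 1
        for i, (xi, _) in enumerate(points):
            if i != j:
                shifted = [0] + basis
                scaled = [xi * c % P for c in basis] + [0]
                basis = [(s - t) % P for s, t in zip(shifted, scaled)]
                denom = denom * (xj - xi) % P
        scale = yj * pow(denom, -1, P) % P
        coeffs = [(c + scale * bc) % P for c, bc in zip(coeffs, basis)]
    return coeffs


def evaluate(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def keygen(key: bytes, programmed):
    """Return (k, hint); the hint polynomial passes through (x_i, y_i - F(k, x_i))."""
    hint = interpolate([(x, (y - prf(key, x)) % P) for x, y in programmed])
    return key, hint


def opprf_eval(key: bytes, hint, q: int) -> int:
    """Programmed PRF output: equals y_i at x_i, and looks random elsewhere."""
    return (prf(key, q) + evaluate(hint, q)) % P


if __name__ == "__main__":
    k, hint = keygen(b"0123456789abcdef", [(11, 100), (22, 200), (33, 300)])
    print(opprf_eval(k, hint, 22), opprf_eval(k, hint, 44))
```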

Two-party Permute + Share ( $F_{2 PS}^{m}$ ). Two-party Permute + Share is an important building block of our MPSI-CA protocol with two non-colluding parties (Protocol 4.2). As shown in Functionality 3, the sender inputs a random permutation π and the receiver inputs a vector $\vec{x} = (x_{1}, \dots, x_{m})$ . The functionality $F_{2 PS}^{m}$ returns random additive shares of the shuffled vector $π (\vec{x}) = (x_{π (1)}, \dots, x_{π (m)})$ to both parties. These shares are denoted as $\vec{a} = (a_{1}, \dots, a_{m})$ for the receiver and $\vec{b} = (x_{π (1)} - a_{1}, \dots, x_{π (m)} - a_{m})$ for the sender.

Functionality 3 (Two-party Permute + Share $F_{2 PS}^{m}$ )

Parameters: The dimension of vector is m.

Behaviour: On input vector $\vec{x} = (x_{1}, \dots, x_{m})$ from the receiver P₁ and a permutation π from the sender P₂.

Give a shuffled share $\vec{a} = (a_{1}, \dots, a_{m})$ to P₁;

Give a shuffled share $\vec{b} = (x_{π (1)} - a_{1}, \dots, x_{π (m)} - a_{m})$ to P₂.
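The input/output behaviour of Functionality 3 can be written down directly. The Python sketch below models only the ideal functionality (a trusted party that sees both inputs), not a secure protocol; the share modulus `MOD` and the 0-based list representation of π are illustrative assumptions.

```python
import secrets

MOD = 2**64  # shares are taken modulo 2^64; an illustrative choice


def permute_and_share(x, pi):
    """Ideal F_2PS: x comes from the receiver, pi from the sender.

    pi is a 0-based list where pi[k] is the index of the input entry
    that ends up at position k, matching pi(x) = (x_{pi(1)}, ..., x_{pi(m)}).
    """
    m = len(x)
    assert sorted(pi) == list(range(m)), "pi must be a permutation"
    a = [secrets.randbelow(MOD) for _ in range(m)]    # receiver's share
    b = [(x[pi[k]] - a[k]) % MOD for k in range(m)]   # sender's share
    return a, b


if __name__ == "__main__":
    a, b = permute_and_share([10, 20, 30, 40], [2, 0, 3, 1])
    print([(ak + bk) % MOD for ak, bk in zip(a, b)])
```

Adding the two shares position by position recovers the shuffled vector, while each share on its own is uniformly distributed.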

$F_{2 PS}^{m}$ can be instantiated using OT-extension and a switching network [29], or using oblivious punctured vectors (OPV) as building blocks [5]. In this paper, we choose the semi-honest secure Permute + Share scheme proposed in [29], which requires $m \log (m)$ OT invocations. With OT-extension, the number of expensive public-key operations can be reduced to $O (κ)$ , and the number of symmetric-key operations is linear in $m \log (m)$ .

Multi-party Permute + Share ( $F_{mPS}^{T, m, i}$ ). Multi-party Permute + Share is a T-party primitive that is run between a sender and $T - 1$ receivers. It is not merely a trivial extension of the two-party Permute + Share ( $F_{2 PS}^{m}$ ) to the multi-party scenario. In $F_{2 PS}^{m}$ , the receiver P₁ provides an input vector, and the sender P₂ provides a permutation π. In contrast, in $F_{mPS}^{T, m, i}$ , all parties input their vectors, and the identity of the sender P_i (who additionally provides the permutation $π_{i}$ ) is determined by the parameter i. The remaining parties P_j for $j \in [T] ∖ \{i\}$ serve as receivers. As shown in Functionality 4, $F_{mPS}^{T, m, i}$ eventually returns random additive shares of the shuffled sum $π_{i} (\sum_{j \in [T]} {\vec{x}}_{j})$ to every party. Mohassel et al. [29] previously proposed a semi-honest secure realization for $F_{mPS}^{T, m, i}$ , which will be adopted as a crucial building block for the multi-party secret-shared shuffle primitive proposed in Section 3.1.

Functionality 4 (Multi-party Permute + Share $F_{mPS}^{T, m, i}$ )

Parameters: T parties $P_{j}, j \in [T]$ ; the dimension of vector is m; the sender is P_i.

Behaviour: On input permutation $π_{i}$ and vector ${\vec{x}}_{i} = (x_{i, 1}, \dots, x_{i, m})$ from the sender P_i, and input vector ${\vec{x}}_{j} = (x_{j, 1}, \dots, x_{j, m})$ from each receiver P_j (for $j \in [T] ∖ {i}$ ):

For each $j \in [T]$ , give the shuffled share ${\vec{x}}_{j}^{'} = (x_{j, 1}^{'}, \dots, x_{j, m}^{'})$ to party P_j, where

\begin{aligned} \sum_{j \in [T]} x_{j, k}^{'} = \sum_{j \in [T]} x_{j, π_{i} (k)} \text{ for } k \in [m], \text{ namely } \sum_{j \in [T]} {\vec{x}}_{j}^{'} = π_{i} \left( \sum_{j \in [T]} {\vec{x}}_{j} \right) . \end{aligned}

Two-party Oblivious Zero-Sum Check ( $F_{2 OZK}^{m}$ ). Two-party oblivious zero-sum check is an important building block of our MPSI-CA protocol with two non-colluding parties (Protocol 4.2). It enables a receiver and a sender to securely determine which positions of the sum of their input vectors (denoted as $\vec{x}$ ) are zero. If the k-th position of the sum $\vec{x}$ equals 0, then the receiver P₁ outputs a bit $e_{k} = 1$ ; otherwise, $e_{k} = 0$ . In [15], the BaRK-OPRF protocol [25] is used to realize the two-party oblivious zero-sum check.

$F_{2 OZK}^{m}$ can be seen as a stepping stone towards introducing our multi-party oblivious zero-sum check primitive. However, to the best of our knowledge, extending the OPRF-based construction proposed in [15] to the multi-party setting is not feasible. Therefore, in Section 3.2, we will present a solution to the problem of multi-party oblivious zero-sum check based on the secret-sharing mechanism.

Functionality 5 (Two-party Oblivious Zero-Sum Check $F_{2 OZK}^{m}$ )

Parameters: The dimension of vector is m.

Behaviour: On input vector ${⟨ \vec{x} ⟩}_{1}$ from the receiver P₁, and input vector ${⟨ \vec{x} ⟩}_{2}$ from the sender P₂, where ${⟨ \vec{x} ⟩}_{1} + {⟨ \vec{x} ⟩}_{2} = \vec{x} = (x_{1}, \dots, x_{m})$ :

Give a binary vector $\vec{e} = (e_{1}, \dots, e_{m})$ to the receiver P₁, where $e_{k} = 1$ if the k-th position of $\vec{x}$ equals 0 (i.e., $x_{k} = 0$ ); otherwise, $e_{k} = 0$ .

3. Two new primitives and constructions

In this section, we present the notions and constructions of two new building blocks for our MPSI-CA and MPSI-CA-sum protocols: multi-party secret-shared shuffle and multi-party oblivious zero-sum check.

3.1. Multi-party secret-shared shuffle

In [5], a two-party secret-shared shuffle protocol is proposed to produce additive secret shares of the shuffled sum for two parties, ensuring that neither party knows the permutation. This protocol is similar to the two-party Permute + Share, but differs in that the two-party Permute + Share allows one party to learn the permutation.

In a non-colluding two-party setting, the semi-honest Permute + Share primitive is sufficient for randomly shuffling the sum of inputs. The receiver only obtains random shares of the shuffled sum, while the exact permutation π remains unknown to the receiver. However, this primitive is inadequate in scenarios involving multiple parties, where each corrupted party can freely collude with others. If the Permute + Share protocol [29] is employed as a component in a multi-party protocol, the secrecy of the permutation π is compromised once a corrupted sender (who selects π) colludes with someone.

To overcome this limitation, we propose a new primitive called multi-party secret-shared shuffle. This primitive enables parties to shuffle the sum of their input vectors using an unknown permutation π and obtain random additive shares of the result.

Functionality ( $F_{mSS}^{T, m}$ ). $F_{mSS}^{T, m}$ can be considered as an improvement over the multi-party Permute + Share primitive ( $F_{mPS}^{T, m, i}$ ), as it additionally ensures that none of the parties can learn the permutation π. In $F_{mSS}^{T, m}$ , each party $P_{i}, i \in [T]$ inputs their respective permutations $π_{i}$ and vectors $\vec{x_{i}}$ , and the functionality outputs random additive shares of the shuffled sum of inputs $π (\sum_{i \in [T]} {\vec{x}}_{i})$ to them.

Functionality 6 (Multi-party Secret-Shared Shuffle $F_{mSS}^{T, m}$ )

Parameters: T parties $P_{1}, \dots, P_{T}$ ; the dimension of vector is m.

Behaviour: On input permutation $π_{i}$ and vector ${\vec{x}}_{i} = (x_{i, 1}, \dots, x_{i, m})$ from each party P_i (for $i \in [T]$ ):

Give each party P_i (for $i \in [T]$ ) an additive share ${\vec{x}}_{i}^{'} = (x_{i, 1}^{'}, \dots, x_{i, m}^{'})$ , where $\sum_{i \in [T]} x_{i, k}^{'} = \sum_{i \in [T]} x_{i, π (k)}$ (for $k \in [m]$ ), namely $\sum_{i \in [T]} {\vec{x}}_{i}^{'} = π (\sum_{i \in [T]} {\vec{x}}_{i})$ . Here, the permutation $π = π_{T} \circ \dots \circ π_{2} \circ π_{1}$ is the composition of the T permutations.

Protocol. We propose Protocol 3.1 to implement $F_{mSS}^{T, m}$ . This protocol employs T rounds of the T-party Permute + Share [29] in an iterative manner, as described below.

Protocol 3.1 ( $Π_{mSS}^{T, m}$ : Multi-party Secret-Shared Shuffle)

Parameters: The number of parties is T; the dimension of input vector is m.

Input: Random permutations $π_{i} : \{1, \dots, m\} \to \{1, \dots, m\}$ and vectors ${\vec{x}}_{i}^{(0)} = (x_{i, 1}, \dots, x_{i, m})$ from all parties P_i, $i \in [T]$ . Here, ${\vec{x}}_{i}^{(0)} = {\vec{x}}_{i}$ .

Output: For each $i \in [T]$ , P_i receives an additive share of the shuffled vector $π (\sum_{j \in [T]} {\vec{x}}_{j}^{(0)})$ , where the permutation $π = π_{T} \circ \dots \circ π_{2} \circ π_{1}$ is the composition of all T individual permutations $π_{1}, π_{2}, \dots, π_{T}$ .

Protocol:

During the i-th ( $i \in [T]$ ) round of T-party Permute + Share ( $F_{mPS}^{T, m, i}$ ), parties interact as follows:

Party P_j (for $j \in [T] ∖ {i}$ ) acts as the receiver with an input vector ${\vec{x}}_{j}^{(i - 1)}$ .

Party P_i acts as the sender with a random permutation $π_{i}$ and an input vector ${\vec{x}}_{i}^{(i - 1)}$ .

Each party P_j (for $j \in [T]$ ) obtains a random additive secret share of $π_{i} (\sum_{j \in [T]} {\vec{x_{j}}}^{(i - 1)})$ , denoted as ${\vec{x}}_{j}^{(i)}$ .

For each $i \in [T]$ , after T rounds of T-party Permute + Share, party P_i’s additive share of the shuffled sum is denoted as ${\vec{x}}_{i}^{(T)}$ .

In the i-th round of the protocol, P_i assumes the role of the sender, inputting a permutation $π_{i}$ and a vector ${\vec{x}}_{i}^{(i - 1)}$ . Concurrently, the remaining parties, denoted as P_j for $j \in [T] ∖ {i}$ , act as receivers with their respective input vectors ${\vec{x}}_{j}^{(i - 1)}$ . Initially, ${\vec{x}}_{j}^{(0)}$ is set to ${\vec{x}}_{j}$ for all $j \in [T]$ . For each $j \in [T]$ , party P_j receives an output vector ${\vec{x}}_{j}^{(i)}$ that satisfies the equation $\sum_{j \in [T]} {\vec{x}}_{j}^{(i)} = π_{i} (\sum_{j \in [T]} {\vec{x_{j}}}^{(i - 1)})$ . Party P_j then treats ${\vec{x}}_{j}^{(i)}$ as its input vector for the subsequent round. Each party P_j eventually obtains an additive share ${\vec{x}}_{j}^{(T)}$ of the shuffled sum $π (\sum_{j \in [T]} {\vec{x}}_{j})$ , where the composite permutation π is represented as $π = π_{T} \circ \dots \circ π_{1}$ . By employing the Permute + Share scheme proposed in [29], our implementation of $F_{mSS}^{T, m}$ requires a total of $O (T (T - 1) m \log m)$ OTs.

Correctness. The iterative structure of our protocol shuffles the vector $\sum_{j = 1}^{T} {\vec{x}}_{j}$ using the composite permutation $π = π_{T} \circ \dots \circ π_{2} \circ π_{1}$ . By the definition of $F_{mPS}^{T, m, i}$ , it is straightforward to verify that the sum of all parties’ outputs satisfies Equation (2).

\begin{aligned} \sum_{j \in [T]} {\vec{x_{j}}}^{(T)} = π_{T} (\sum_{j \in [T]} {\vec{x}}_{j}^{(T - 1)}) = π_{T} (π_{T - 1} (\sum_{j \in [T]} {\vec{x}}_{j}^{(T - 2)})) = \dots = π (\sum_{j \in [T]} {\vec{x}}_{j}^{(0)}) = π (\sum_{j \in [T]} {\vec{x}}_{j}) \end{aligned}

(2)
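The chain of equalities in Equation (2) can be checked numerically with the following Python sketch. It replaces each round of $F_{mPS}^{T, m, i}$ by its ideal functionality (a trusted party that re-shares the permuted sum), so it illustrates correctness only and says nothing about the OT-based realization of [29]. The modulus, the helper names, and the 0-based list representation of permutations are assumptions made for this example.

```python
import random

MOD = 2**61 - 1  # modulus for additive shares; an illustrative choice


def apply_perm(pi, v):
    """pi(v) = (v[pi[0]], ..., v[pi[m-1]]), the 0-based analogue of (x_{pi(1)}, ...)."""
    return [v[pi[k]] for k in range(len(v))]


def vec_sum(vectors):
    """Component-wise sum of a list of vectors modulo MOD."""
    return [sum(col) % MOD for col in zip(*vectors)]


def ideal_mps(shares, pi, rng):
    """Ideal F_mPS: fresh random additive shares of pi(sum of the input shares)."""
    target = apply_perm(pi, vec_sum(shares))
    m = len(target)
    out = [[rng.randrange(MOD) for _ in range(m)] for _ in shares[:-1]]
    last = [(target[k] - sum(s[k] for s in out)) % MOD for k in range(m)]
    return out + [last]


def secret_shared_shuffle(inputs, perms, rng):
    """T rounds; in round i the permutation of party P_i is applied."""
    shares = inputs
    for pi in perms:
        shares = ideal_mps(shares, pi, rng)
    return shares


if __name__ == "__main__":
    rng = random.Random(0)
    T, m = 4, 8
    inputs = [[rng.randrange(MOD) for _ in range(m)] for _ in range(T)]
    perms = [rng.sample(range(m), m) for _ in range(T)]
    out = secret_shared_shuffle(inputs, perms, rng)
    expected = vec_sum(inputs)
    for pi in perms:
        expected = apply_perm(pi, expected)
    print(vec_sum(out) == expected)
```

No single party's permutation determines the final order, since the result is obtained by applying all T permutations one after another.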

Theorem 1.

This protocol ( $Π_{mSS}^{T, m}$ ) securely computes $F_{mSS}^{T, m}$ in the presence of a semi-honest adversary which may corrupt up to $T - 1$ parties, if $F_{mPS}^{T, m, i}$ is secure against semi-honest adversaries.

Proof.

We exhibit a simulator Sim that simulates the views of the corrupted parties (i.e., $C$ ). The views of $C$ consist of their inputs, their outputs, and their views during the T rounds of the T-party Permute + Share. In the first round, Sim selects random vectors to simulate the outputs of the corrupted parties in $F_{mPS}^{T, m, i}$ . Since the simulated outputs and the outputs in the real world have the same distribution, they are indistinguishable from each other. Sim then treats these simulated outputs as inputs for the next round. By following the prescribed strategy in each round of the T-party Permute + Share and iteratively invoking the simulator $Sim_{mPS}$ of the subroutine functionality $F_{mPS}^{T, m, i}$ , Sim can simulate the views of $C$ during the execution of $Π_{mSS}^{T, m}$ . The views produced by Sim are indistinguishable from $C$ ’s real views, which follows from the indistinguishability guaranteed by the underlying simulator.

3.2. Multi-party oblivious zero-sum check

We present the concept and construction of a new primitive called multi-party oblivious zero-sum check. This primitive allows parties to securely identify which entries in the sum of their input vectors are zero, without disclosing the actual values of the non-zero entries. It will be employed in the final step of our MPSI-CA protocol to determine the intersection cardinality of the shuffled data.

Functionality ( $F_{OZK}^{T, m}$ ). $F_{OZK}^{T, m}$ accepts as input the additive shares ${⟨ \vec{x} ⟩}_{i}$ from each of the T parties, denoted as $P_{1}, \dots, P_{T}$ . It then outputs a binary vector $\vec{e} = (e_{1}, \dots, e_{m})$ to P₁. For each $k \in [m]$ , if the sum of the input shares $\vec{x} = \sum_{i = 1}^{T} {⟨ \vec{x} ⟩}_{i}$ has a value of 0 at position k, then $e_{k} = 1$ ; otherwise, $e_{k} = 0$ (i.e., $e_{k} = 1$ only when $x_{k} = 0$ ). In other words, $F_{OZK}^{T, m}$ ensures that P₁ cannot determine the actual value of $x_{k}$ unless it is equal to 0.

Functionality 7 (Multi-party Oblivious Zero-Sum Check $F_{OZK}^{T, m}$ )

Parameters: The number of parties is T; the dimension of input vector is m.

Behaviour: On input vector ${⟨ \vec{x} ⟩}_{i}$ from P_i (for $i \in [T]$ ), where $\sum_{i = 1}^{T} {⟨ \vec{x} ⟩}_{i} = \vec{x} = (x_{1}, \dots, x_{m})$ :

Give a binary vector $\vec{e} = (e_{1}, \dots, e_{m})$ to P₁. Specifically, $e_{k}$ is set to 1 if the k-th position of vector $\vec{x}$ equals 0 (i.e., $x_{k} = 0$ ), and $e_{k}$ is set to 0 otherwise.

Protocol. As described in Protocol 3.2, $F_{OZK}^{T, m}$ can be implemented using the secret sharing mechanism. Each party holds an additive share of the secret vector $\vec{x}$ , which allows them to compute their additive shares of the component-wise multiplication product $\vec{γ} \vec{x}$ using Beaver multiplication triples, where $\vec{γ}$ is a “negotiated” random vector. The actual value of the vector $\vec{γ}$ is kept secret to everyone, as each party P_i only knows an additive share ${⟨ \vec{γ} ⟩}_{i}$ of $\vec{γ}$ for $i \in [T]$ . If $x_{k} = 0$ , then it is clear that the k-th position of $\vec{γ} \vec{x}$ equals 0 (i.e., $γ_{k} x_{k} = 0$ ); if $x_{k} \neq 0$ , P₁ cannot infer anything about $x_{k}$ from $γ_{k} x_{k}$ due to the random value of $γ_{k}$ .

Parties need to interact with each other in order to acquire their additive shares of the component-wise multiplication product $\vec{γ} \vec{x}$ . It is important to note that $\vec{γ} \vec{x} = \sum_{i, j \in [T]} {⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{j}$ can be divided into two parts: $\sum_{i \in [T]} {⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{i}$ and $(T^{2} - T) / 2$ components ${⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{j} + {⟨ \vec{γ} ⟩}_{j} {⟨ \vec{x} ⟩}_{i}$ , where $i < j \in [T]$ . For each component ${⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{j} + {⟨ \vec{γ} ⟩}_{j} {⟨ \vec{x} ⟩}_{i}$ , P_i and P_j can securely obtain their additive shares $\vec{{s h}_{0}^{i, j}}$ and $\vec{{s h}_{1}^{i, j}}$ of this component using Beaver triples by following Protocol 3.2. The process of online pairwise secret-shared multiplication will be significantly expedited by consuming the prepared Beaver triples from the setup stage. Finally, P_i computes the sum of ${⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{i}$ and its $T - 1$ shares of $\sum_{j \in [T] ∖ {i}} ({⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{j} + {⟨ \vec{γ} ⟩}_{j} {⟨ \vec{x} ⟩}_{i})$ to obtain $\vec{{s h}_{i}} = \sum_{j \in [i - 1]} \vec{{s h}_{1}^{j, i}} + \sum_{j \in [i + 1, T]} \vec{{s h}_{0}^{i, j}} + {⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{i}$ . Then, P_i sends $\vec{{s h}_{i}}$ to P₁ so that P₁ can reconstruct those shares to obtain the randomized sum as $\vec{γ} \vec{x}$ . If the k-th position of $\vec{γ} \vec{x}$ equals 0, P₁ sets $e_{k}$ to 1; otherwise, $e_{k}$ is set to 0.

Correctness. The property of Beaver triples guarantees that for any $i < j$ in the set $[T]$ , the following equation holds: $\vec{{s h}_{0}^{i, j}} + \vec{{s h}_{1}^{i, j}} = {⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{j} + {⟨ \vec{γ} ⟩}_{j} {⟨ \vec{x} ⟩}_{i}$ . Therefore, according to Equation (3), the sum of all parties’ shares yields the correct result $\vec{γ} \vec{x}$ .

For each $k \in [m]$ , if $x_{k} = 0$ , then $γ_{k} x_{k} = 0$ . If $x_{k} \neq 0$ , since $γ_{k}$ is a random value, the probability that $γ_{k} x_{k}$ happens to be 0 is $2^{- l}$ , where l is the length of $γ_{k}$ . By setting $l ⩾ λ + \log (m) + 2$ , we can ensure that the false positive rate is negligible in the security parameter λ.

\begin{aligned} \sum_{i \in [T]} \vec{{sh}_{i}} = \sum_{i \in [T]} \left( {⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{i} + \sum_{j \in [i - 1]} \vec{{sh}_{1}^{j, i}} + \sum_{j \in [i + 1, T]} \vec{{sh}_{0}^{i, j}} \right) = \underbrace{\sum_{i \in [T]} {⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{i}}_{\text{Locally Computed}} + \underbrace{\sum_{i, j \in [T], i \neq j} {⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{j}}_{\text{Secret Shared}} = \vec{γ} \vec{x} \end{aligned}

(3)
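The Beaver-triple identity invoked in the correctness argument can be checked directly from the share formulas in the pairwise multiplication step of Protocol 3.2. As a short worked derivation (all products are component-wise, and we use only $\vec{α} = {⟨ \vec{x} ⟩}_{i} + {⟨ \vec{x} ⟩}_{j} - \vec{a}$ , $\vec{β} = {⟨ \vec{γ} ⟩}_{i} + {⟨ \vec{γ} ⟩}_{j} - \vec{b}$ and $\vec{c} = \vec{a} \vec{b}$ ):

\begin{aligned} \vec{{sh}_{0}^{i, j}} + \vec{{sh}_{1}^{i, j}} &= \vec{c} + \vec{α} \vec{b} + \vec{β} \vec{a} + \vec{α} \vec{β} - {⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{i} - {⟨ \vec{γ} ⟩}_{j} {⟨ \vec{x} ⟩}_{j} \\ &= (\vec{α} + \vec{a}) (\vec{β} + \vec{b}) - {⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{i} - {⟨ \vec{γ} ⟩}_{j} {⟨ \vec{x} ⟩}_{j} \\ &= ({⟨ \vec{x} ⟩}_{i} + {⟨ \vec{x} ⟩}_{j}) ({⟨ \vec{γ} ⟩}_{i} + {⟨ \vec{γ} ⟩}_{j}) - {⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{i} - {⟨ \vec{γ} ⟩}_{j} {⟨ \vec{x} ⟩}_{j} \\ &= {⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{j} + {⟨ \vec{γ} ⟩}_{j} {⟨ \vec{x} ⟩}_{i} \end{aligned}

The first line uses ${⟨ \vec{a} ⟩}_{0} + {⟨ \vec{a} ⟩}_{1} = \vec{a}$ , ${⟨ \vec{b} ⟩}_{0} + {⟨ \vec{b} ⟩}_{1} = \vec{b}$ and ${⟨ \vec{c} ⟩}_{0} + {⟨ \vec{c} ⟩}_{1} = \vec{c}$ ; the term $\vec{α} \vec{β}$ appears only once because it is added by P_i alone.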

Protocol 3.2 ( $Π_{OZK}^{T, m}$ : Multi-party Oblivious Zero-Sum Check)

Parameters: The number of parties is T; the dimension of input vector is m.

Initialization: For every two parties P_i and P_j, $i, j \in [T]$ , $i < j$ , they prepare enough Beaver triples ${⟨ \vec{a} ⟩}_{0}$ , ${⟨ \vec{b} ⟩}_{0}$ , ${⟨ \vec{c} ⟩}_{0}$ and ${⟨ \vec{a} ⟩}_{1}$ , ${⟨ \vec{b} ⟩}_{1}$ , ${⟨ \vec{c} ⟩}_{1}$ for online share-based multiplication, where $\vec{c} = \vec{a} \vec{b}$ , $\vec{c} = {⟨ \vec{c} ⟩}_{0} + {⟨ \vec{c} ⟩}_{1}$ , $\vec{a} = {⟨ \vec{a} ⟩}_{0} + {⟨ \vec{a} ⟩}_{1}$ and $\vec{b} = {⟨ \vec{b} ⟩}_{0} + {⟨ \vec{b} ⟩}_{1}$ . Note that P_i holds ${⟨ \vec{a} ⟩}_{0}$ , ${⟨ \vec{b} ⟩}_{0}$ , ${⟨ \vec{c} ⟩}_{0}$ , and P_j holds ${⟨ \vec{a} ⟩}_{1}$ , ${⟨ \vec{b} ⟩}_{1}$ , ${⟨ \vec{c} ⟩}_{1}$ . $\vec{a}$ and $\vec{b}$ are kept secret to both parties.

Input: Additive share ${⟨ \vec{x} ⟩}_{i}$ from party P_i (for $i \in [T]$ ), where $\vec{x} = (x_{1}, \dots, x_{m}) = \sum_{i = 1}^{T} {⟨ \vec{x} ⟩}_{i}$ .

Output: P₁ outputs a binary vector $\vec{e} = (e_{1}, \dots, e_{m})$ : for $k \in [m]$ , if $x_{k} = 0$ , then $e_{k} = 1$ ; otherwise, $e_{k} = 0$ .

Protocol:

For each $i \in [T]$ , party P_i randomizes its share ${⟨ \vec{x} ⟩}_{i}$ as follows:

P_i locally selects a random vector ${⟨ \vec{γ} ⟩}_{i}$ .

(Pairwise Multiplication) P_i computes its additive share of $\vec{γ} \vec{x} = \sum_{u, l \in [T]} {⟨ \vec{γ} ⟩}_{u} {⟨ \vec{x} ⟩}_{l}$ , where $\vec{γ} = \sum_{i = 1}^{T} {⟨ \vec{γ} ⟩}_{i}$ . For each component ${⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{j} + {⟨ \vec{γ} ⟩}_{j} {⟨ \vec{x} ⟩}_{i}$ ( $j \in [i + 1, T]$ ), parties P_i and P_j obtain their respective shares of this component as follows:

P_i computes ${⟨ \vec{α} ⟩}_{0} = {⟨ \vec{x} ⟩}_{i} - {⟨ \vec{a} ⟩}_{0}$ and ${⟨ \vec{β} ⟩}_{0} = {⟨ \vec{γ} ⟩}_{i} - {⟨ \vec{b} ⟩}_{0}$ , then announces them to P_j; P_j computes ${⟨ \vec{α} ⟩}_{1} = {⟨ \vec{x} ⟩}_{j} - {⟨ \vec{a} ⟩}_{1}$ and ${⟨ \vec{β} ⟩}_{1} = {⟨ \vec{γ} ⟩}_{j} - {⟨ \vec{b} ⟩}_{1}$ , then announces them to P_i.

P_i reconstructs $\vec{α}$ and $\vec{β}$ , computes its additive share of ${⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{j} + {⟨ \vec{γ} ⟩}_{j} {⟨ \vec{x} ⟩}_{i}$ as $\vec{{s h}_{0}^{i, j}} = {⟨ \vec{c} ⟩}_{0} + \vec{α} {⟨ \vec{b} ⟩}_{0} + \vec{β} {⟨ \vec{a} ⟩}_{0} + \vec{α} \vec{β} - {⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{i}$ . P_j obtains its additive share of ${⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{j} + {⟨ \vec{γ} ⟩}_{j} {⟨ \vec{x} ⟩}_{i}$ as $\vec{{s h}_{1}^{i, j}} = {⟨ \vec{c} ⟩}_{1} + \vec{α} {⟨ \vec{b} ⟩}_{1} + \vec{β} {⟨ \vec{a} ⟩}_{1} - {⟨ \vec{γ} ⟩}_{j} {⟨ \vec{x} ⟩}_{j}$ .

(Reconstruction) Each party P_i (for $i \in [2, T]$ ) computes the sum of ${⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{i}$ and its $T - 1$ shares of the sum $\sum_{j = 1, j \neq i}^{T} ({⟨ \vec{γ} ⟩}_{i} {⟨ \vec{x} ⟩}_{j} + {⟨ \vec{γ} ⟩}_{j} {⟨ \vec{x} ⟩}_{i})$ obtained in step 1(a). The result is denoted as $\vec{{s h}_{i}}$ , which is then sent to P₁. Upon receiving the shares $\vec{{s h}_{2}}, \dots, \vec{{s h}_{T}}$ , P₁ combines these shares to reconstruct the product $\vec{γ} \vec{x} = \sum_{i \in [T]} \vec{{s h}_{i}}$ . For each $k \in [m]$ , if the k-th entry of the vector $\vec{γ} \vec{x}$ is 0, P₁ sets $e_{k}$ to be 1; otherwise, $e_{k}$ is set to 0.
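The steps of Protocol 3.2 can be simulated in a single process to check the arithmetic. The Python sketch below is such a simulation under several assumptions that are not part of the protocol description: shares live in a prime field of size `P`, the Beaver triples are produced by a trusted dealer inside `cross_term_shares` instead of an offline phase, the vector is processed one position at a time, and all names are hypothetical. Since every party runs in the same process, the sketch demonstrates correctness, not privacy.

```python
import random

P = 2**61 - 1  # prime field for shares; an illustrative choice


def share(value, rng):
    """Split value into two additive shares modulo P."""
    s0 = rng.randrange(P)
    return s0, (value - s0) % P


def cross_term_shares(x_i, g_i, x_j, g_j, rng):
    """P_i and P_j obtain additive shares of g_i*x_j + g_j*x_i using one Beaver triple."""
    a, b = rng.randrange(P), rng.randrange(P)  # dealer-generated triple (assumption)
    a0, a1 = share(a, rng)
    b0, b1 = share(b, rng)
    c0, c1 = share(a * b % P, rng)
    alpha = (x_i - a0 + x_j - a1) % P  # opened value (x_i + x_j) - a
    beta = (g_i - b0 + g_j - b1) % P   # opened value (g_i + g_j) - b
    sh0 = (c0 + alpha * b0 + beta * a0 + alpha * beta - g_i * x_i) % P
    sh1 = (c1 + alpha * b1 + beta * a1 - g_j * x_j) % P
    return sh0, sh1


def zero_sum_check(x_shares, rng):
    """x_shares[i][k] is the share of x_k held by party i; returns the vector e."""
    T, m = len(x_shares), len(x_shares[0])
    e = []
    for k in range(m):
        g = [rng.randrange(P) for _ in range(T)]            # each party's share of gamma_k
        sh = [g[i] * x_shares[i][k] % P for i in range(T)]  # locally computed terms
        for i in range(T):
            for j in range(i + 1, T):
                s0, s1 = cross_term_shares(x_shares[i][k], g[i], x_shares[j][k], g[j], rng)
                sh[i] = (sh[i] + s0) % P
                sh[j] = (sh[j] + s1) % P
        e.append(1 if sum(sh) % P == 0 else 0)              # reconstruction of gamma_k * x_k
    return e


if __name__ == "__main__":
    rng = random.Random(1)
    x = [0, 5, 0, 7, 0]
    shares = [[rng.randrange(P) for _ in x] for _ in range(2)]
    shares.append([(xk - sum(s[k] for s in shares)) % P for k, xk in enumerate(x)])
    print(zero_sum_check(shares, rng))
```

A non-zero entry is reported as zero only if the reconstructed $γ_{k}$ happens to be zero, which matches the false positive analysis given in the correctness discussion.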

Theorem 2.

Protocol 3.2 ( $Π_{OZK}^{T, m}$ ) securely computes $F_{OZK}^{T, m}$ in the presence of a semi-honest adversary which may corrupt up to $T - 1$ parties.

Proof.

The parties are divided into two coalitions: the corrupted coalition $C$ and the honest coalition $H$ . We present a simulator Sim for simulating the view of $C$ . The security proof is typically divided into two cases based on whether or not P₁ is corrupted. First, let us consider the trivial case where $P_{1} \notin C$ . For the corrupted party P_i, Sim selects random vectors to simulate its received shares from the honest party P_j in step 1 (b), which are indistinguishable due to the security of secret sharing. In the case where $P_{1} \in C$ , Sim forwards the input vectors to $F_{OZK}^{T, m}$ to obtain the output binary vector $\vec{e}$ of P₁. For positions in the binary vector $\vec{e}$ where $e_{k} = 0$ , all generated and received shares of $γ_{k} x_{k}$ are indistinguishable from uniformly random values; for positions where $e_{k} = 1$ , shares of $γ_{k} x_{k}$ can be simulated by choosing random values that sum to zero.

4. MPSI-CA with two non-colluding leaders

We begin with an efficient MPSI-CA protocol with two non-colluding parties, referred to as L₁ and L₂. The remaining parties are represented as $P_{1}, \dots, P_{n - 2}$ . To clarify, the adversary can corrupt any subset of either ${L_{1}, {P_{i}}_{i \in [n - 2]}}$ or ${L_{2}, {P_{i}}_{i \in [n - 2]}}$ . This protocol necessitates only $T = 2$ leaders, denoted as L₁ and L₂, to ensure security. This assumption is reasonable considering that L₁ could be a reputable organization under public supervision. Although the risk of collusion is minimal, it is conceivable that L₁ might still have an interest in the private information of other parties.

The introduction of this MPSI-CA protocol with two non-colluding parties serves two primary purposes. Firstly, it establishes a foundation for an MPSI-CA protocol that is secure against arbitrary collusion (detailed in Section 5.1). Secondly, it is tailored for specific application scenarios where performance takes precedence. This protocol demonstrates how to compute the intersection cardinality in an exceptionally lightweight manner by providing a weaker security guarantee. This balance between security and efficiency allows for faster computation and reduced communication costs, all while guaranteeing the desired outcomes.

In this section, we begin by revisiting the functionality of MPSI-CA. Subsequently, we introduce a step called element sharing, which aims to reduce the original n-party MPSI-CA to a T-party MPSI-CA involving only T leaders. Finally, we provide a detailed description of our MPSI-CA protocol.

Functionality ( $F_{MPSI - CA}$ ). MPSI-CA allows n parties, each with a private set of m items, to learn the intersection cardinality of their private sets without revealing anything else.

Functionality 8 (MPSI-CA $F_{MPSI-CA}$ )

Parameters:T leaders $L_{1}, \dots, L_{T}$ ; $n - T$ clients $P_{1}, \dots, P_{n - T}$ ; the set size is m.

Behaviour: On input data sets $X_{i}$ from all leaders L_i (for $i \in [T]$ ), and data sets $S_{j}$ from all clients P_j (for $j \in [n - T]$ ):

Give leader L₁ the intersection cardinality $| I S | = | (⋂_{i = 1}^{T} X_{i}) \cap (⋂_{j = 1}^{n - T} S_{j}) |$ .

High-level Description. The fundamental idea of our protocol is to share the PRF-encoded data sets of clients with $T = 2$ leaders L₁ and L₂ (as described in Sub-protocol 4.1), and to subsequently run a semi-honest secure two-party PSI-CA protocol between these two leaders. The two-party PSI-CA can be instantiated with various previous schemes; in this case, we adapt and modify the scheme proposed in [15] to address this problem.

4.1. Element sharing

Considering the fact that the overhead of the MPSI-CA protocol tends to increase with the number of parties, it is a logical approach to designate only a small number of parties, known as leaders, to carry out the costly interactive procedures. This is achieved by sharing the PRF-encoded data sets of the remaining parties with the leaders during the first step. This technique, first adopted by [32], is referred to as “element sharing” in this paper.

The functionality of element sharing is as follows: for each element x in the data set $X_{i}$ , the leader L_i holds a corresponding random value $q_{i} (x)$ . We refer to $q_{i} (x)$ as L_i’s element sharing of x. If x belongs to the set intersection of all parties, denoted as $I S$ , then the sum of all the leaders’ element sharing of x is equal to zero. In other words, if $x \in I S$ , then $\sum_{i \in [T]} q_{i} (x) = 0$ .

Protocol. The detailed process of element sharing is illustrated in Sub-protocol 4.1. Initially, each client P_j (for $j \in [n - T]$ ) distributes a total of $T - 1$ random PRF keys $K_{j, 2}, \dots, K_{j, T}$ to the secondary leaders $L_{2}, \dots, L_{T}$ . P_j then encrypts all of its elements using these PRF keys and combines the PRF outputs to obtain the ciphertext for each element. Subsequently, P_j encodes all the element-ciphertext pairs into an OKVS, which is a generalized data structure that stores the mapping from data to their corresponding ciphertexts. Finally, the client sends the OKVS to the primary leader L₁. Upon completing this step, the client’s task is finished.

Each secondary leader L_i, for $i \in [2, T]$ , receives a total of $n - T$ PRF keys from the clients. L_i encrypts each of its elements x using these keys and calculates the sum of the resulting $n - T$ ciphertexts to obtain its element sharing $q_{i} (x)$ of x. On the other hand, the primary leader L₁ does not receive PRF keys; instead, L₁ receives $n - T$ OKVSs from the clients. For each element x in $X_{1}$ , L₁ decodes x on all the received OKVSs and then calculates the negated sum of the $n - T$ decode outputs to obtain its element sharing $q_{1} (x)$ of x.

The correctness proof of Sub-protocol 4.1 is as follows. If the element x belongs to $I S$ , then each leader L_i, for $i \in [T]$ , holds an element sharing $q_{i} (x)$ of x. Additionally, this implies that for each client $P_{j}, j \in [n - T]$ , the key-value pair $⟨ x, \sum_{i = 2}^{T} F (K_{j, i}, x) ⟩$ is encoded in the OKVS D_j. Therefore, L₁’s element sharing of x can be expressed as $q_{1} (x) = - \sum_{j = 1}^{n - T} \sum_{i = 2}^{T} F (K_{j, i}, x)$ . During the computation of $\sum_{i = 1}^{T} q_{i} (x)$ , each PRF key $K_{j, i}$ is used twice on the same item x: once by client P_j and once by leader L_i. Since the two resulting PRF outputs enter the sum with opposite signs, they cancel each other out, resulting in $\sum_{i = 1}^{T} q_{i} (x) = 0$ .

If $x \notin I S$ , it can be further divided into two cases: either x does not belong to the set $X_{i}$ of some leader L_i, or it does not belong to the set $S_{j}$ of some client P_j. In the first case, it directly implies that leader L_i does not hold any element sharing of x. In the second case, it means that x is not encoded in the OKVS D_j. In such a situation, when L₁ decodes x on D_j, L₁ will obtain an independent random value. As a result, the probability that $\sum_{i = 1}^{T} q_{i} (x)$ coincidentally equals zero when $x \notin I S$ is negligible.

Sub-protocol 4.1 (Element Sharing)

Parameters: The number of parties is n, number of leaders is T; set size is m.

Input: $X_{i} = {x_{i, 1}, \dots, x_{i, m}}$ from leader L_i (for $i \in [T]$ ); $S_{j} = {s_{j, 1}, \dots, s_{j, m}}$ from client P_j (for $j \in [n - T]$ ).

Protocol:

(Client) For each $j \in [n - T]$ , client P_j acts as follows:

P_j sends a random $PRF$ key $K_{j, i}$ to each secondary leader L_i for $i \in [2, T]$ .

For each element $s_{j, k} \in S_{j}$ ( $k \in [m]$ ), P_j computes the PRF-encoded value of $s_{j, k}$ as $\sum_{i = 2}^{T} PRF (K_{j, i}, s_{j, k})$ . Then, P_j encodes the key-value pairs $\{⟨ s_{j, k}, \sum_{i = 2}^{T} PRF (K_{j, i}, s_{j, k}) ⟩\}_{k \in [m]}$ into an OKVS D_j and sends D_j to the primary leader L₁.

(Primary Leader) For each element $x_{1, k} \in X_{1}$ ( $k \in [m]$ ), L₁ decodes $x_{1, k}$ on all its received OKVSs ${D_{j}}_{j \in [n - T]}$ to get ${Decode (D_{j}, x_{1, k})}_{j \in [n - T]}$ , and then obtains its element sharing of $x_{1, k}$ as $q_{1} (x_{1, k}) = - \sum_{j = 1}^{n - T} Decode (D_{j}, x_{1, k})$ .

(Secondary Leader) Each secondary leader L_i (for $i \in [2, T]$ ) computes the PRF outputs of all the elements $x_{i, k}$ in the set $X_{i}$ (for $k \in [m]$ ) using its $n - T$ received keys ${K_{j, i}}_{j \in [n - T]}$ . Then, L_i sums the resulting $n - T$ PRF outputs of $x_{i, k}$ to obtain its element sharing of $x_{i, k}$ as $q_{i} (x_{i, k}) = \sum_{j = 1}^{n - T} PRF (K_{j, i}, x_{i, k})$ .
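The three roles of Sub-protocol 4.1 can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: HMAC-SHA256 stands in for the PRF, a plain dictionary stands in for the OKVS (it is not oblivious and returns a fresh random value for keys that were never encoded), shares live in $\mathbb{Z}_{2^{64}}$, and leader index 0 plays the role of L₁. The names `prf`, `ToyOKVS` and `element_sharing` are hypothetical and do not come from the paper.

```python
import hashlib
import hmac
import secrets

L_BITS = 64            # illustrative share length in bits (an assumption)
MOD = 1 << L_BITS

def prf(key: bytes, x: str) -> int:
    """PRF stand-in: HMAC-SHA256 truncated to L_BITS bits."""
    digest = hmac.new(key, x.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:L_BITS // 8], "big")

class ToyOKVS:
    """Dictionary-backed stand-in for an OKVS (NOT oblivious, illustration only)."""
    def __init__(self, pairs):
        self._store = dict(pairs)

    def decode(self, x: str) -> int:
        # A key that was never encoded decodes to a fresh random value.
        return self._store.get(x, secrets.randbelow(MOD))

def element_sharing(leader_sets, client_sets):
    """Return one dict per leader mapping each of its elements to its share q_i(x)."""
    T = len(leader_sets)
    n_clients = len(client_sets)
    # keys[j][i] models K_{j,i}; column 0 is unused because L_1 receives no key.
    keys = [[secrets.token_bytes(16) for _ in range(T)] for _ in range(n_clients)]

    # (Client) encode <s, sum_i PRF(K_{j,i}, s)> into an OKVS sent to L_1.
    okvs = [
        ToyOKVS((s, sum(prf(keys[j][i], s) for i in range(1, T)) % MOD) for s in S)
        for j, S in enumerate(client_sets)
    ]

    # (Primary leader) q_1(x) = -sum_j Decode(D_j, x).
    shares = [{x: (-sum(D.decode(x) for D in okvs)) % MOD for x in leader_sets[0]}]

    # (Secondary leaders) q_i(x) = sum_j PRF(K_{j,i}, x).
    for i in range(1, T):
        shares.append({
            x: sum(prf(keys[j][i], x) for j in range(n_clients)) % MOD
            for x in leader_sets[i]
        })
    return shares
```

For an element held by every leader and every client the shares sum to zero modulo $2^{64}$; if a single client lacks the element, the sum is random and nonzero except with probability $2^{-64}$.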

4.2. Detailed protocol

In this part, we demonstrate how the element sharing technique can be integrated with other primitives to construct a secure MPSI-CA protocol. The formal description of this protocol in the balanced data setting is provided in Protocol 4.2. In scenarios with unbalanced data set sizes, we find it advantageous to assign the role of the primary leader to the party holding the smallest data set. This strategy further reduces the overhead of set intersection cardinality computation.

In the beginning, the parties perform element sharing to reduce the original n-party MPSI-CA problem to a two-party PSI-CA problem between two leaders, L₁ and L₂. Subsequently, L₁ and L₂ apply the bucketing technique to insert their sets $X_{1}$ and $X_{2}$ into two hash tables, ${T a b l e}_{1}$ and ${T a b l e}_{2}$ , each with $b = 1.28 m$ bins. L₁ employs cuckoo hashing, while L₂ uses simple hashing for their respective hash tables. Each empty bin in the cuckoo table ${T a b l e}_{1}$ is padded with a dummy element. According to the results presented in [38], when the number of hash functions is set to three, the stash size of the cuckoo table can be reduced to zero by setting $b = 1.28 m$ , while maintaining a hashing failure probability of $2^{- 40}$ . Through bucketing, L₁ and L₂ can perform membership tests on only a small number of elements in separate bins, rather than on the entire data set. If the bucketing technique is not adopted, L₁ would need to compare each element $x \in X_{1}$ with all elements in $X_{2}$ to determine if x belongs to $X_{2}$ , which would result in high communication and computational costs. In contrast, when the bucketing technique is applied, if x is mapped to the k-th bin using cuckoo hashing with hash function $h_{1}$ , L₁ only needs to compare x with at most ρ elements from ${T a b l e}_{2} [k]$ , where ρ is the maximal bin size. This optimization arises from the fact that if $x \in X_{2}$ (i.e., there exists an element $y \in X_{2}$ such that $y = x$ ), both x and y will be mapped to the same bin k by $h_{1}$ , because simple hashing inserts y under every hash function. This obviates the need to check whether x belongs to any other bins in ${T a b l e}_{2}$ .
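The bucketing step can be illustrated with a small Python sketch. It is a toy model, not the implementation evaluated in the paper: the three hash functions are derived from SHA-256 with a seed, insertion uses a random-walk eviction strategy, and on failure the tables are simply rebuilt with a new seed. The names `bin_index`, `cuckoo_hash`, `simple_hash` and `build_tables` are hypothetical, and empty cuckoo bins are left as `None` where the protocol would place a dummy element.

```python
import hashlib
import math
import random

def bin_index(x, i, b, seed):
    """Index of x under the i-th hash function (SHA-256 based stand-in)."""
    data = f"{seed}|{i}|{x}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % b

def simple_hash(items, b, seed):
    """Simple hashing: every item is stored under all three hash functions."""
    table = [set() for _ in range(b)]
    for x in items:
        for i in range(3):
            table[bin_index(x, i, b, seed)].add(x)
    return table

def cuckoo_hash(items, b, seed, max_evictions=1000):
    """Cuckoo hashing with one item per bin; returns None if insertion fails."""
    table = [None] * b
    rng = random.Random(seed)
    for x in items:
        cur = x
        for _ in range(max_evictions):
            candidates = [bin_index(cur, i, b, seed) for i in range(3)]
            free = [k for k in candidates if table[k] is None]
            if free:
                table[free[0]] = cur
                cur = None
                break
            k = rng.choice(candidates)          # evict a random occupant
            table[k], cur = cur, table[k]
        if cur is not None:
            return None
    return table

def build_tables(x1, x2):
    """Build Table_1 (cuckoo) and Table_2 (simple) with b = ceil(1.28 m) bins."""
    b = math.ceil(1.28 * len(x1))
    seed = 0
    while True:
        table1 = cuckoo_hash(x1, b, seed)
        if table1 is not None:
            return table1, simple_hash(x2, b, seed), b
        seed += 1                               # rehash on failure
```

Because both tables share the same hash functions, an item stored in `table1[k]` belongs to $X_{2}$ exactly when it also appears in `table2[k]`, so only one bin has to be inspected per item.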

Based on the properties of element sharing and OPPRF, we can ascertain if an element in ${T a b l e}_{1}$ , denoted as ${T a b l e}_{1} [k]$ , belongs to the set intersection $I S$ . If ${T a b l e}_{1} [k] \in {T a b l e}_{2} [k]$ , then L₁ receives the programmed output $r_{1, k} = q_{2} ({T a b l e}_{1} [k]) - t_{2, k}$ , where $q_{2} ({T a b l e}_{1} [k])$ represents L₂’s element sharing of ${T a b l e}_{1} [k]$ . Otherwise, $r_{1, k}$ is a pseudorandom value. Subsequently, L₁ combines $r_{1, k}$ with $q_{1} ({T a b l e}_{1} [k])$ to compute $t_{1, k} = r_{1, k} + q_{1} ({T a b l e}_{1} [k])$ . Moreover, if the input element ${T a b l e}_{1} [k] \in I S$ , then $t_{1, k} = q_{2} ({T a b l e}_{1} [k]) - t_{2, k} + q_{1} ({T a b l e}_{1} [k]) = - t_{2, k}$ . Otherwise, $t_{1, k}$ will be a pseudorandom value. The problem of computing the cardinality of the set intersection is thus reduced to counting the number of zeros in the vector $\vec{t_{1}} + \vec{t_{2}}$ . An intuitive approach would be for L₂ to send its vector $\vec{t_{2}}$ to L₁, which would allow L₁ to construct $\vec{t_{1}} + \vec{t_{2}}$ and count the number of zeros. However, this method would reveal the intersection elements. If L₁ observes that the k-th entry in $\vec{t_{1}} + \vec{t_{2}}$ (namely, $t_{1, k} + t_{2, k}$ ) equals 0, L₁ learns that the element ${T a b l e}_{1} [k]$ belongs to the set intersection.
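The masking arithmetic described above can be checked with a toy model in which a trusted function plays the role of the OPPRF. This is a sketch under assumptions: values live in $\mathbb{Z}_{2^{64}}$, the ideal OPPRF returns the programmed value for a programmed key and a fresh random value otherwise, and the element sharings are supplied by hand. The helper names `ideal_opprf` and `opprf_round` are hypothetical.

```python
import secrets

MOD = 1 << 64   # illustrative output domain (an assumption of this sketch)

def ideal_opprf(programmed, query):
    """Return the programmed value if the query was programmed, else a random value."""
    if query in programmed:
        return programmed[query]
    return secrets.randbelow(MOD)

def opprf_round(table1, table2, q1, q2):
    """Compute the vectors t_1 (held by L_1) and t_2 (held by L_2), bin by bin."""
    b = len(table1)
    t2 = [secrets.randbelow(MOD) for _ in range(b)]
    t1 = []
    for k in range(b):
        # L_2 programs <x, q_2(x) - t_{2,k}> for every x in Table_2[k].
        programmed = {x: (q2[x] - t2[k]) % MOD for x in table2[k]}
        r = ideal_opprf(programmed, table1[k])
        # L_1 adds its own element sharing of Table_1[k].
        t1.append((r + q1[table1[k]]) % MOD)
    return t1, t2
```

As a usage example, with `q1["a"] + q2["a"] = 0` modulo $2^{64}$ and `"a"` present in the matching bin of both tables, the corresponding entry of `t1 + t2` is zero, while bins holding non-matching elements yield random sums.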

In order to prevent the leakage of intersection elements, L₁ and L₂ are required to utilize the two-party Permute + Share protocol and the two-party oblivious zero-sum check protocol. The two-party Permute + Share primitive provides them with random shares of the shuffled vector $π (\vec{t_{1}})$ . Subsequently, they use the two-party oblivious zero-sum check to determine the number of zeros in the sum of the shuffled vectors $π (\vec{t_{1}}) + π (\vec{t_{2}})$ . Given that L₁ and L₂ are assumed to be non-colluding, L₁ cannot gain knowledge of the permutation π used in the shuffling stage. As a result, even if L₁ notes that $t_{1, π (k)} + t_{2, π (k)} = 0$ , it cannot deduce the original index k. Consequently, L₁ is unable to correlate $π (k)$ with the element ${T a b l e}_{1} [k]$ .
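The input/output behaviour of the two-party Permute + Share step can be modelled as a trusted function. The sketch below only captures the ideal functionality, not a secure protocol realizing it; it assumes values in $\mathbb{Z}_{2^{64}}$ and represents π as a list of indices so that $π (\vec{t}) = (t_{π (1)}, \dots, t_{π (b)})$ . The names `apply_perm` and `ideal_permute_share` are hypothetical.

```python
import secrets

MOD = 1 << 64   # illustrative share domain (an assumption of this sketch)

def apply_perm(pi, vec):
    """Return (vec[pi[0]], ..., vec[pi[b-1]]), i.e. the shuffled vector pi(vec)."""
    return [vec[p] for p in pi]

def ideal_permute_share(pi, t1):
    """Trusted-party model: the sender supplies pi, the receiver supplies t1.

    Returns (a, c) with a + c = pi(t1); a goes to the receiver L_1 and
    c goes to the sender L_2.
    """
    shuffled = apply_perm(pi, t1)
    a = [secrets.randbelow(MOD) for _ in shuffled]
    c = [(s - ai) % MOD for s, ai in zip(shuffled, a)]
    return a, c
```

After this step L₂ can add $π ({\vec{t}}_{2})$ to its share locally, so the two parties hold additive shares of $π (\vec{t_{1}} + \vec{t_{2}})$ without L₁ learning π.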

The two-party oblivious zero-sum check protocol $F_{2 OZK}^{b}$ is a pivotal step in the process. Suppose that, after the shuffling stage, L₂ were to send $\vec{c} + π ({\vec{t}}_{2})$ directly to L₁ without employing $F_{2 OZK}^{b}$ . In such a scenario, L₁ could reconstruct $π (\vec{t_{1}} + \vec{t_{2}})$ and output the number of zeros as $| I S |$ . While this approach might seem straightforward, it is actually vulnerable to the following collusion attack. Consider a scenario where the corrupted coalition $C$ consists of L₁ and all the clients $P_{1}, \dots, P_{n - 2}$ . In this case, L₂ is honest. The problem arises if there exists an item ${T a b l e}_{1} [k]$ in $I S ∖ {S_{1}}$ .

Protocol 4.2 (MPSI-CA with two non-colluding leaders)

Parameters: The set size is m; the number of leaders is $T = 2$ ; hash functions $h_{1}$ , $h_{2}$ , $h_{3}$ ; the number of bins is b.

Input: $X_{i} = {x_{i, 1}, \dots, x_{i, m}}$ from leader L_i (for $i \in [2]$ ); $S_{j} = {s_{j, 1}, \dots, s_{j, m}}$ from client P_j (for $j \in [n - 2]$ ).

Protocol:

(Element Sharing) Parties perform element sharing by following Sub-protocol 4.1. For each $k \in [m]$ , leader L_i obtains its element sharing of $x_{i, k}$ as $q_{i} (x_{i, k})$ .

(Two-party PSI-CA) Leaders L₁ and L₂ act as follows:

(Bucketing) L₁ does ${T a b l e}_{1} \leftarrow {CuckooHash}_{h_{1}, h_{2}, h_{3}}^{b} (X_{1})$ , L₂ does ${T a b l e}_{2} \leftarrow {SimpleHash}_{h_{1}, h_{2}, h_{3}}^{b} (X_{2})$ .

(OPPRF) L₁ invokes $F_{opprf}^{F, 3 m, b}$ with L₂:

Sender L₂ provides a programmed set $P = {P_{k}}_{k \in [b]}$ , where subset $P_{k} = {⟨ x, q_{2} (x) - t_{2, k} ⟩}_{x \in {T a b l e}_{2} [k]}$ stores the key-value pairs for the k-th bin ${T a b l e}_{2} [k]$ . Here, $t_{2, k}$ is a random value.

Receiver L₁ provides b queries ${{T a b l e}_{1} [k]}_{k \in [b]}$ , and outputs $\vec{r_{1}} = (r_{1, 1}, \dots, r_{1, b})$ , where $r_{1, k}$ is the OPPRF output on ${T a b l e}_{1} [k]$ .

For each $k \in [b]$ , L₁ computes the sum of $r_{1, k}$ and $q_{1} ({T a b l e}_{1} [k])$ to obtain $t_{1, k} = r_{1, k} + q_{1} ({T a b l e}_{1} [k])$ . The output vector is denoted as ${\vec{t}}_{1} = (t_{1, 1}, \dots, t_{1, b})$ .

(Two-party Permute + Share) L₁ and L₂ engage in $F_{2 PS}^{b}$ :

L₂ acts as the sender with a random permutation π;

L₁ acts as the receiver with the input vector ${\vec{t}}_{1} = (t_{1, 1}, \dots, t_{1, b})$ ;

L₁ and L₂ receive random additive shares of the shuffled vector $π ({\vec{t}}_{1}) = (t_{1, π (1)}, \dots, t_{1, π (b)})$ . These shares are $\vec{a} = (a_{1}, \dots, a_{b})$ and $\vec{c} = (c_{1}, \dots, c_{b})$ , respectively. Here, $\vec{a} + \vec{c} = π ({\vec{t}}_{1})$ , and for each $k \in [b]$ , $a_{π (k)} + c_{π (k)} = t_{1, π (k)}$ .

L₂ first applies π to the vector ${\vec{t}}_{2} = (t_{2, 1}, \dots, t_{2, b})$ to obtain $π ({\vec{t}}_{2})$ . Subsequently, L₂ computes the sum $\vec{c} + π ({\vec{t}}_{2})$ .

(Two-party OZK) L₁ and L₂ jointly invoke $F_{2 OZK}^{b}$ to securely evaluate the number of entries in $\vec{a} + π ({\vec{t}}_{2}) + \vec{c}$ that are equal to 0.

L₂ acts as the sender with the input vector $\vec{c} + π ({\vec{t}}_{2})$ .

L₁ acts as the receiver with the input vector $\vec{a}$ . L₁ outputs a binary vector $\vec{e}$ that indicates which positions in $\vec{a} + π ({\vec{t}}_{2}) + \vec{c}$ equal 0.

L₁ outputs the number of 1s in $\vec{e}$ , which represents the intersection cardinality $| I S |$ .

Even though the OKVS sent by P₁ is not encoded on the key ${T a b l e}_{1} [k]$ , we point out that L₁ can still non-interactively deduce what its correct element sharing of ${T a b l e}_{1} [k]$ would be through collusion. The correct element sharing is denoted as $q_{1}^{'} ({T a b l e}_{1} [k]) = - \sum_{j \in [n - T]} PRF (K_{j, 2}, {T a b l e}_{1} [k])$ . Building on the characteristics of element sharing and OPPRF, we have $q_{1}^{'} ({T a b l e}_{1} [k]) + q_{2} ({T a b l e}_{1} [k]) = 0$ and $t_{1, k} = q_{2} ({T a b l e}_{1} [k]) - t_{2, k} + q_{1} ({T a b l e}_{1} [k])$ . This leads to the equation $(t_{1, k} + t_{2, k}) + (- q_{1} ({T a b l e}_{1} [k]) + q_{1}^{'} ({T a b l e}_{1} [k])) = 0$ . After reconstructing the shuffled vector $π (\vec{t_{1}} + \vec{t_{2}})$ , L₁ adds $- q_{1} ({T a b l e}_{1} [k]) + q_{1}^{'} ({T a b l e}_{1} [k])$ to each position of $π (\vec{t_{1}} + \vec{t_{2}})$ to check whether the result equals 0. If any result is zero, it indicates that the element ${T a b l e}_{1} [k]$ belongs to L₂’s set $X_{2}$ , thus revealing additional information beyond the set intersection cardinality $| I S |$ . Conversely, when our protocol is enhanced with the $F_{2 OZK}^{b}$ mechanism, L₁ does not have access to the vector $π (\vec{t_{1}} + \vec{t_{2}})$ , thereby enhancing the protocol’s resilience against the potential collusion attack previously mentioned.
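The offset used in this attack follows from the two identities stated in the preceding paragraph. The short derivation below only spells out the intermediate step; $y$ is shorthand introduced here for ${T a b l e}_{1} [k]$ , and the first line assumes $y \in X_{2}$ so that the OPPRF returns the programmed value.

$$
\begin{aligned}
t_{1, k} + t_{2, k} &= q_{2} (y) + q_{1} (y) && \text{since } t_{1, k} = q_{2} (y) - t_{2, k} + q_{1} (y) \\
&= q_{1} (y) - q_{1}^{'} (y) && \text{since } q_{1}^{'} (y) + q_{2} (y) = 0 \\
\Longrightarrow \quad (t_{1, k} + t_{2, k}) + \bigl( - q_{1} (y) + q_{1}^{'} (y) \bigr) &= 0 .
\end{aligned}
$$

Hence adding the known offset $- q_{1} (y) + q_{1}^{'} (y)$ to the entry at the shuffled position of bin k yields zero whenever $y \in X_{2}$ , which is exactly the test L₁ runs over all positions of the reconstructed vector.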

Correctness. The correctness of our protocol is easy to prove.

Case 1: If the element ${T a b l e}_{1} [k]$ belongs to $I S$ , then according to the property of element sharing, we have $q_{2} ({T a b l e}_{1} [k]) + q_{1} ({T a b l e}_{1} [k]) = 0$ . This implies that $t_{1, k} = q_{2} ({T a b l e}_{1} [k]) - t_{2, k} + q_{1} ({T a b l e}_{1} [k]) = - t_{2, k}$ . Based on the definition of $F_{2 PS}^{b}$ , both leaders L₁ and L₂ receive additive shares of the shuffled vector $π ({\vec{t}}_{1})$ , denoted as $\vec{a} = (a_{1}, \dots, a_{b})$ and $\vec{c} = (c_{1}, \dots, c_{b})$ , respectively. If $t_{1, k} = - t_{2, k}$ , then it follows that $t_{1, π (k)} = - t_{2, π (k)}$ . Consequently, L₂ holds $c_{π (k)} + t_{2, π (k)} = (t_{1, π (k)} - a_{π (k)}) + t_{2, π (k)} = t_{1, π (k)} - a_{π (k)} - t_{1, π (k)} = - a_{π (k)}$ . By leveraging the property of $F_{2 OZK}^{b}$ , L₁ can reconstruct $γ_{π (k)} (a_{π (k)} + c_{π (k)} + t_{2, π (k)}) = 0$ , which indicates that ${T a b l e}_{1} [k]$ belongs to the n-party set intersection $I S$ .

Case 2: If the element ${T a b l e}_{1} [k]$ does not belong to $X_{2}$ , then $t_{1, k} = r_{1, k} + q_{1} ({T a b l e}_{1} [k])$ is a pseudorandom value. Consequently, when the length of the PRF output is set to $l = λ + \log (m) + 2$ , the probability that $t_{1, π (k)} + t_{2, π (k)} = 0$ is negligible in the security parameter λ.

Case 3: Suppose, without loss of generality, that the element ${T a b l e}_{1} [k]$ is not contained in the set $S_{j}$ for some $j \in [n - 2]$ . Then ${T a b l e}_{1} [k]$ is not encoded in the OKVS D_j, so the decode output $Decode (D_{j}, {T a b l e}_{1} [k])$ is a pseudorandom value. Therefore, the probability that the sum $q_{2} ({T a b l e}_{1} [k]) + q_{1} ({T a b l e}_{1} [k])$ equals zero is negligible in the security parameter. This also implies that the probability of $t_{1, π (k)} + t_{2, π (k)} = 0$ is negligible.
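The role of the random multiplier $γ$ in the zero-sum check can be illustrated with a trusted-party model. This sketch makes an assumption that is not taken from the text above: arithmetic is done modulo the Mersenne prime $2^{61} - 1$ , so that a nonzero multiplier never maps a nonzero sum to zero. It models only the functionality, and the name `ideal_ozk` is hypothetical.

```python
import secrets

P = (1 << 61) - 1   # Mersenne prime; the prime field is an assumption of this sketch

def ideal_ozk(receiver_vec, sender_vec):
    """Trusted-party model of the oblivious zero-sum check.

    For every position the sum of the two inputs is multiplied by a fresh
    nonzero gamma, so the receiver learns only whether the sum is zero.
    Returns the indicator vector e (1 where the sum is zero).
    """
    e = []
    for u, v in zip(receiver_vec, sender_vec):
        gamma = 1 + secrets.randbelow(P - 1)      # uniform in [1, P - 1]
        masked = (gamma * ((u + v) % P)) % P      # zero iff u + v = 0 mod P
        e.append(1 if masked == 0 else 0)
    return e
```

For example, the inputs `[3, 10, P - 4, 0]` and `[P - 3, 5, 4, 0]` give the indicator vector `[1, 0, 1, 1]`, so the reported cardinality is 3.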

Security. Regarding the security of Protocol 4.2, we have the following theorem.

Theorem 3.
Protocol 4.2 securely computes $F_{MPSI - CA}$ in the presence of a semi-honest adversary who may corrupt any subset of either ${L_{1}, {P_{i}}_{i \in [n - 2]}}$ or ${L_{2}, {P_{i}}_{i \in [n - 2]}}$ , if $F_{opprf}^{F, 3 m, b}$ , $F_{2 PS}^{b}$ and $F_{2 OZK}^{b}$ are secure against semi-honest adversaries.
Proof.

The security proof can be divided into three cases. We construct a simulator Sim to simulate the views of the corrupted parties $C$ , and prove the indistinguishability of the simulated views from the views obtained in the real execution.

Case 1: ( $L_{1} \notin C$ and $L_{2} \notin C$ ) We first consider the trivial case where the corrupted coalition $C$ consists of at most t clients, where t is the corruption threshold ( $t < n$ ). In this case, the views of these corrupted clients can be easily simulated since they only participate in the element sharing stage and receive no messages from others.

Case 2: ( $L_{2} \in C$ and $L_{1} \notin C$ ) Suppose that $L_{2} \in C$ , while the primary leader L₁ is not corrupted. In this scenario, $C$ has no knowledge of the intersection cardinality $| I S |$ . The views of the corrupted clients can be simulated in a manner similar to that described in Case 1. For the corrupted secondary leader L₂, Sim simulates its view by employing the following strategies:

In step 1, Sim randomly selects $n - 2$ PRF keys ${K_{j, 2}^{'}}_{j \in [n - 2]}$ to simulate the PRF keys ${K_{j, 2}}_{j \in [n - 2]}$ received from the clients. Sim then constructs the element sharings ${q_{2} (x_{2, k})}_{k \in [m]}$ according to step 3 of Sub-protocol 4.1.

In step 2(b), Sim randomly selects $(k_{1}^{'}, \dots, k_{b}^{'})$ to simulate the output PRF keys of L₂. Sim generates the programmed set $P$ and hint $h i n t$ by following step 2(b) of Protocol 4.2. Subsequently, Sim invokes the OPPRF simulator and appends the output $S i m_{opprf} (P, {(k_{1}^{'}, \dots, k_{b}^{'}), h i n t})$ to the view.

In step 2(d), Sim selects a random vector $\vec{c^{'}} = (c_{1}^{'}, \dots, c_{b}^{'})$ to simulate the output vector of L₂ in $F_{2 PS}^{b}$ . Then, Sim invokes the simulator of $F_{2 PS}^{b}$ and appends the output $S i m_{2 PS} (π, \vec{c^{'}})$ to the view.

In step 2(f), Sim invokes the simulator of $F_{2 OZK}^{b}$ and appends the output $S i m_{2 OZK} (\vec{c^{'}} + π (\vec{t_{2}}))$ to the view.

We argue that the outputs of Sim are indistinguishable from the real view of L₂ by the following hybrids:

${Hyb}_{0}$ : L₂’s view in the real execution.

${Hyb}_{1}$ : Same as ${Hyb}_{0}$ except that we use ${K_{j, 2}^{'}}_{j \in [n - 2]}$ to substitute the PRF keys ${K_{j, 2}}_{j \in [n - 2]}$ that L₂ receives during the element sharing stage. Since they are of the same distribution, ${Hyb}_{1}$ and ${Hyb}_{0}$ are indistinguishable.

${Hyb}_{2}$ : Same as ${Hyb}_{1}$ except that the output PRF keys of L₂ in $F_{opprf}^{F, 3 m, b}$ are replaced by $(k_{1}^{'}, \dots, k_{b}^{'})$ . Since they have the same distribution, ${Hyb}_{2}$ and ${Hyb}_{1}$ are indistinguishable.

${Hyb}_{3}$ : Same as ${Hyb}_{2}$ except that Sim runs $S i m_{opprf} (P, {(k_{1}^{'}, \dots, k_{b}^{'}), h i n t})$ to produce the simulated view for L₂. The security of protocol $Π_{opprf}^{F, 3 m, b}$ guarantees that the simulated view is computationally indistinguishable from the real view in the real execution. Consequently, ${Hyb}_{3}$ and ${Hyb}_{2}$ are indistinguishable.

${Hyb}_{4}$ : Same as ${Hyb}_{3}$ except that the output of $F_{2 PS}^{b}$ is replaced by $\vec{c^{'}} = (c_{1}^{'}, \dots, c_{b}^{'})$ , which is chosen by Sim. Since they are of the same distribution, ${Hyb}_{4}$ and ${Hyb}_{3}$ are indistinguishable.

${Hyb}_{5}$ : Same as ${Hyb}_{4}$ except that Sim runs $S i m_{2 PS} (π, \vec{c^{'}})$ to produce the simulated view for L₂. The security of protocol $Π_{2 PS}^{b}$ guarantees that the simulated view is computationally indistinguishable from the view of L₂ in the real execution. Consequently, ${Hyb}_{5}$ and ${Hyb}_{4}$ are indistinguishable.

${Hyb}_{6}$ : Same as ${Hyb}_{5}$ except that Sim runs $S i m_{2 OZK} (\vec{c^{'}} + π (\vec{t_{2}}))$ to produce the simulated view for L₂. The security of protocol $Π_{2 OZK}^{b}$ guarantees that the simulated view is computationally indistinguishable from the real view in the real execution. Consequently, ${Hyb}_{6}$ and ${Hyb}_{5}$ are indistinguishable.

Since ${Hyb}_{6}$ is exactly the output of $S i m (X_{2})$ , the simulated view of L₂ is computationally indistinguishable from its real view.

Case 3: ( $L_{1} \in C$ and $L_{2} \notin C$ ) In this case, as L₁ is corrupted, $C$ obtains the intersection cardinality $| I S |$ . The views of the corrupted clients can be simulated in a manner similar to that described in Case 1. Sim simulates the view of the corrupted primary leader L₁ by employing the following strategies:

In step 1, Sim simulates the OKVS D_j from an honest party P_j by generating an OKVS $D_{j}^{'}$ that encodes m random key-value pairs. Then, Sim generates the element sharings ${q_{1} (x_{1, k})}_{k \in [m]}$ by following step 2 of Sub-protocol 4.1.

In step 2(b), Sim randomly selects $(r_{1, 1}^{'}, \dots, r_{1, b}^{'})$ to simulate the output of L₁ in $F_{opprf}^{F, 3 m, b}$ . Then, Sim invokes the simulator of $F_{opprf}^{F, 3 m, b}$ and appends the output $S i m_{opprf} ({{T a b l e}_{1} [k]}_{k \in [b]}, {r_{1, k}^{'}}_{k \in [b]})$ to the view. Sim creates $\vec{t_{1}}$ by following step 2(c) of Protocol 4.2.

In step 2(d), Sim randomly selects a vector $\vec{a^{'}} = (a_{1}^{'}, \dots, a_{b}^{'})$ to simulate the output of L₁ in $F_{2 PS}^{b}$ . Then, Sim invokes the simulator of $F_{2 PS}^{b}$ and appends the output $S i m_{2 PS} (\vec{t_{1}}, \vec{a^{'}})$ to the view.

In step 2(f), Sim samples a binary vector $\vec{e^{'}}$ with $| I S |$ ones and fills the remaining entries with zeros. Then, Sim selects a random permutation $π^{'}$ and utilizes $π^{'} (\vec{e^{'}})$ to simulate the output of L₁ in $F_{2 OZK}^{b}$ . Subsequently, Sim invokes the simulator of $F_{2 OZK}^{b}$ and appends the output $S i m_{2 OZK} (\vec{a^{'}}, π^{'} (\vec{e^{'}}))$ to the view.
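The last simulation step uses nothing beyond the table size b and the cardinality $| I S |$ , which a few lines of Python make concrete. This is a minimal sketch; the function name `simulate_ozk_output` and the optional `rng` argument are hypothetical and serve only as an illustration.

```python
import random

def simulate_ozk_output(b, cardinality, rng=None):
    """Sample a length-b binary vector with exactly `cardinality` ones,
    placed at uniformly random positions."""
    rng = rng or random.SystemRandom()
    e = [1] * cardinality + [0] * (b - cardinality)
    rng.shuffle(e)          # plays the role of the random permutation pi'
    return e
```

Since the real indicator vector is also a uniformly shuffled vector with $| I S |$ ones, a sample produced this way has the same distribution as the real output of L₁ in the zero-sum check.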

We argue that the outputs of Sim are indistinguishable from the real view by the following hybrids:

${Hyb}_{0}$ : L₁’s view in the real execution.

${Hyb}_{1}$ : Same as ${Hyb}_{0}$ except that the OKVS D_j (received by L₁ from an honest client P_j in the element sharing stage) is replaced by an OKVS $D_{j}^{'}$ that encodes m random key-value pairs. For the corrupted L₁, D_j appears random because every value encoded in D_j is masked with the PRF key $K_{j, 2}$ , which client P_j sends only to L₂ (with $T = 2$ , each client generates a single such key). Since L₁ and L₂ are non-colluding, L₁ has no knowledge of this PRF key. Consequently, the obliviousness property of OKVS and the pseudorandomness of PRF guarantee that the simulated OKVS $D_{j}^{'}$ is computationally indistinguishable from the real OKVS D_j in the real execution. Therefore, ${Hyb}_{1}$ and ${Hyb}_{0}$ are indistinguishable.

${Hyb}_{2}$ : Same as ${Hyb}_{1}$ except that the output of L₁ in $F_{opprf}^{F, 3 m, b}$ is replaced by randomly selected values $(r_{1, 1}^{'}, \dots, r_{1, b}^{'})$ . The security of protocol $Π_{opprf}^{F, 3 m, b}$ guarantees that ${Hyb}_{2}$ and ${Hyb}_{1}$ are computationally indistinguishable.

${Hyb}_{3}$ : Same as ${Hyb}_{2}$ except that Sim runs $S i m_{opprf} ({{T a b l e}_{1} [k]}_{k \in [b]}, {r_{1, k}^{'}}_{k \in [b]})$ to produce the simulated view for L₁. The security of protocol $Π_{opprf}^{F, 3 m, b}$ guarantees that the simulated view is computationally indistinguishable from the real view in the real execution. Therefore, ${Hyb}_{3}$ and ${Hyb}_{2}$ are indistinguishable.

${Hyb}_{4}$ : Same as ${Hyb}_{3}$ except that the output of L₁ in $F_{2 PS}^{b}$ (i.e., $\vec{a}$ ) is replaced by the vector $\vec{a^{'}} = (a_{1}^{'}, \dots, a_{b}^{'})$ . Since they are of the same distribution, ${Hyb}_{4}$ and ${Hyb}_{3}$ are indistinguishable.

${Hyb}_{5}$ : Same as ${Hyb}_{4}$ except that Sim runs $S i m_{2 PS} (\vec{t_{1}}, \vec{a^{'}})$ to produce the simulated view for L₁. The security of protocol $Π_{2 PS}^{b}$ guarantees that the simulated view is computationally indistinguishable from the real view in the real execution. Therefore, ${Hyb}_{5}$ is indistinguishable from ${Hyb}_{4}$ .

${Hyb}_{6}$ : Same as ${Hyb}_{5}$ except that the output of L₁ in $F_{2 OZK}^{b}$ (i.e., $\vec{e}$ ) is replaced by a uniformly random binary vector $π^{'} (\vec{e^{'}})$ with $| I S |$ ones. The uniformly distributed permutation $π^{'}$ guarantees that $π^{'} (\vec{e^{'}})$ has the same distribution as $\vec{e}$ and satisfies the correctness constraint. Therefore, ${Hyb}_{6}$ is indistinguishable from ${Hyb}_{5}$ .

${Hyb}_{7}$ : Same as ${Hyb}_{6}$ except that Sim runs $S i m_{2 OZK} (\vec{a^{'}}, π^{'} (\vec{e^{'}}))$ to produce the simulated view for L₁. The security of protocol $Π_{2 OZK}^{b}$ guarantees that the simulated view is computationally indistinguishable from the real view in the real execution. Therefore, ${Hyb}_{7}$ is indistinguishable from ${Hyb}_{6}$ .

Since ${Hyb}_{7}$ is exactly the output of $S i m (X_{1}, | I S |)$ , the simulated view of L₁ is computationally indistinguishable from its real view.

5. MPSI-CA protocol under arbitrary collusion

The MPSI-CA protocol described in Section 4.2 optimizes its computational and communication efficiency by relying on the assumption of two non-colluding parties. Limiting the number of leaders to only two during the OT-based PSI-CA computation stage significantly enhances its efficiency. However, we recognize that in some real-world applications, the assumption of non-collusion may not always be feasible. Therefore, it is essential to adapt Protocol 4.2 to address scenarios with potential arbitrary collusion. This adaptation increases the versatility of our MPSI-CA protocol, making it applicable to situations where the existence of non-colluding parties cannot be ensured.

Our non-colluding MPSI-CA protocol, presented in Section 4.2, serves as the basis for our MPSI-CA protocol under arbitrary collusion. In this section, we outline the major adjustments made to adapt to scenarios with arbitrary collusion and then provide a comprehensive description of our enhanced protocol.

5.1. Detailed description

The detailed MPSI-CA protocol under arbitrary collusion is presented in Protocol 5.1. Its framework is similar to that of Protocol 4.2, with two major adjustments:

The number of leaders should be set to $T = t + 1$ instead of just 2, where t is the corruption threshold. If $T ⩽ t$ , for example, if $T = t$ , the protocol is vulnerable to the collusion attack in the worst scenario where all the leaders are corrupted. In such a case, given the received PRF keys and OKVSs, $C$ can easily compute the set intersection of those honest clients by traversing all possible elements. However, if $T = t + 1$ , it is guaranteed that at least one of the leaders is honest. If $L_{1} \notin C$ , $C$ can only see the PRF keys without learning about the OKVSs from the honest parties. On the other hand, if $L_{1} \in C$ but some $L_{i} \notin C$ ( $i \in [2, T]$ ), then the PRF key $K_{j, i}$ of an honest party P_j remains secret, ensuring that the ciphertexts encoded in the OKVS D_j are indistinguishable from random.

During the shuffling stage, the two-party Permute + Share protocol is substituted with a T-party secret-shared shuffle protocol. If we continue to use the T-party Permute + Share protocol [29] to help parties shuffle the sum of their inputs, the permutation π will be revealed to $C$ if the sender L_i (the party who chooses π) is corrupted. Therefore, under arbitrary collusion, the random permutation π used in the shuffling stage should be jointly negotiated by all the leaders, while ensuring that none of them can learn π, even through collusion. This can be achieved using our T-party secret-shared shuffle protocol.

Based on the above analysis, we propose the MPSI-CA protocol under arbitrary collusion (Protocol 5.1). This protocol consists of several steps, including element sharing, bucketing, OPPRF, T-party secret-shared shuffle and T-party oblivious zero-sum check.

After performing element sharing in step 1, the problem of n-party MPSI-CA computation is reduced to a problem involving only T leaders. Each leader now possesses a set of element sharings ${q_{i} (x_{i, 1}), \dots, q_{i} (x_{i, m})}$ . In order to identify which elements from its own set $X_{1}$ belong to the set intersection $I S$ , L₁ invokes $F_{opprf}^{F, 3 m, b}$ with every secondary leader L_i for $i \in [2, T]$ . According to the properties of element sharing and OPPRF, if an element ${T a b l e}_{1} [k]$ belongs to $I S$ , then the leaders hold additive shares of 0 (i.e., $t_{k} = \sum_{i \in [T]} t_{i, k} = 0$ ). To determine the number of elements satisfying $t_{k} = 0$ without revealing the index k, the leaders jointly invoke $F_{mSS}^{T, b}$ to obtain their additive shares of the shuffled sum $t_{π (k)}$ , where π is unknown to every party. Then they engage in $F_{OZK}^{T, b}$ to securely aggregate and re-randomize the value of $t_{π (k)}$ . Based on the correctness proof of our protocol $Π_{OZK}^{T, b}$ , when $t_{π (k)} = 0$ , the output $γ_{π (k)} t_{π (k)}$ equals 0. Additionally, the false positive rate is negligible in the security parameter. Therefore, L₁ can count the number of ones in the indicator vector $\vec{e}$ output by the zero-sum check to obtain the set intersection cardinality $| I S |$ .

Correctness. The correctness analysis follows a similar approach as Protocol 4.2. If element ${T a b l e}_{1} [k]$ belongs to $I S$ , the properties of element sharing and OPPRF yield $\sum_{i \in [T]} q_{i} ({T a b l e}_{1} [k]) = 0$ and $r_{i, k} = q_{i} ({T a b l e}_{1} [k]) - t_{i, k}$ , which implies that $t_{k} = 0$ . According to the correctness analysis of our multi-party secret-shared shuffle and oblivious zero-sum check primitives, L₁ successfully reconstructs $γ_{π (k)} t_{π (k)} = 0$ , thus learning about the existence of another element in $I S$ . On the other hand, without loss of generality, if ${T a b l e}_{1} [k]$ is not found in some $X_{i}$ ( $i \in [2, T]$ ) or not in some $S_{j}$ ( $j \in [n - T]$ ), it follows that either the OPPRF output $r_{i, k}$ or the OKVS decode output $Decode (D_{j}, {T a b l e}_{1} [k])$ is a pseudorandom value. Therefore, the probability of encountering an element ${T a b l e}_{1} [k] \notin I S$ such that $γ_{π (k)} t_{π (k)} = 0$ is negligible.
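The step from the two stated properties to $t_{k} = 0$ can be written out explicitly. The derivation below is a worked restatement, with $y$ introduced here as shorthand for ${T a b l e}_{1} [k] \in I S$ ; it uses the definition $t_{1, k} = q_{1} (y) + \sum_{i = 2}^{T} r_{i, k}$ from Protocol 5.1 and the programmed OPPRF outputs $r_{i, k} = q_{i} (y) - t_{i, k}$ .

$$
\begin{aligned}
t_{k} &= t_{1, k} + \sum_{i = 2}^{T} t_{i, k} \\
&= q_{1} (y) + \sum_{i = 2}^{T} r_{i, k} + \sum_{i = 2}^{T} t_{i, k} \\
&= q_{1} (y) + \sum_{i = 2}^{T} \bigl( q_{i} (y) - t_{i, k} \bigr) + \sum_{i = 2}^{T} t_{i, k} \\
&= \sum_{i = 1}^{T} q_{i} (y) = 0 .
\end{aligned}
$$

The random masks $t_{i, k}$ cancel term by term, so only the element sharings remain, and these sum to zero precisely because $y$ lies in the intersection.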

Protocol 5.1 (MPSI-CA Under Arbitrary Collusion)

Parameters: The set size is m; the number of leaders is $T = t + 1$ ; hash functions $h_{1}$ , $h_{2}$ , $h_{3}$ ; the number of bins is b.

Input: $X_{i} = {x_{i, 1}, \dots, x_{i, m}}$ from leader L_i (for $i \in [T]$ ); $S_{j} = {s_{j, 1}, \dots, s_{j, m}}$ from client P_j (for $j \in [n - T]$ ).

Protocol:

(Element sharing) Parties perform element sharing by following Sub-protocol 4.1. For each $k \in [m]$ , leader L_i obtains its element sharing of $x_{i, k}$ as $q_{i} (x_{i, k})$ .

(T-party MPSI-CA) Leaders L_i (for $i \in [T]$ ) act as follows:

(Bucketing) L₁ does ${T a b l e}_{1} \leftarrow {CuckooHash}_{h_{1}, h_{2}, h_{3}}^{b} (X_{1})$ . L_i does ${T a b l e}_{i} \leftarrow {SimpleHash}_{h_{1}, h_{2}, h_{3}}^{b} (X_{i})$ , $i \in [2, T]$ .

(OPPRF) L₁ invokes $F_{opprf}^{F, 3 m, b}$ with every secondary leader L_i (for $i \in [2, T]$ ):

Sender L_i provides a programmed set $P = {P_{k}}_{k \in [b]}$ , where subset $P_{k} = {⟨ x, q_{i} (x) - t_{i, k} ⟩}_{x \in {T a b l e}_{i} [k]}$ stores the key-value pairs for the k-th bin ${T a b l e}_{i} [k]$ . Here, $t_{i, k}$ is a random value.

Receiver L₁ provides b queries ${{T a b l e}_{1} [k]}_{k \in [b]}$ , and outputs $\vec{r_{i}} = (r_{i, 1}, \dots, r_{i, b})$ , where $r_{i, k}$ is the OPPRF output on ${T a b l e}_{1} [k]$ .

For each $k \in [b]$ , L₁ computes $t_{1, k} = q_{1} ({T a b l e}_{1} [k]) + \sum_{i = 2}^{T} r_{i, k}$ .

(T-party Shuffle) Leaders L_i (for $i \in [T]$ ) jointly invoke $F_{mSS}^{T, b}$ .

Each L_i inputs the vector ${\vec{t}}_{i} = (t_{i, 1}, \dots, t_{i, b})$ and a permutation $π_{i}$ , then outputs an additive share $\vec{t_{i}^{'}}$ of the shuffled sum $π (\vec{t})$ (i.e., $\sum_{i = 1}^{T} \vec{t_{i}^{'}} = π (\vec{t})$ ). Here, $\vec{t} = \sum_{i = 1}^{T} {\vec{t}}_{i} = (t_{1}, \dots, t_{b})$ and $π = π_{T} \circ \dots \circ π_{1}$ .

(Multi-party OZK) Leaders L_i (for $i \in [T]$ ) jointly engage in $F_{OZK}^{T, b}$ to securely calculate the number of zeros in the vector $\sum_{i = 1}^{T} \vec{t_{i}^{'}}$ .

For $i \in [T]$ , leader L_i inputs its share $\vec{t_{i}^{'}}$ (obtained in step 2(d)).

L₁ outputs a binary vector $\vec{e}$ that indicates which positions in $\sum_{i = 1}^{T} \vec{t_{i}^{'}}$ equal 0. If the k-th position is 0, then $e_{k} = 1$ ; otherwise, $e_{k} = 0$ .

L₁ outputs the number of 1s in $\vec{e}$ , which represents the intersection cardinality $| I S |$ .
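The shuffle and counting steps of Protocol 5.1 can be modelled with a trusted party. The sketch below captures only the input/output behaviour, not the secure T-party protocols themselves; it assumes arithmetic modulo the prime $2^{61} - 1$ and applies the leaders' permutations one after another, starting with $π_{1}$ . The names `ideal_shuffle` and `count_zeros` are hypothetical.

```python
import secrets

P = (1 << 61) - 1   # prime modulus; an assumption of this sketch

def ideal_shuffle(share_vectors, perms):
    """Trusted-party model of the T-party secret-shared shuffle.

    share_vectors[i] is leader i's input vector t_i, perms[i] its permutation.
    Returns T fresh additive shares of the shuffled sum.
    """
    T = len(share_vectors)
    b = len(share_vectors[0])
    total = [sum(col) % P for col in zip(*share_vectors)]
    for pi in perms:                              # pi_1 first, pi_T last
        total = [total[pi[k]] for k in range(b)]
    out = [[secrets.randbelow(P) for _ in range(b)] for _ in range(T - 1)]
    last = [(total[k] - sum(s[k] for s in out)) % P for k in range(b)]
    out.append(last)
    return out

def count_zeros(shares):
    """Number of positions whose shares sum to zero (the value L_1 reports)."""
    return sum(1 for col in zip(*shares) if sum(col) % P == 0)
```

No single permutation determines the final order, which reflects why the combined permutation stays hidden as long as at least one leader keeps its own permutation secret.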

Security. Regarding the security of Protocol 5.1, we have the following theorem.

Theorem 4.
Protocol 5.1 securely computes $F_{MPSI - CA}$ in the presence of a semi-honest adversary which may corrupt up to t parties $(t < n)$ , if $F_{opprf}^{F, 3 m, b}$ , $F_{mSS}^{T, b}$ and $F_{OZK}^{T, b}$ are secure against semi-honest adversaries.
Proof.

The security proof can be divided into three cases. We construct a simulator Sim to simulate the views of the corrupted parties $C$ , and argue the indistinguishability of the simulated views from the real views in the real execution.

Case 1: ( $L_{i} \notin C$ holds for every $i \in [T]$ ) In this trivial case, the corrupted coalition $C$ consists of at most t corrupted parties. Their views can be easily simulated since they do not receive any messages from others.

Case 2: ( $L_{1} \notin C$ ) In this case, the corrupted coalition $C$ may comprise certain clients and secondary leaders. The views of the corrupted clients can be simulated in a manner similar to that described in Case 1. For the corrupted secondary leader L_i, Sim simulates its view by employing the following strategies:

In step 1, Sim randomly selects $n - T$ PRF keys ${K_{j, i}^{'}}_{j \in [n - T]}$ to simulate the PRF keys ${K_{j, i}}_{j \in [n - T]}$ that L_i receives from clients. Sim then constructs the element sharings ${q_{i} (x_{i, k})}_{k \in [m]}$ according to step 3 of Sub-protocol 4.1.

In step 2(b), Sim randomly selects $(k_{1}^{'}, \dots, k_{b}^{'})$ to simulate the output PRF keys of L_i. Sim generates the programmed set $P$ and hint $h i n t$ by following step 2(b) of Protocol 5.1. Then, Sim invokes the OPPRF simulator and appends the output $S i m_{opprf} (P, {(k_{1}^{'}, \dots, k_{b}^{'}), h i n t})$ to the view.

In step 2(d), Sim randomly selects a vector $\vec{t_{i}^{″}} = (t_{i, 1}^{″}, \dots, t_{i, b}^{″})$ to simulate the output of L_i in $F_{mSS}^{T, b}$ . Then, Sim invokes the simulator of $F_{mSS}^{T, b}$ and appends the output $S i m_{mSS} ({\vec{t_{i}}, π_{i}}, \vec{t_{i}^{″}})$ to the view.

In step 2(e), Sim invokes the simulator of $F_{OZK}^{T, b}$ and appends the output $S i m_{OZK} (\vec{t_{i}^{″}})$ to the view.

We argue that the outputs of Sim are indistinguishable from the real view of L_i by the following hybrids:

${Hyb}_{0}$ : L_i’s view in the real execution.

${Hyb}_{1}$ : Same as ${Hyb}_{0}$ except that we use $K_{j, i}^{'}, j \in [n - T]$ to substitute the PRF keys $K_{j, i}, j \in [n - T]$ that L_i receives in the element sharing stage. Since they are of the same distribution, ${Hyb}_{1}$ and ${Hyb}_{0}$ are indistinguishable.

${Hyb}_{2}$ : Same as ${Hyb}_{1}$ except that the output PRF keys of L_i in $F_{opprf}^{F, 3 m, b}$ are replaced by $(k_{1}^{'}, \dots, k_{b}^{'})$ . Since they are of the same distribution, ${Hyb}_{2}$ and ${Hyb}_{1}$ are indistinguishable.

${Hyb}_{3}$ : Same as ${Hyb}_{2}$ except that Sim runs $S i m_{opprf} (P, {(k_{1}^{'}, \dots, k_{b}^{'}), h i n t})$ to produce the simulated view for L_i. The security of protocol $Π_{opprf}^{F, 3 m, b}$ guarantees that the simulated view is computationally indistinguishable from the real view in the real execution. Therefore, ${Hyb}_{3}$ and ${Hyb}_{2}$ are indistinguishable.

${Hyb}_{4}$ : Same as ${Hyb}_{3}$ except that the output of $F_{mSS}^{T, b}$ is replaced by $\vec{t_{i}^{″}} = (t_{i, 1}^{″}, \dots, t_{i, b}^{″})$ , which is chosen by Sim. Since they are of the same distribution, ${Hyb}_{4}$ and ${Hyb}_{3}$ are indistinguishable.

${Hyb}_{5}$ : Same as ${Hyb}_{4}$ except that Sim runs $S i m_{mSS} ({\vec{t_{i}}, π_{i}}, \vec{t_{i}^{″}})$ to produce the simulated view for L_i. The security of protocol $Π_{mSS}^{T, b}$ guarantees that the simulated view is computationally indistinguishable from the view of L_i in the real execution. Therefore, ${Hyb}_{5}$ and ${Hyb}_{4}$ are indistinguishable.

${Hyb}_{6}$ : Same as ${Hyb}_{5}$ except that Sim runs $S i m_{OZK} (\vec{t_{i}^{″}})$ to produce the simulated view for L_i. The security of protocol $Π_{OZK}^{T, b}$ guarantees that the simulated view is computationally indistinguishable from the real view in the real execution. Therefore, ${Hyb}_{6}$ and ${Hyb}_{5}$ are indistinguishable.

Since ${Hyb}_{6}$ is exactly the output of $S i m (X_{i})$ , the simulated view of L_i is computationally indistinguishable from the real view.
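
The hybrid argument for Case 2 can be summarized by a standard triangle-inequality bound. As a sketch, let $D$ be any probabilistic polynomial-time distinguisher and let $negl (κ)$ denote a negligible function in the computational security parameter (both symbols are introduced here only for illustration):

\begin{aligned} | \Pr [D ({Hyb}_{0}) = 1] - \Pr [D ({Hyb}_{6}) = 1] | \le \sum_{\ell = 1}^{6} | \Pr [D ({Hyb}_{\ell - 1}) = 1] - \Pr [D ({Hyb}_{\ell}) = 1] | \le negl (κ) . \end{aligned}

The transitions to ${Hyb}_{1}$ , ${Hyb}_{2}$ and ${Hyb}_{4}$ replace values by identically distributed ones and therefore contribute nothing to the sum, while the transitions to ${Hyb}_{3}$ , ${Hyb}_{5}$ and ${Hyb}_{6}$ are each bounded by the security of the corresponding sub-protocol.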

Case 3: ( $L_{1} \in C$ ) In this case, as L₁ is corrupted, $C$ obtains the intersection cardinality $| I S |$ . The views of the corrupted clients and secondary leaders can be simulated in a manner similar to that described in Case 1 and Case 2. As for the corrupted L₁, the simulator Sim simulates its view as follows.

In step 1, Sim simulates the OKVS D_j from an honest party P_j by generating an OKVS $D_{j}^{'}$ that encodes m random key-value pairs. Then, Sim creates the element sharings ${q_{1} (x_{1, k})}_{k \in [m]}$ by following step 2 of Sub-protocol 4.1.

In step 2(b), Sim randomly selects $\vec{r_{1}^{'}} = (r_{1, 1}^{'}, \dots, r_{1, b}^{'})$ to simulate the output of L₁ in $F_{opprf}^{F, 3 m, b}$ . Then, Sim invokes the simulator of $F_{opprf}^{F, 3 m, b}$ and appends the output $S i m_{opprf} ({{T a b l e}_{1} [k]}_{k \in [b]}, {r_{1, k}^{'}}_{k \in [b]})$ to the view. Sim creates $\vec{t_{1}}$ by following step 2(c) of Protocol 5.1.

In step 2(d), Sim randomly selects $\vec{t_{1}^{″}} = (t_{1, 1}^{″}, \dots, t_{1, b}^{″})$ to simulate the output of L₁ in $F_{mSS}^{T, b}$ . Then, Sim invokes the simulator of $F_{mSS}^{T, b}$ and appends the output $S i m_{mSS} ({\vec{t_{1}}, π_{1}}, \vec{t_{1}^{″}})$ to the view.

Sim samples a binary vector $\vec{e^{'}}$ with $| I S |$ ones and fills the remaining entries with zeros. Then, Sim selects a random permutation $π^{'}$ and uses $π^{'} (\vec{e^{'}})$ to simulate the output of L₁ in $F_{OZK}^{T, b}$ . Subsequently, Sim invokes the simulator of $F_{OZK}^{T, b}$ and appends the output $S i m_{OZK} (\vec{t_{1}^{″}}, π^{'} (\vec{e^{'}}))$ to the view.

We argue that the outputs of Sim are indistinguishable from the real view of corrupted L₁ by the following hybrids:

${Hyb}_{0}$ : L₁’s view in the real execution.

${Hyb}_{1}$ : Same as ${Hyb}_{0}$ except that the OKVS D_j (received by L₁ from an honest party P_j in the element sharing stage) is replaced by an OKVS $D_{j}^{'}$ that encodes m random key-value pairs. In the real world, all the values encoded in D_j are encrypted using the $T - 1$ PRF keys ${K_{j, i}}_{i \in [2, T]}$ . Since $T = t + 1$ and the coalition contains at most t parties including L₁, at least one of these $T - 1$ PRF keys remains secret to L₁. Therefore, from the perspective of L₁, D_j appears random. The obliviousness of OKVS and the pseudorandomness of PRF guarantee that the simulated OKVS $D_{j}^{'}$ is computationally indistinguishable from D_j in the real execution. Consequently, ${Hyb}_{1}$ and ${Hyb}_{0}$ are indistinguishable.

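${Hyb}_{2}$ : Same as ${Hyb}_{1}$ except that the output of L₁ in $F_{opprf}^{F, 3 m, b}$ is replaced by $\vec{r_{1}^{'}} = (r_{1, 1}^{'}, \dots, r_{1, b}^{'})$ , which is chosen by Sim and has the same distribution as the original output. Therefore, ${Hyb}_{2}$ and ${Hyb}_{1}$ are indistinguishable.

${Hyb}_{3}$ : Same as ${Hyb}_{2}$ except that Sim runs $S i m_{opprf} ({{T a b l e}_{1} [k]}_{k \in [b]}, {r_{1, k}^{'}}_{k \in [b]})$ to produce the simulated view for L₁. The security of protocol $Π_{opprf}^{F, 3 m, b}$ guarantees that the simulated view is computationally indistinguishable from the real view in the real execution. Therefore, ${Hyb}_{3}$ and ${Hyb}_{2}$ are indistinguishable.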

${Hyb}_{4}$ : Same as ${Hyb}_{3}$ except that the output of L₁ in $F_{mSS}^{T, b}$ is replaced by $\vec{t_{1}^{″}} = (t_{1, 1}^{″}, \dots, t_{1, b}^{″})$ , which has the same distribution as the original output. Therefore, ${Hyb}_{4}$ and ${Hyb}_{3}$ are indistinguishable.

${Hyb}_{5}$ : Same as ${Hyb}_{4}$ except that Sim runs $S i m_{mSS} ({\vec{t_{1}}, π_{1}}, \vec{t_{1}^{″}})$ to produce the simulated view for L₁. The security of protocol $Π_{mSS}^{T, b}$ guarantees that the simulated view is computationally indistinguishable from the real view in the real execution. Therefore, ${Hyb}_{5}$ and ${Hyb}_{4}$ are indistinguishable.

${Hyb}_{6}$ : Same as ${Hyb}_{5}$ except that the output of L₁ in $F_{OZK}^{T, b}$ (i.e., $\vec{e}$ ) is replaced by a uniformly random binary vector $π^{'} (\vec{e^{'}})$ with $| I S |$ ones. The uniformly distributed permutation $π^{'}$ guarantees that $π^{'} (\vec{e^{'}})$ has the same distribution as $\vec{e}$ and satisfies the correctness constraint. Therefore, ${Hyb}_{6}$ and ${Hyb}_{5}$ are indistinguishable.

${Hyb}_{7}$ : Same as ${Hyb}_{6}$ except that Sim runs $S i m_{OZK} (\vec{t_{1}^{″}}, π^{'} (\vec{e^{'}}))$ to produce the simulated view for L₁. The security of protocol $Π_{OZK}^{T, b}$ guarantees that the simulated view is computationally indistinguishable from the real view in the real execution. Therefore, ${Hyb}_{7}$ and ${Hyb}_{6}$ are indistinguishable.

Since ${Hyb}_{7}$ is exactly the output of $S i m (X_{1}, | I S |)$ , the simulated view of L₁ is computationally indistinguishable from its real view.
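
The way Sim produces the simulated output of $F_{OZK}^{T, b}$ in Case 3 can be illustrated with a minimal Python sketch. It only shows the sampling of a binary vector with $| I S |$ ones at uniformly random positions; the function name `simulate_ozk_output` and its parameters are hypothetical and not part of the protocol description.

```python
import secrets


def simulate_ozk_output(b: int, cardinality: int) -> list[int]:
    """Return a length-b binary vector with exactly `cardinality` ones.

    The ones are placed first and the vector is then shuffled with a
    uniformly random permutation, mirroring pi'(e') in the simulation.
    """
    if not 0 <= cardinality <= b:
        raise ValueError("cardinality must lie between 0 and b")
    e_prime = [1] * cardinality + [0] * (b - cardinality)
    # SystemRandom draws from the operating system's randomness source.
    secrets.SystemRandom().shuffle(e_prime)
    return e_prime
```

The only information the sketch takes as input is the number of bins and the cardinality, which matches the fact that the simulator is given nothing beyond $X_{1}$ and $| I S |$ .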

5.2. Necessity of oblivious zero-sum check

We emphasize that the step of oblivious zero-sum check is necessary in Protocol 5.1. Consider an alternative approach where, after step 2(d), the secondary leaders send their shares of $π (\vec{t})$ to L₁ directly without invoking $F_{OZK}^{T, b}$ . In this scenario, L₁ could reconstruct $π (\vec{t})$ and output the number of zeros in $π (\vec{t})$ as $| I S |$ . While this intuitive solution may seem feasible, it is actually vulnerable to a collusion attack. Suppose both L₁ and some L_i ( $i \in [2, T]$ ) belong to the corruption coalition $C$ . The problem arises when there exists an item ${T a b l e}_{1} [k]$ in $I S ∖ {X_{i}}$ . Although the corrupted leader L_i does not possess the item ${T a b l e}_{1} [k]$ , its correct element sharing of ${T a b l e}_{1} [k]$ , denoted as $q_{i} ({T a b l e}_{1} [k]) = - \sum_{j \in [n - T]} PRF (K_{j, i}, {T a b l e}_{1} [k])$ , can still be computed non-interactively from the PRF keys held by L_i. This information could be revealed to L₁ through collusion. After reconstructing the shuffled vector $π (\vec{t})$ , L₁ adds $- r_{i, k} - q_{i} ({T a b l e}_{1} [k])$ to each position of the vector $π (\vec{t})$ to determine if the resulting sum equals zero. If the result is zero, L₁ infers that ${T a b l e}_{1} [k]$ belongs to $I S ∖ {X_{i}}$ , thus revealing additional information beyond the set intersection $I S$ . On the other hand, when our protocol is enhanced with the $F_{OZK}^{T, b}$ mechanism, it becomes resistant to the aforementioned collusion attack. This is because the reconstructed shuffled vector $π (\vec{t}) \vec{γ}$ is randomized using an unknown vector $\vec{γ}$ .
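
The effect of the randomization can be illustrated with a toy Python sketch. It assumes, purely for illustration, that values live in a prime field and that the randomization is an element-wise product with nonzero field elements; the modulus `P`, the offset `delta` and all function names are hypothetical and do not come from the protocol description.

```python
P = 2**61 - 1  # toy prime modulus (assumption of this sketch)


def attack_succeeds(vector, delta):
    """Coalition's test: does adding delta to some position give zero?"""
    return any((v + delta) % P == 0 for v in vector)


def randomize(vector, gammas):
    """Multiply every entry by a nonzero field element.

    Zero entries stay zero, so the zero count (the cardinality) is kept,
    while every nonzero entry is rescaled by a factor unknown to L1.
    """
    return [(v * g) % P for v, g in zip(vector, gammas)]


def count_zeros(vector):
    """Number of zero entries, i.e. the value L1 reports as |IS|."""
    return sum(1 for v in vector if v % P == 0)
```

For example, with `delta = 123456789` and the reconstructed vector `[0, (-delta) % P, 42, 0]`, the test `attack_succeeds` returns `True` on the raw vector. After `randomize` with factors such as `[3, 5, 7, 11]`, the zero count is still 2, but the test no longer succeeds, because the entry that previously matched `-delta` has been rescaled.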

6. MPSI-CA-sum protocol under arbitrary collusion

In this section, we first provide the formal definition of the concept of MPSI-CA-sum. Subsequently, we introduce a technique called payload sharing, which enables clients to securely share their payloads with T leaders. Following this, we extend the MPSI-CA protocol under arbitrary collusion (Protocol 5.1) to provide a practical MPSI-CA-sum protocol that is secure against arbitrary collusion.

Functionality ( $F_{MPSI - CA - sum}$ ). To the best of our knowledge, we are the first to formalize the notion of MPSI-CA-sum. The functionality of MPSI-CA-sum is an extension of the two-party PSI-CA-sum proposed in [22] to a multi-party context. Another distinctive feature of MPSI-CA-sum is that it enables every party to provide their respective payloads.

In the context of MPSI-CA-sum, each element x is associated with a payload. On the side of the leader L_i, the payload is denoted as $v_{i} (x)$ , while on the side of the client P_j, it is denoted as $w_{j} (x)$ . Functionality 9 specifies that the objective of MPSI-CA-sum is to securely compute and output both the intersection cardinality $| I S |$ and the intersection-sum ${S u m}_{I S}$ .

Functionality 9 (MPSI-CA-sum $F_{MPSI - CA - sum}$ )

Parameters: T leaders $L_{1}, \dots, L_{T}$ ; $n - T$ clients $P_{1}, \dots, P_{n - T}$ ; the set size is m.

Behaviour: On input data set $X_{i} = {x_{i, 1}, \dots, x_{i, m}}$ and payload set $V_{i} = {v_{i} (x_{i, 1}), \dots, v_{i} (x_{i, m})}$ from leader L_i (for $i \in [T]$ ); data set $S_{j} = {s_{j, 1}, \dots, s_{j, m}}$ and payload set $W_{j} = {w_{j} (s_{j, 1}), \dots, w_{j} (s_{j, m})}$ from client P_j (for $j \in [n - T]$ ):

Give output $(| I S |, {S u m}_{I S})$ to L₁, where the intersection cardinality $| I S |$ is represented as

\begin{aligned} | I S | = | (⋂_{i = 1}^{T} X_{i}) \cap (⋂_{j = 1}^{n - T} S_{j}) |, \end{aligned}

and the intersection-sum ${S u m}_{I S}$ is represented as

\begin{aligned} {S u m}_{I S} = \sum_{i = 1}^{T} \sum_{x \in I S} v_{i} (x) + \sum_{j = 1}^{n - T} \sum_{x \in I S} w_{j} (x) . \end{aligned}
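
The input/output behaviour of this functionality can be written down as a plaintext reference computation. The following Python sketch evaluates $| I S |$ and ${S u m}_{I S}$ in the clear, without any privacy protection; it is meant only as a specification aid, and the function name `mpsi_ca_sum` and the dictionary representation of inputs are assumptions of this sketch.

```python
def mpsi_ca_sum(leader_inputs, client_inputs):
    """Plaintext reference for the MPSI-CA-sum functionality.

    Each input is a dict mapping an element to its integer payload
    (v_i(x) for leaders, w_j(x) for clients). Returns the pair
    (|IS|, Sum_IS) that the functionality hands to L1.
    """
    all_inputs = list(leader_inputs) + list(client_inputs)
    if not all_inputs:
        raise ValueError("at least one party is required")
    # IS is the intersection of every leader set and every client set.
    intersection = set(all_inputs[0])
    for party in all_inputs[1:]:
        intersection &= set(party)
    # Sum_IS adds up every party's payload for every intersection element.
    total = sum(party[x] for party in all_inputs for x in intersection)
    return len(intersection), total
```

For instance, with leaders `{"a": 1, "b": 2, "c": 3}` and `{"a": 10, "b": 20}` and a single client `{"a": 100, "b": 200, "d": 5}`, the intersection is `{"a", "b"}` and the function returns `(2, 333)`.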

High-level Description. The procedures of our MPSI-CA-sum protocol resemble those of the MPSI-CA protocol under arbitrary collusion (Protocol 5.1). Parties perform payload sharing, OPPRF and multi-party secret-shared shuffle on the associated payloads of each element. Additionally, they execute Protocol 5.1 to derive a binary vector $\vec{e}$ , which indicates the shuffled indices of elements belonging to $I S$ . For the shuffled elements that belong to $I S$ , L₁ invokes the OT functionality $F_{ot}$ with all other leaders, utilizing the choice string $\vec{e}$ to aggregate the sum of their associated payloads. This process allows L₁ to obtain the intersection-sum.

6.1. Payload sharing

In our MPSI-CA protocol, we employ the element sharing technique to distribute the PRF-encoded data sets of the clients to the leaders in the first step. This approach helps minimize the number of parties involved in costly interactive procedures. To preserve the association between elements and their corresponding payloads, it is essential for us to handle these payloads in a consistent manner. Consequently, we introduce a technique known as payload sharing, which is designed to share the payloads of clients with T leaders.

The purpose of payload sharing is as follows: for each element x in the data set $X_{i}$ , leader L_i holds a corresponding random value ${\hat{v}}_{i} (x)$ . We refer to ${\hat{v}}_{i} (x)$ as L_i’s payload sharing of x. If x belongs to the set intersection of all parties, denoted as $I S$ , then the sum of all leaders’ payload sharings of x is equal to the payload-sum of x, denoted as $S u m (x) = \sum_{i = 1}^{T} v_{i} (x) + \sum_{j = 1}^{n - T} w_{j} (x)$ . In other words, when $x \in I S$ , the following equality holds: $\sum_{i = 1}^{T} {\hat{v}}_{i} (x) = \sum_{i = 1}^{T} v_{i} (x) + \sum_{j = 1}^{n - T} w_{j} (x) = S u m (x)$ .

Protocol. Sub-protocol 6.1 outlines the process for payload sharing. The procedures of payload sharing resemble those of element sharing as described in Sub-protocol 4.1.

Sub-protocol 6.1 (Payload Sharing)

Parameters: The number of parties is n, number of leaders is T; set size is m.

Input: Set $X_{i} = {x_{i, 1}, \dots, x_{i, m}}$ and payload $V_{i} = {v_{i} (x_{i, 1}), \dots v_{i} (x_{i, m})}$ of leader L_i (for $i \in [T]$ ); Set $S_{j} = {s_{j, 1}, \dots, s_{j, m}}$ and payload $W_{j} = {w_{j} (s_{j, 1}), \dots w_{j} (s_{j, m})}$ of client P_j (for $j \in [n - T]$ ).

Protocol:

(Client) For $j \in [n - T]$ , client P_j acts as follows:

P_j chooses random PRF keys ${K_{j, i}^{'}}_{i \in [T]}$ and sends $K_{j, i}^{'}$ to leader L_i.

For each element $s_{j, k}$ ( $k \in [m]$ ) in set $S_{j}$ :

P_j computes the mask $\sum_{i = 1}^{T} PRF (K_{j, i}^{'}, s_{j, k})$ , and its masked payload of element $s_{j, k}$ is denoted as ${\hat{w}}_{j} (s_{j, k}) = w_{j} (s_{j, k}) + \sum_{i = 1}^{T} PRF (K_{j, i}^{'}, s_{j, k})$ .

P_j applies $(T, T)$ additive secret sharing to ${\hat{w}}_{j} (s_{j, k})$ , where the i-th share is denoted as ${\hat{w}}_{j}^{(i)} (s_{j, k})$ . In other words, $\sum_{i = 1}^{T} {\hat{w}}_{j}^{(i)} (s_{j, k}) = {\hat{w}}_{j} (s_{j, k})$ .

For every $i \in [T]$ , P_j encodes the i-th set of key-value pairs ${⟨ s_{j, k}, {\hat{w}}_{j}^{(i)} (s_{j, k}) ⟩}_{k \in [m]}$ into the i-th OKVS ${DW}_{j}^{(i)}$ , and then sends ${DW}_{j}^{(i)}$ to the i-th leader L_i.

(Leader) For $i \in [T]$ , Leader L_i decodes its element $x_{i, k}$ (for $k \in [m]$ ) on all its received OKVSs ${{DW}_{j}^{(i)}}_{j \in [n - T]}$ to obtain ${Decode ({DW}_{j}^{(i)}, x_{i, k})}_{j \in [n - T]}$ . L_i then computes the PRF outputs using all $n - T$ received PRF keys to obtain its payload sharing of $x_{i, k}$ as

\begin{aligned} {\hat{v}}_{i} (x_{i, k}) = v_{i} (x_{i, k}) - \sum_{j = 1}^{n - T} F (K_{j, i}^{'}, x_{i, k}) + \sum_{j = 1}^{n - T} Decode (D W_{j}^{(i)}, x_{i, k}) . \end{aligned}

First, each client P_j (for $j \in [n - T]$ ) selects T random PRF keys ${K_{j, i}^{'}}_{i \in [T]}$ , and sends $K_{j, i}^{'}$ to the corresponding leader L_i for $i \in [T]$ . After that, P_j encrypts its element $s_{j, k}$ (for $k \in [m]$ ) using these T PRF keys and calculates the sum of the ciphertexts, resulting in the random mask $\sum_{i \in [T]} F (K_{j, i}^{'}, s_{j, k})$ . For every $k \in [m]$ , P_j then blinds the payload $w_{j} (s_{j, k})$ of element $s_{j, k}$ with the corresponding random mask $\sum_{i \in [T]} F (K_{j, i}^{'}, s_{j, k})$ to obtain the blinded payload ${\hat{w}}_{j} (s_{j, k})$ . Subsequently, client P_j applies $(T, T)$ secret sharing to each blinded payload ${\hat{w}}_{j} (s_{j, k})$ , generating T shares ${{\hat{w}}_{j}^{(i)} (s_{j, k})}_{i \in [T]}$ . For $i \in [T]$ , client P_j then encodes the key-value pairs ${⟨ s_{j, k}, {\hat{w}}_{j}^{(i)} (s_{j, k}) ⟩}_{k \in [m]}$ into the i-th OKVS $D W_{j}^{(i)}$ , and sends the encoded OKVS to the corresponding leader L_i.

For $i \in [T]$ , leader L_i computes two values for its elements $x_{i, k}$ for every $k \in [m]$ : (1) the sum of $n - T$ decode outputs $\sum_{j = 1}^{n - T} Decode (D W_{j}^{(i)}, x_{i, k})$ ; and (2) the sum of $n - T$ ciphertexts $\sum_{j = 1}^{n - T} F (K_{j, i}^{'}, x_{i, k})$ . Finally, L_i combines these two values with its own payload $v_{i} (x_{i, k})$ to obtain its payload sharing of $x_{i, k}$ , which is denoted as ${\hat{v}}_{i} (x_{i, k})$ .

The correctness proof of Sub-protocol 6.1 is as follows. If $x \in I S$ , then each leader L_i holds a payload sharing ${\hat{v}}_{i} (x)$ of x. Meanwhile, the key-value pair $⟨ x, {\hat{w}}_{j}^{(i)} (x) ⟩$ is encoded into the i-th OKVS of client P_j, denoted as $D W_{j}^{(i)}$ . Therefore, we can rewrite L_i’s payload sharing of x as ${\hat{v}}_{i} (x) = \sum_{j = 1}^{n - T} {\hat{w}}_{j}^{(i)} (x) - \sum_{j = 1}^{n - T} F (K_{j, i}^{'}, x) + v_{i} (x)$ . During the computation of $\sum_{i = 1}^{T} {\hat{v}}_{i} (x)$ , it can be observed that each PRF key $K_{j, i}^{'}$ is used twice on the same item x, once by the client P_j and once by the leader L_i. As a result, the two PRF outputs cancel each other out. Besides, according to the property of $(T, T)$ secret sharing, the sum of all leaders’ payload sharings of the intersection element x is equal to the payload-sum of x (i.e., $S u m (x)$ ).
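
The cancellation argument for $x \in I S$ can be written out explicitly. The derivation below is a sketch that uses only the definitions of ${\hat{v}}_{i} (x)$ , ${\hat{w}}_{j}^{(i)} (x)$ and ${\hat{w}}_{j} (x) = w_{j} (x) + \sum_{i = 1}^{T} F (K_{j, i}^{'}, x)$ given in the payload sharing sub-protocol:

\begin{aligned} \sum_{i = 1}^{T} {\hat{v}}_{i} (x) &= \sum_{i = 1}^{T} v_{i} (x) + \sum_{j = 1}^{n - T} \sum_{i = 1}^{T} {\hat{w}}_{j}^{(i)} (x) - \sum_{j = 1}^{n - T} \sum_{i = 1}^{T} F (K_{j, i}^{'}, x) \\ &= \sum_{i = 1}^{T} v_{i} (x) + \sum_{j = 1}^{n - T} {\hat{w}}_{j} (x) - \sum_{j = 1}^{n - T} \sum_{i = 1}^{T} F (K_{j, i}^{'}, x) \\ &= \sum_{i = 1}^{T} v_{i} (x) + \sum_{j = 1}^{n - T} w_{j} (x) = S u m (x) . \end{aligned}

The second line applies the reconstruction property of $(T, T)$ additive secret sharing, and the third line substitutes the definition of the blinded payload so that the PRF terms cancel.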

If $x \notin I S$ , it can be further divided into two cases: either x does not belong to the set $X_{i}$ of some leader $L_{i} (i \in [T])$ , or it does not belong to the set $S_{j}$ of some client $P_{j} (j \in [n - T])$ . In the first case, it directly implies that leader L_i does not hold any payload sharing of x. In the second case, it means that x is not encoded in the OKVS ${DW}_{j}^{(i)}$ . In this situation, when L_i decodes x on ${DW}_{j}^{(i)}$ , L_i will obtain an independent random value. As a result, the probability that $\sum_{i = 1}^{T} {\hat{v}}_{i} (x)$ coincidentally equals $S u m (x)$ when $x \notin I S$ is negligible.
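
The payload sharing procedure and its correctness can be illustrated with a small Python sketch. It is not an implementation of the protocol: HMAC-SHA256 stands in for the PRF, arithmetic is done modulo $2^{64}$ , and a plain dictionary that returns a fresh random value for missing keys stands in for the OKVS (a dictionary is not oblivious). All function names are hypothetical.

```python
import hashlib
import hmac
import secrets

MOD = 2**64  # toy share domain (assumption of this sketch)


def prf(key: bytes, x: str) -> int:
    """HMAC-SHA256 as a stand-in PRF, reduced modulo MOD."""
    digest = hmac.new(key, x.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % MOD


def additive_shares(value: int, count: int) -> list[int]:
    """Split value into `count` additive shares modulo MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(count - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares


def decode(okvs: dict, x: str) -> int:
    """Toy OKVS decode: stored value, or a random value if x is absent."""
    return okvs.get(x, secrets.randbelow(MOD))


def client_share(payloads: dict, T: int):
    """Client side: one PRF key and one toy OKVS per leader."""
    keys = [secrets.token_bytes(16) for _ in range(T)]
    okvs_list = [dict() for _ in range(T)]
    for s, w in payloads.items():
        # Blind the payload with the sum of T PRF outputs, then share it.
        masked = (w + sum(prf(k, s) for k in keys)) % MOD
        for i, share in enumerate(additive_shares(masked, T)):
            okvs_list[i][s] = share
    return keys, okvs_list


def leader_sharing(x: str, v: int, keys_for_me, okvs_for_me) -> int:
    """Leader side: v_i(x) - sum_j F(K'_{j,i}, x) + sum_j Decode(DW_j^(i), x)."""
    prf_sum = sum(prf(k, x) for k in keys_for_me)
    dec_sum = sum(decode(d, x) for d in okvs_for_me)
    return (v - prf_sum + dec_sum) % MOD
```

Running `client_share` for every client and `leader_sharing` for every leader on a common element, and then adding the leaders' results modulo `MOD`, yields the payload-sum of that element, in line with the correctness argument above.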

6.2. Detailed description

The MPSI-CA-sum protocol under arbitrary collusion is presented in Protocol 6.2. In step 4, we reuse the hash table ${T a b l e}_{i}, i \in [T]$ which was generated in step 3 using the bucketing technique, as outlined in step 2(a) of Protocol 5.1.

The analysis of $F_{opprf}^{F, 3 m, b}$ and $F_{mSS}^{T, b}$ in the MPSI-CA-sum protocol is analogous to our previous analysis in Protocol 5.1, with the key distinction that the leaders now operate on payloads instead of individual elements. According to the properties of the underlying primitives, if an element ${T a b l e}_{1} [k]$ belongs to the intersection $I S$ , then the outputs of $F_{mSS}^{T, b}$ will satisfy the condition $\sum_{i \in [T]} g_{i, π (k)}^{'} = S u m ({T a b l e}_{1} [k])$ , where $S u m ({T a b l e}_{1} [k])$ represents the payload-sum of the intersection element ${T a b l e}_{1} [k]$ . Conversely, if ${T a b l e}_{1} [k] \notin I S$ , $\sum_{i \in [T]} g_{i, π (k)}^{'}$ will be a random value.
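
The input/output behaviour expected from $F_{mSS}^{T, b}$ in this analysis can be sketched as an ideal computation in Python. The sketch assumes the convention that position $π (k)$ of the output holds entry k of the input, which matches the index $g_{i, π (k)}^{'}$ used above, and it works modulo $2^{64}$ ; the function names and the modulus are assumptions of this sketch, not part of the realization in [29].

```python
import secrets

MOD = 2**64  # toy share domain (assumption of this sketch)


def apply_perm(perm, vec):
    """Shuffle vec so that out[perm[k]] == vec[k]."""
    out = [0] * len(vec)
    for k, value in enumerate(vec):
        out[perm[k]] = value
    return out


def compose(perms):
    """Return pi = pi_T o ... o pi_1, i.e. pi_1 is applied first."""
    composed = list(range(len(perms[0])))
    for p in perms:
        composed = [p[c] for c in composed]
    return composed


def ideal_mss(vectors, perms):
    """Ideal shuffle: fresh additive shares of pi(sum of input vectors)."""
    T, b = len(vectors), len(vectors[0])
    summed = [sum(col) % MOD for col in zip(*vectors)]
    shuffled = apply_perm(compose(perms), summed)
    # T - 1 random share vectors; the last one fixes the correct sum.
    shares = [[secrets.randbelow(MOD) for _ in range(b)] for _ in range(T - 1)]
    last = [(shuffled[k] - sum(s[k] for s in shares)) % MOD for k in range(b)]
    return shares + [last]
```

Each output vector on its own is uniformly random, and only the sum of all T outputs reveals the shuffled vector, which is why no single leader learns the composed permutation from its share.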

In order to derive the payload-sum of all the intersection elements, leader L₁ needs to determine the shuffled indices of those intersection elements. Recall that in step 3, leader L₁ outputs a binary vector $\vec{e} = (e_{1}, \dots, e_{b})$ . If $e_{k} = 1$ , it means that the element in the $π^{- 1} (k)$ -th bin of ${T a b l e}_{1}$ belongs to the intersection $I S$ . Although L₁ cannot infer the original index of this element (i.e., $π^{- 1} (k)$ ) from k, L₁ is aware of the existence of such an element. Therefore, L₁ can still aggregate the associated payloads of this element by invoking b OTs with each secondary leader L_i for $i \in [2, T]$ . In the k-th OT with L_i, L₁ acts as the receiver with a choice bit $e_{k}$ , while L_i acts as the sender with two strings $(m k_{i, k}, m k_{i, k} + g_{i, k}^{'})$ . The random masks ${m k_{i, k}}_{i \in [2, T], k \in [b]}$ are chosen such that $\sum_{i = 2}^{T} \sum_{k = 1}^{b} m k_{i, k} = 0$ .

Protocol 6.2 (MPSI-CA-sum Under Arbitrary Collusion)

Parameters: The set size is m; the number of leaders is $T = t + 1$ ; hash functions $h_{1}$ , $h_{2}$ , $h_{3}$ ; the number of bins is b.

Protocol:

(Element Sharing) Parties perform element sharing by following Sub-protocol 4.1. For each $k \in [m]$ , leader L_i obtains its element sharing of $x_{i, k}$ as $q_{i} (x_{i, k})$ .

(Payload Sharing) Parties perform payload sharing by following Sub-protocol 6.1. For each $k \in [m]$ , leader L_i obtains its payload sharing of $x_{i, k}$ as ${\hat{v}}_{i} (x_{i, k})$ .

(T-party PSI-CA) Leaders $L_{1}, \dots, L_{T}$ execute step 2 of Protocol 5.1. As a result, L₁ acquires a binary vector $\vec{e}$ , where the number of 1s in $\vec{e}$ represents the intersection cardinality $| I S |$ .

(T-party MPSI-CA-sum)

(In step 3, each leader L_i (for $i \in [T]$ ) has already obtained their respective hash table ${T a b l e}_{i}$ , so there is no need to perform the bucketing step again.)

(OPPRF) L₁ invokes $F_{opprf}^{F, 3 m, b}$ with every secondary leader L_i (for $i \in [2, T]$ ):

Sender L_i provides a programmed set $P = {P_{k}}_{k \in [b]}$ , where subset $P_{k} = {⟨ x, {\hat{v}}_{i} (x) - g_{i, k} ⟩}_{x \in {T a b l e}_{i} [k]}$ stores key-value pairs for the k-th bin ${T a b l e}_{i} [k]$ . Here, $g_{i, k}$ is a random value chosen for each $k \in [b]$ .

Receiver L₁ provides b queries ${{T a b l e}_{1} [k]}_{k \in [b]}$ , and outputs $\vec{p_{i}} = (p_{i, 1}, \dots, p_{i, b})$ . Here, $p_{i, k}$ is the OPPRF output on element ${T a b l e}_{1} [k]$ (for $k \in [b]$ ).

For each bin $k \in [b]$ , L₁ computes $g_{1, k} = {\hat{v}}_{1} ({T a b l e}_{1} [k]) + \sum_{i = 2}^{T} p_{i, k}$ . The results are denoted as a vector $\vec{g_{1}} = (g_{1, 1}, \dots, g_{1, b})$ .

(T-party Shuffle) All leaders L_i (for $i \in [T]$ ) jointly invoke $F_{mSS}^{T, b}$ :

For each $i \in [T]$ , leader L_i inputs the permutation $π_{i}$ (used in step 3) and the vector ${\vec{g}}_{i} = (g_{i, 1}, \dots, g_{i, b})$ . Consequently, L_i outputs an additive share $\vec{g_{i}^{'}}$ of the shuffled sum $π (\sum_{i = 1}^{T} \vec{g_{i}})$ , where $\sum_{i = 1}^{T} \vec{g_{i}^{'}} = π (\sum_{i = 1}^{T} \vec{g_{i}})$ and $π = π_{T} \circ \dots \circ π_{1}$ .

(Intersection-sum Computation)

L₁ locally computes $\sum_{e_{k} = 1, k \in [b]} g_{1, k}^{'}$ .

$L_{2}, \dots, L_{T}$ jointly generate $T - 1$ random mask vectors ${\vec{m k}}_{i} = (m k_{i, 1}, \dots, m k_{i, b})$ , $i \in [2, T]$ , ensuring that $\sum_{i = 2}^{T} \sum_{k = 1}^{b} m k_{i, k} = 0$ .

L₁ invokes $F_{ot}$ with each secondary leader L_i (for $i \in [2, T]$ ):

Sender L_i inputs a set of strings ${(m k_{i, k}, m k_{i, k} + g_{i, k}^{'})}_{k \in [b]}$ .

Receiver L₁ inputs the choice string $\vec{e}$ and obtains b outputs ${z_{i, k}}_{k \in [b]}$ . If $e_{k} = 0$ , then $z_{i, k} = m k_{i, k}$ ; otherwise, $z_{i, k} = m k_{i, k} + g_{i, k}^{'}$ .

L₁ computes the intersection-sum ${S u m}_{I S}$ by adding $\sum_{e_{k} = 1, k \in [b]} g_{1, k}^{'}$ to the sum of all $b (T - 1)$ OT outputs received in step 5.

The random masks ${m k_{i, k}}_{i \in [2, T], k \in [b]}$ are used to ensure that L₁ cannot learn the payload-sum of specific elements. They can be generated through additive secret sharing among the secondary leaders.

First, each secondary leader L_i (for $i \in [2, T]$ ) generates a random vector $\vec{m k_{i}^{'}} = (m k_{i, 1}^{'}, \dots, m k_{i, b}^{'})$ that satisfies $\sum_{k = 1}^{b} m k_{i, k}^{'} = 0$ .

Then, each secondary leader L_i (for $i \in [2, T]$ ) applies $(T - 1, T - 1)$ additive secret sharing to the vector $\vec{m k_{i}^{'}}$ to obtain $T - 1$ shares ${\vec{M k_{i}^{(j)}}}_{j \in [2, T]}$ . After that, L_i sends its share ${\vec{M k_{i}}}^{(j)}$ to the corresponding leader $L_{j}$ .

Finally, each secondary leader L_i (for $i \in [2, T]$ ) combines all the received shares ${\vec{{M k}_{j}^{(i)}}}_{j \in [2, T] ∖ {i}}$ and its own local share $\vec{M k_{i}^{(i)}}$ , which results in the random mask $\vec{m k_{i}}$ .
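
The three steps of the mask generation can be sketched in Python as a single-process simulation. The sketch works modulo $2^{64}$ and indexes the secondary leaders from 0; the modulus and all function names are assumptions made for illustration.

```python
import secrets

MOD = 2**64  # toy mask domain (assumption of this sketch)


def zero_sum_vector(b):
    """Random length-b vector whose entries sum to zero modulo MOD."""
    vec = [secrets.randbelow(MOD) for _ in range(b - 1)]
    vec.append((-sum(vec)) % MOD)
    return vec


def share_vector(vec, parts):
    """Additively share a vector into `parts` vectors modulo MOD."""
    shares = [[secrets.randbelow(MOD) for _ in vec] for _ in range(parts - 1)]
    last = [(v - sum(s[k] for s in shares)) % MOD for k, v in enumerate(vec)]
    return shares + [last]


def generate_masks(num_secondary, b):
    """Simulate the mask generation among the secondary leaders."""
    # Step 1: every secondary leader picks a local zero-sum vector mk'_i.
    local = [zero_sum_vector(b) for _ in range(num_secondary)]
    # Step 2: shares[i][j] is the share that leader i hands to leader j.
    shares = [share_vector(v, num_secondary) for v in local]
    # Step 3: leader j adds up all shares it holds to obtain its mask mk_j.
    return [
        [sum(shares[i][j][k] for i in range(num_secondary)) % MOD
         for k in range(b)]
        for j in range(num_secondary)
    ]
```

Summing all returned masks over all leaders and all positions gives zero modulo `MOD`, because the masks add up to the sum of the local zero-sum vectors, while no individual leader knows the masks of the others.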

Correctness. Since the correctness of $| I S |$ (obtained in step 3) has already been proven in Section 5.1, here we only prove the correctness of ${S u m}_{I S}$ . Assuming that $e_{k} = 1$ , this indicates that the element ${T a b l e}_{1} [π^{- 1} (k)]$ belongs to the set $I S$ . Therefore, according to the property of payload sharing, it holds that $\sum_{i = 1}^{T} g_{i, k}^{'} = S u m ({T a b l e}_{1} [π^{- 1} (k)])$ . After invoking OTs with each secondary leader, L₁ combines the $b (T - 1)$ OT outputs and its local shares $\sum_{e_{k} = 1, k \in [b]} g_{1, k}^{'}$ together. As shown in equation (4), the sum yields the correct result ${S u m}_{I S}$ :

\begin{aligned} & \underbrace{\sum_{i = 2}^{T} \sum_{k = 1}^{b} m k_{i, k} + \sum_{i = 2}^{T} \sum_{e_{k} = 1, k \in [b]} g_{i, k}^{'}}_{\text{OT outputs}} + \underbrace{\sum_{e_{k} = 1, k \in [b]} g_{1, k}^{'}}_{\text{Local shares}} \\ & = \underbrace{\sum_{i = 2}^{T} \sum_{e_{k} = 1, k \in [b]} g_{i, k}^{'}}_{\text{Others' shares}} + \underbrace{\sum_{e_{k} = 1, k \in [b]} g_{1, k}^{'}}_{\text{Local shares}} \\ & = 0 + \sum_{x \in I S} S u m (x) \\ & = {S u m}_{I S} \end{aligned}

(4)
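
Equation (4) can be checked numerically with a short Python sketch that models $F_{ot}$ as an ideal selection between two strings. The sketch works modulo $2^{64}$ and assumes that the masks passed in already sum to zero; the function names and the modulus are hypothetical.

```python
MOD = 2**64  # toy share domain (assumption of this sketch)


def ideal_ot(pair, choice_bit):
    """Ideal 1-out-of-2 OT: the receiver learns only pair[choice_bit]."""
    return pair[choice_bit]


def aggregate_sum(e, g_shares, masks):
    """L1's computation in the intersection-sum step.

    e        : binary choice vector of length b
    g_shares : g_shares[0] is L1's shuffled share vector, g_shares[1:]
               are the shuffled share vectors of the secondary leaders
    masks    : one mask vector per secondary leader, summing to zero
    """
    # Local shares: L1 adds its own entries at positions with e_k = 1.
    total = sum(g for g, bit in zip(g_shares[0], e) if bit == 1)
    # OT outputs: b OTs with each secondary leader.
    for g_i, mk_i in zip(g_shares[1:], masks):
        for k, bit in enumerate(e):
            pair = (mk_i[k], (mk_i[k] + g_i[k]) % MOD)
            total += ideal_ot(pair, bit)
    return total % MOD
```

With `e = [1, 0, 1]`, share vectors `[1, 2, 3]`, `[10, 20, 30]`, `[100, 200, 300]` and masks `[5, 6, 7]` and `[MOD - 5, MOD - 6, MOD - 7]`, the masks cancel and the function returns 444, the sum of all shares at positions 1 and 3.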

Security. Regarding the security of Protocol 6.2, we have the following theorem.

Theorem 5. Protocol 6.2 securely computes $F_{MPSI - CA - sum}$ in the presence of a semi-honest adversary which may corrupt up to t parties $(t < n)$ , if $F_{opprf}^{F, 3 m, b}$ , $F_{mSS}^{T, b}$ , $F_{OZK}^{T, b}$ and $F_{ot}$ are secure against semi-honest adversaries.

Proof.

Since the simulation strategies for step 1 and step 3 are provided in Theorem 4 of Protocol 5.1, we will only offer a concise overview of the security proof for the remaining steps, which involve the computation of the intersection-sum. We divide the security proof into three cases.

Case 1: ( $L_{i} \notin C$ holds for every $i \in [T]$ ) In this trivial case, the corrupted coalition $C$ consists only of at most t corrupted clients, and their views can be easily simulated since they receive no messages from others.

Case 2: ( $L_{1} \notin C$ ) In this case, $C$ may comprise certain clients and secondary leaders. The simulation strategies for the corrupted clients are consistent with Case 1. For the corrupted secondary leader L_i, Sim simulates its view as follows:

In step 2, Sim randomly samples $K_{j, i}^{″}$ to simulate the PRF key $K_{j, i}^{'}$ received by L_i during the payload sharing stage. Then, Sim uses an OKVS $(D W_{j}^{(i)})^{'}$ that encodes m random key-value pairs to simulate the OKVS $D W_{j}^{(i)}$ sent by the honest party P_j. The obliviousness property of OKVS and the pseudorandomness of PRF ensure that the two OKVSs are computationally indistinguishable.

In step 4(b), Sim randomly selects keys $(k_{1}^{'}, \dots, k_{b}^{'})$ to simulate the output PRF keys of L_i in $F_{opprf}^{F, 3 m, b}$ . Sim creates the programmed set $P$ and hint $h i n t$ by following step 4(b). Then, Sim invokes the OPPRF simulator and appends the output $S i m_{opprf} (P, {(k_{1}^{'}, \dots, k_{b}^{'}), h i n t})$ to the view. The security of protocol $Π_{opprf}^{F, 3 m, b}$ guarantees that the simulated view is computationally indistinguishable from the real view in the real execution.

In step 4(d), Sim randomly selects $\vec{g_{i}^{″}} = (g_{i, 1}^{″}, \dots, g_{i, b}^{″})$ to simulate the output of L_i in $F_{mSS}^{T, b}$ . Then, Sim invokes the simulator of $F_{mSS}^{T, b}$ and appends the output $S i m_{mSS} ({\vec{g_{i}}, π_{i}}, \vec{g_{i}^{″}})$ to the view. The security of protocol $Π_{mSS}^{T, b}$ ensures the computational indistinguishability of the simulated view and the real view.

In step 5(b), Sim uses random vectors to simulate the received shares of L_i, and creates the mask $\vec{m k_{i}}$ . The simulated vectors have the same distribution as the vectors in the real world. Subsequently, Sim invokes the simulator of $F_{ot}$ and appends the output $S i m_{ot} ({(m k_{i, k}, m k_{i, k} + g_{i, k}^{″})}_{k \in [b]})$ to the view. The security of protocol $Π_{ot}$ ensures the computational indistinguishability of the simulated view and the real view.

Therefore, the simulated view of $C$ is computationally indistinguishable from the real view.

Case 3: ( $L_{1} \in C$ ) The simulation strategies for the corrupted clients and secondary leaders are consistent with Case 1 and Case 2. For the corrupted primary leader L₁, Sim simulates its view by employing the following strategies:

In step 2, Sim randomly samples $K_{j, 1}^{″}$ to simulate the PRF key $K_{j, 1}^{'}$ received by L₁ during the payload sharing stage. Subsequently, Sim uses an OKVS $(D W_{j}^{(1)})^{'}$ that encodes m random key-value pairs to simulate the OKVS $D W_{j}^{(1)}$ sent by the honest party P_j. The obliviousness property of OKVS and the pseudorandomness of PRF ensure that the two OKVSs are computationally indistinguishable.

In step 4(b), Sim randomly selects $\vec{p^{'}} = (p_{1, 1}^{'}, \dots, p_{1, b}^{'})$ to simulate the output of L₁ in $F_{opprf}^{F, 3 m, b}$ . Then, Sim invokes the simulator of $F_{opprf}^{F, 3 m, b}$ and appends the output $S i m_{opprf} ({{T a b l e}_{1} [k]}_{k \in [b]}, {p_{1, k}^{'}}_{k \in [b]})$ to the view. The security of protocol $Π_{opprf}^{F, 3 m, b}$ guarantees that the simulated view is computationally indistinguishable from the real view in the real execution.

In step 4(d), Sim randomly selects $\vec{g_{1}^{″}} = (g_{1, 1}^{″}, \dots, g_{1, b}^{″})$ to simulate the output of L₁ in $F_{mSS}^{T, b}$ . Subsequently, Sim invokes the simulator of $F_{mSS}^{T, b}$ and appends the output $S i m_{mSS} ({\vec{g_{1}}, π_{1}}, \vec{g_{1}^{″}})$ to the view. The security of protocol $Π_{mSS}^{T, b}$ ensures the computational indistinguishability of the simulated view and the real view.

In step 5(b), Sim first extracts the binary vector $π^{'} (\vec{e^{'}})$ output by the simulator $S i m_{MPSI - CA}$ of Protocol 5.1 (by following the simulation strategies outlined in Theorem 4). As proven in Theorem 4, $π^{'} (\vec{e^{'}})$ is indistinguishable from the choice string $\vec{e}$ in the real world. Subsequently, Sim selects random values ${z_{i, k}^{'}}_{k \in [b]}$ to simulate the OT outputs ${z_{i, k}}_{k \in [b]}$ from the secondary leader L_i, while ensuring that the sum of all $b (T - 1)$ outputs of $F_{ot}$ equals ${S u m}_{I S} - \sum_{e_{k} = 1, k \in [b]} g_{1, k}^{'}$ . From the perspective of the corrupted parties, these simulated outputs have the same distribution as the real outputs and also satisfy the correctness constraint. Afterwards, Sim invokes the simulator of $F_{ot}$ and appends the output $S i m_{ot} (π^{'} (\vec{e^{'}}), {z_{i, k}^{'}}_{k \in [b]})$ to the view. The security of protocol $Π_{ot}$ ensures the computational indistinguishability of these two views.

Therefore, the simulated view of $C$ is computationally indistinguishable from the real view.

7. Experimental evaluation

In this section, we evaluate the performance of our protocols. We instantiate the underlying OPPRF and T-party Permute + Share primitives using the realizations proposed in [26] and [29], respectively. Since the operations of computing the intersection-sum resemble those of computing the intersection cardinality, the computational complexity of our MPSI-CA-sum protocol is roughly double that of our MPSI-CA protocol under arbitrary collusion. Therefore, in this section, we only focus on evaluating the performance of our MPSI-CA protocols.

Parameters and Settings. We set the statistical security parameter to $λ = 40$ and the computational security parameter to $κ = 128$ . Our experiments are conducted on a laptop with an Intel i7-12700H 2.30 GHz CPU, 28 GB RAM, and Ubuntu 20.04 in the LAN setting. Each party uses separate threads to communicate with the other parties to ensure parallelism.

7.1. Performance of MPSI-CA under arbitrary collusion

Division of Two Phases. In our MPSI-CA protocol under arbitrary collusion, we employ the secret sharing mechanism to achieve the purpose of oblivious zero-sum check, and it takes every two parties about 32 seconds to generate $2^{18}$ Beaver triples [8] in the setup stage. Moreover, we divide our protocol into offline and online phases in the experiment. The offline phase consists of all the base OT operations in the multi-party secret-shared shuffle, which can be carried out in advance because they are independent of the input sets. The online phase covers the remaining operations, including element sharing, OPPRF, secret-shared shuffle (excluding base OT) and oblivious zero-sum check (excluding Beaver triples generation).

Table 2
The running time and communication cost in our MPSI-CA protocol under arbitrary collusion (Protocol 5.1).

| n | t | Role | Time (s), $m = 2^{12}$ | Time (s), $m = 2^{14}$ | Time (s), $m = 2^{16}$ | Time (s), $m = 2^{18}$ | Comm. (MB), $m = 2^{12}$ | Comm. (MB), $m = 2^{14}$ | Comm. (MB), $m = 2^{16}$ | Comm. (MB), $m = 2^{18}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 1 | Client | 0.109 | 0.204 | 0.272 | 5.919 | 0.156265 | 0.625015 | 2.50002 | 10 |
| 5 | 1 | Leader | 0.718 | 1.657 | 5.015 | 26.620 | 5.64464 | 25.4972 | 113.907 | 503.547 |
| 5 | 1 | Online | 0.467 | 1.354 | 4.433 | 24.110 | | | | |
| 5 | 2 | Client | 0.110 | 0.221 | 2.77 | 5.921 | 0.156281 | 0.625031 | 2.50003 | 10 |
| 5 | 2 | Leader | 1.022 | 2.339 | 6.867 | 36.352 | 10.6642 | 48.4943 | 217.814 | 967.094 |
| 5 | 2 | Online | 0.543 | 1.542 | 5.067 | 26.894 | | | | |
| 5 | 4 | Leader | 2.650 | 4.025 | 12.165 | 52.312 | 20.7034 | 94.4886 | 425.628 | 1894.19 |
| 5 | 4 | Online | 0.670 | 1.789 | 6.289 | 31.236 | | | | |
| 10 | 1 | Client | 0.122 | 0.224 | 0.293 | 5.96 | 0.156265 | 0.625015 | 2.50002 | 10 |
| 10 | 1 | Leader | 0.723 | 1.668 | 5.126 | 26.871 | 6.42595 | 28.6222 | 126.407 | 553.547 |
| 10 | 1 | Online | 0.497 | 1.360 | 4.567 | 24.516 | | | | |
| 10 | 5 | Client | 0.111 | 2.37 | 0.309 | 5.994 | 0.156326 | 0.625076 | 2.50008 | 10.0001 |
| 10 | 5 | Leader | 3.302 | 5.284 | 15.264 | 72.882 | 26.5044 | 120.611 | 542.036 | 2407.74 |
| 10 | 5 | Online | 0.844 | 1.890 | 5.733 | 32.764 | | | | |
| 10 | 9 | Leader | 7.531 | 10.796 | 30.254 | 150.109 | 46.5828 | 212.599 | 957.664 | 4261.92 |
| 10 | 9 | Online | 1.004 | 2.690 | 10.415 | 58.485 | | | | |
| 15 | 1 | Client | 0.128 | 0.245 | 0.310 | 5.978 | 0.156265 | 0.625015 | 2.50002 | 10 |
| 15 | 1 | Leader | 0.726 | 1.676 | 5.177 | 27.174 | 7.20725 | 31.7473 | 138.907 | 603.547 |
| 15 | 1 | Online | 0.505 | 1.383 | 4.598 | 25.115 | | | | |
| 15 | 7 | Client | 0.111 | 0.239 | 0.349 | 6.183 | 0.156357 | 0.625107 | 2.50011 | 10.0001 |
| 15 | 7 | Leader | 4.576 | 7.766 | 21.536 | 107.927 | 37.3249 | 169.73 | 762.35 | 3384.83 |
| 15 | 7 | Online | 0.926 | 2.257 | 5.955 | 36.913 | | | | |
| 15 | 14 | Leader | 13.955 | 22.676 | 63.061 | 365.722 | 72.4621 | 313.97 | 1489.7 | 6629.66 |
| 15 | 14 | Online | 1.477 | 4.503 | 12.805 | 95.228 | | | | |

Running Time and Communication Cost. Table 2 shows the total and online running time of our MPSI-CA protocol under arbitrary collusion, as well as its communication cost, which includes both sent and received messages. We evaluate the performance of the clients and the primary leader under three corruption conditions: $t = 1$ , $t = \lfloor n / 2 \rfloor$ and $t = n - 1$ , where t is the corruption threshold. In the case of $t = 1$ , our MPSI-CA protocol computes the intersection cardinality for 15 parties, each holding a large set of size $2^{18}$ , in only 27.174 seconds. In the honest majority situation ( $t = \lfloor n / 2 \rfloor$ ), the running time of the leaders increases with the number of parties participating in the multi-party secret-shared shuffle, resulting in a linear relationship with t. When $n = 15$ and $m = 2^{12}$ , the total running time is about 4.576 seconds, while the online phase takes only 0.926 seconds. In the most challenging dishonest majority setting, where $t = n - 1$ , parties cannot share their sets with leaders because of the risk of collusion attacks. Therefore, the number of leaders has to be n. However, since most of the expensive operations of the multi-party secret-shared shuffle can be shifted to the offline phase, the online running time is reduced to roughly one quarter of the total running time or less (for example, 95.228 versus 365.722 seconds for $n = 15$ and $m = 2^{18}$ ).
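The share of the online phase in the dishonest majority setting can be checked directly against Table 2. The short Python sketch below takes the Leader and Online rows for $n = 15$ , $t = 14$ and computes the online-to-total ratio for each set size; the variable names are illustrative and the figures are copied from the table.

```python
# Leader (total) and Online running times in seconds from Table 2,
# n = 15, t = 14, keyed by log2 of the set size m.
total_seconds = {12: 13.955, 14: 22.676, 16: 63.061, 18: 365.722}
online_seconds = {12: 1.477, 14: 4.503, 16: 12.805, 18: 95.228}

# Fraction of the total running time that remains in the online phase.
online_ratio = {
    log_m: online_seconds[log_m] / total_seconds[log_m]
    for log_m in total_seconds
}

for log_m, ratio in sorted(online_ratio.items()):
    print(f"m = 2^{log_m}: online / total = {ratio:.3f}")
```

The ratio grows with the set size in these four measurements and is largest for $m = 2^{18}$ , where it stays slightly above one quarter.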

With respect to the communication performance of different parties, the cost of a client is nearly independent of n and t, whereas the cost of the primary leader not only depends on n but is also linear in the number of leaders $T = t + 1$ . Concretely, when the set size is large (i.e., $m = 2^{18}$ ), our protocol incurs roughly 7 KB of communication per item at each leader’s side when $n = 5$ and $t = 4$ , which includes both sent and received messages. This cost increases to about 25 KB per item in the most challenging case of $n = 15$ , $t = 14$ and $m = 2^{18}$ .
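The per-item figures follow from dividing the leader's total communication in Table 2 by the set size. The sketch below reproduces that arithmetic; the helper name `per_item_kb` is hypothetical, and the conversion factor between MB and KB is an assumption (1024 by default, with 1000 shown for comparison), since the text does not state which convention is used.

```python
def per_item_kb(total_mb, m, kb_per_mb=1024):
    """Communication per set element in KB, given the total in MB.
    kb_per_mb is an assumed unit convention, not stated in the paper."""
    return total_mb * kb_per_mb / m


m = 2**18
# Leader rows of Table 2 at m = 2^18.
cost_n5_t4 = per_item_kb(1894.19, m)     # n = 5,  t = 4
cost_n15_t14 = per_item_kb(6629.66, m)   # n = 15, t = 14

# Same figures under a decimal convention (1 MB = 1000 KB).
cost_n5_t4_dec = per_item_kb(1894.19, m, kb_per_mb=1000)
cost_n15_t14_dec = per_item_kb(6629.66, m, kb_per_mb=1000)

print(round(cost_n5_t4, 2), round(cost_n15_t14, 2))
print(round(cost_n5_t4_dec, 2), round(cost_n15_t14_dec, 2))
```

Under either convention the results land between 7 and 8 KB per item for $n = 5$ , $t = 4$ and between 25 and 26 KB per item for $n = 15$ , $t = 14$ .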

Table 3

The running time of different steps in our MPSI-CA protocol under arbitrary collusion (Protocol 5.1).

Steps	$n = 5$ , $t = 3$			$n = 10$ , $t = 8$			$n = 15$ , $t = 13$
	$m = 2^{12}$	$m = 2^{14}$	$m = 2^{16}$	$m = 2^{12}$	$m = 2^{14}$	$m = 2^{16}$	$m = 2^{12}$	$m = 2^{14}$	$m = 2^{16}$
Running time of different steps (seconds)
Element sharing	0.119	0.250	0.319	0.109	0.237	0.341	0.103	0.263	0.337
OPPRF	0.473	1.261	5.113	0.566	1.407	4.697	0.942	1.821	5.691
Shuffle (offline)	1.296	1.432	3.723	4.689	7.467	19.278	10.964	17.459	47.680
Shuffle (online)	0.024	0.069	0.219	0.041	0.177	0.704	0.065	0.279	1.189
Oblivious zero-sum check	0.006	0.009	0.017	0.013	0.021	0.036	0.019	0.034	0.065
Total	1.938	3.062	9.560	5.485	9.425	25.368	12.269	20.115	55.561
Online	0.642	1.630	5.837	0.796	1.958	6.090	1.305	2.656	7.881

Running Time of Different Steps. Table 3 lists the running time of different steps in Protocol 5.1 when $t = n - 2$ . As shown in the table, the OPPRF and shuffle steps account for a significant proportion of the total running time. As n increases, the running time of OPPRF changes only slightly because each party uses separate threads to ensure parallelism. Moreover, it is important to note that the shuffle comprises two stages: offline and online. Most of the expensive operations, such as determining the OT-based encoding pattern, can be shifted to the offline stage. During the online execution of our protocol, we then only need to evaluate the Beneš switching network on our masked input of size m. In a scenario where $m = 2^{16}$ , $n = 15$ and $t = 13$ , our protocol takes only 7.881 seconds to complete the online task of MPSI-CA computation with this optimization.

7.2. Performance of MPSI-CA with two non-colluding parties

Table 4 presents the total running time and communication cost of our MPSI-CA protocol with two non-colluding parties (Protocol 4.2). Its practicality can be verified by our numeric results. As shown in the table, it requires only 0.620 seconds for $n = 15$ parties with $m = 2^{12}$ elements to compute the intersection cardinality. Even when the data size m increases to $2^{18}$ , the running time is only approximately 25.9 seconds.

Our Protocol 4.2 not only establishes a foundation for designing our MPSI-CA protocol under arbitrary collusion (Protocol 5.1), but also demonstrates a more lightweight approach to computing the intersection cardinality. This protocol provides a weaker security guarantee in exchange for faster computation and lower communication cost when compared with Protocol 5.1. For instance, when $n = 15$ , $m = 2^{18}$ and $t = 7$ , the running time of Protocol 4.2 is only a quarter of that of Protocol 5.1. To sum up, its advantages mainly come from two aspects:

(Reduced number of leaders) Protocol 4.2 only requires two leaders, significantly reducing the number of parties involved in the expensive interactive procedures compared to Protocol 5.1, which requires $t + 1$ leaders. This reduction not only enhances the efficiency of the protocol but also makes the protocol particularly suitable for applications with a large number of participants n and a high corruption threshold t. For instance, in the scenario where $m = 2^{12}$ and $n = 15$ , the total running time of the protocol is 0.62 seconds, which is only about 0.033 seconds longer than that of $n = 3$ .

(Simplified shuffling operations) Protocol 5.1 requires the leaders to use a multi-party secret-shared shuffle protocol to ensure the privacy of the permutation π. It involves $t + 1$ rounds of a $(t + 1)$ -party Permute + Share protocol, which can be computationally expensive. However, in Protocol 4.2, since L₁ and L₂ are non-colluding, we can simplify the protocol by asking L₂ to select the permutation π and invoke a two-party Permute + Share protocol with L₁. The security of the two-party Permute + Share protocol is sufficient to guarantee that L₁ cannot learn π through collusion.

Hence, for application scenarios where performance is prioritized, we believe that Protocol 4.2 will stand out as an excellent choice.
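To make the simplified shuffle concrete, the following is a minimal sketch of the input/output behaviour of a two-party Permute + Share step, simulated by a single trusted function rather than by the OT-based protocol. One leader supplies the permutation, the other supplies the vector, and the result is a pair of additive shares of the permuted vector. The ring $\mathbb{Z}_{2^{64}}$, the indexing convention (output position i receives input position `pi[i]`) and the function name are assumptions for illustration only.

```python
import secrets

MOD = 2**64  # assumed share ring; chosen only for illustration


def permute_and_share(pi, x):
    """Ideal-functionality sketch: return additive shares of x permuted by pi.

    pi: list containing each index 0..m-1 exactly once (chosen by one leader).
    x:  list of m ring elements (held by the other leader).
    """
    m = len(x)
    if sorted(pi) != list(range(m)):
        raise ValueError("pi must be a permutation of range(len(x))")
    permuted = [x[pi[i]] % MOD for i in range(m)]
    # One share vector is uniformly random; the other completes the sum.
    share_a = [secrets.randbelow(MOD) for _ in range(m)]
    share_b = [(v - r) % MOD for v, r in zip(permuted, share_a)]
    return share_a, share_b


x = [10, 20, 30, 40]
pi = [2, 0, 3, 1]
share_a, share_b = permute_and_share(pi, x)
recombined = [(a + b) % MOD for a, b in zip(share_a, share_b)]
```

Each share vector on its own is uniformly random, so neither output reveals the order of the elements; only the sum of the two vectors equals the permuted input.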

7.3. Comparison with other works

To the best of our knowledge, there are only two MPSI-CA schemes [7,24] that are secure against arbitrary collusion in the semi-honest adversary model. However, these schemes only provide theoretical analysis of their performance without any experimental results. Table 5 compares their performance with that of our MPSI-CA protocol under arbitrary collusion (Protocol 5.1) in terms of computational and communication complexities.

Note that t represents the corruption threshold, and it is limited to be no larger than $n - 1$ . It is worth mentioning that in the dishonest majority setting (also known as truly arbitrary collusion) considered by [7,24], t is equal to $n - 1$ and all parties have to act as leaders. However, in Table 5, we still use the notation t to differentiate the overhead of clients from leaders. This allows us to gain a clearer understanding of the computational and communication complexities of different parties under various corruption conditions.

Table 4
The running time and communication cost of our MPSI-CA protocol with two non-colluding parties (Protocol 4.2).

n	Roles	$m = 2^{12}$	$m = 2^{14}$	$m = 2^{16}$	$m = 2^{18}$
Total running time (seconds)
3	Client	0.108	0.219	0.272	5.897
3	Leader	0.587	1.473	4.563	24.771
5	Client	0.108	0.219	0.272	5.897
5	Leader	0.603	1.502	4.739	24.826
10	Client	0.108	0.219	0.272	5.897
10	Leader	0.611	1.509	4.860	25.174
15	Client	0.108	0.219	0.272	5.897
15	Leader	0.620	1.532	4.995	25.856

Roles	n	$m = 2^{12}$	$m = 2^{14}$	$m = 2^{16}$	$m = 2^{18}$
Total communication cost (MB)
Client	3, 5, 10, 15	0.156265	0.625015	2.50002	10
Secondary leader	3	4.62229	20.7406	88.8931	392.623
Secondary leader	5	4.62235	20.7407	88.8931	392.623
Secondary leader	10	4.62247	20.7408	88.8932	392.623
Secondary leader	15	4.98465	21.3684	88.8933	392.623
Primary leader	3	3.10952	13.8683	61.4033	269.543
Primary leader	5	3.42204	15.1183	66.4033	289.543
Primary leader	10	4.20335	18.2434	78.9034	339.543
Primary leader	15	4.6226	20.7409	91.4034	389.543

Table 5

The computational and communication complexities of MPSI-CA schemes, where k is the ratio of OKVS size to its encoded set size, m is the set size, $m_{max}$ is the largest set size (Note that in the balanced data setting, $m = m_{max}$ ).

MPSI-CA scheme	Primary leader	Secondary leader	Client	Total
Computational complexity (number of public key operations)
[24]	/	/	$O (n m_{max}^{2})$	$O (n^{2} m_{max}^{2})$
[7]	$O (m)$	/	$O (k m_{max})$	$O (k n m_{max})$
Our Protocol 4.2	$O (κ)$	$O (κ)$	/	$O (κ)$
Our Protocol 5.1	$O (t κ)$	$O (t κ)$	/	$O (t^{2} κ)$
Computational complexity (number of symmetric key operations)
Our Protocol 4.2	$O (m \log (m))$	$O ((n - t) m + m \log (m))$	$O (t m)$	$O ((n - t) t m + m \log (m))$
Our Protocol 5.1	$O (t m \log (m))$	$O ((n - t) m + t m \log (m))$	$O (t m)$	$O ((n - t) t m + t^{2} m \log (m))$
Communication complexity (bits)
[24]	/	/	$O (n m_{max})$	$O (n^{2} m_{max})$
[7]	$O (m)$	/	$O (k m_{max})$	$O (k n m_{max})$
Our Protocol 4.2	$O (m \log (m))$	$O (k m + m \log (m))$	$O (k m)$	$O (k (n - 1) m + m \log (m))$
Our Protocol 5.1	$O (t m \log (m))$	$O (k m + t m \log (m))$	$O (k m)$	$O (k (n - 1) m + t^{2} m \log (m))$

As shown in Table 5, [24] and [7] both rely on a large number of expensive public key operations, which grows linearly in the maximal set size $m_{max}$ or even quadratically ( $m_{max}^{2}$ ). Therefore, it is impractical for resource-limited devices with large data sets to carry out these protocols due to the massive computational overhead. Moreover, the efficiency of those schemes remains to be improved in the unbalanced data setting (i.e., the minimal set size $m_{min} ≪ m_{max}$ ), or when the number of corrupted parties t only accounts for a small percentage of n.

By adopting lightweight primitives that do not require any public key operations besides a set of base OTs, the number of public key operations in our MPSI-CA protocols is independent of the set size. In scenarios where the set size is large and the number of parties is small, this leads to a significant performance improvement when compared with [7,24]. At the same time, clients only need to send their PRF-encoded data to the leaders instead of participating in the expensive interactive cryptographic protocols themselves, so the total computational complexity is reduced, especially when $t / n$ is small. Besides, all the OTs required in the multi-party secret-shared shuffle can be carried out in an offline phase, thus further decreasing the computational complexity of the online execution of our MPSI-CA protocol.
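As a worked reading of Table 5, the dependence on the corruption threshold can be made explicit by substituting the two extreme values of t into the total symmetric-key cost listed for Protocol 5.1. This is only a specialization of the table entry, not an additional result:

```latex
\begin{align*}
\text{general } t: \quad & O\big((n - t)\, t\, m + t^{2}\, m \log m\big) \\
t = 1: \quad & O\big((n - 1)\, m + m \log m\big) \\
t = n - 1: \quad & O\big((n - 1)\, m + (n - 1)^{2}\, m \log m\big) = O\big(n^{2}\, m \log m\big)
\end{align*}
```

For a small threshold the total cost is linear in n, whereas in the dishonest majority setting the shuffle term dominates and grows quadratically in n.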

In terms of communication complexities, although utilizing expensive HE can reduce the communication cost during the multi-party shuffle stage in [7], we expect that the gap between [7] and our MPSI-CA scheme can be narrowed in an unbalanced setting by assigning the party with the smallest data set ( $m_{1}$ ) as leader L₁ to ensure that $m_{1} ≪ m < m_{max}$ . In this scenario, the extra communication overhead introduced by the multi-party secret-shared shuffle and oblivious zero-sum check can be minimized. This adjustment ensures that the communication efficiency of our MPSI-CA scheme is on par with that of [7].

8. Summary

In this paper, we first proposed an efficient MPSI-CA protocol with two non-colluding parties. It takes full advantage of the star-like network topology, the element sharing technique and OT-extension acceleration to reduce overall computational and communication overhead. Building on this foundation, we then introduced the first MPSI-CA protocol that achieves simultaneous practicality and security under arbitrary collusion, which can resist the collusion of any subset of participants. To achieve this goal, we developed a multi-party secret-shared shuffle primitive to facilitate collaborative shuffling of the sum of participants’ inputs in an unknown permutation. We then demonstrated how to integrate this new primitive with our intuitive MPSI-CA protocol involving two non-colluding parties to obtain an enhanced MPSI-CA protocol under arbitrary collusion. This protocol primarily relies on lightweight cryptographic primitives and operates more efficiently than previous homomorphic-encryption-based benchmark schemes when handling large data sets. Additionally, we defined the problem of MPSI-CA-sum and extended the aforementioned enhanced MPSI-CA protocol to address this scenario. Numeric results and theoretical complexities highlight the performance advantages of our protocols.

Acknowledgments

This work was supported in part by the National Key Research and Development Project 2020YFA0712300.

References

1. Ashok V.G. and Mukkamala, A scalable and efficient privacy preserving global itemset support approximation using Bloom filters, in: IFIP Annual Conference on Data and Applications Security and Privacy, Springer, 2014, pp. 382–389.
2. Bay, Erkin, Hoepman J.H., Samardjiska and Vos, Practical multi-party private set intersection protocols, IEEE Transactions on Information Forensics and Security 17 (2021), 1–15. doi:10.1109/TIFS.2021.3118879.
3. Chandran, Dasgupta, Gupta, Obbattu S.L.B., Sekar and Shah, Efficient linear multiparty PSI and extensions to circuit/quorum PSI, in: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, 2021, pp. 1182–1204. doi:10.1145/3460120.3484591.
4. Chandran, Gupta and Shah, Circuit-PSI with linear complexity via relaxed batch OPPRF, Cryptology ePrint Archive, 2021.
5. Chase, Ghosh and Poburinnaya, Secret-shared shuffle, in: International Conference on the Theory and Application of Cryptology and Information Security, Springer, 2020, pp. 342–372.
6. Debnath S.K. and Dutta, Provably secure fair mutual private set intersection cardinality utilizing Bloom filter, in: International Conference on Information Security and Cryptology, Springer, 2016, pp. 505–525.
7. Debnath S.K., Stǎnicǎ, Kundu and Choudhury, Secure and efficient multiparty private set intersection cardinality, Advances in Mathematics of Communications 15(2) (2021), 365. doi:10.3934/amc.2020071.
8. Demmler, Schneider and Zohner, ABY - a framework for efficient mixed-protocol secure two-party computation, in: NDSS, 2015.
9. Dong, Chen and Wen, When private set intersection meets big data: An efficient and scalable protocol.
10. Duong, Phan D.H. and Trieu, Catalic: Delegated PSI cardinality with applications to contact tracing, in: International Conference on the Theory and Application of Cryptology and Information Security, Springer, 2020, pp. 870–899.
11. Egert, Fischlin, Gens, Jacob, Senker and Tillmanns, Privately computing set-union and set-intersection cardinality via Bloom filters, in: Australasian Conference on Information Security and Privacy, Springer, 2015, pp. 413–430. doi:10.1007/978-3-319-19962-7_24.
12. Evans, Kolesnikov, Rosulek et al., A pragmatic introduction to secure multi-party computation, Foundations and Trends® in Privacy and Security 2(2–3) (2018), 70–246. doi:10.1561/3300000019.
13. Freedman M.J., Hazay, Nissim and Pinkas, Efficient set intersection with simulation-based security, Journal of Cryptology 29(1) (2016), 115–155. doi:10.1007/s00145-014-9190-0.
14. Freedman M.J., Nissim and Pinkas, Efficient private matching and set intersection, in: International Conference on the Theory and Applications of Cryptographic Techniques, Springer, 2004, pp. 1–19.
15. Garimella, Mohassel, Rosulek, Sadeghian and Singh, Private set operations from oblivious switching, in: IACR International Conference on Public-Key Cryptography, Springer, 2021, pp. 591–617.
16. Garimella, Pinkas, Rosulek, Trieu and Yanai, Oblivious key-value stores and amplification for private set intersection, in: Annual International Cryptology Conference, Springer, 2021, pp. 395–425.
17. Ghosh and Nilges, An algebraic approach to maliciously secure private set intersection, in: Annual International Conference on the Theory and Applications of Cryptographic Techniques, Springer, 2019, pp. 154–185.
18. Gordon S.D., Hazay and P.H., Fully secure PSI via MPC-in-the-head, in: Proceedings on Privacy Enhancing Technologies, 2022.
19. Hazay and Venkitasubramaniam, Scalable multi-party private set-intersection, in: IACR International Workshop on Public Key Cryptography, Springer, 2017, pp. 175–203.
20. Inbar, Omri and Pinkas, Efficient scalable multiparty private set-intersection via garbled Bloom filters, in: International Conference on Security and Cryptography for Networks, Springer, 2018, pp. 235–252. doi:10.1007/978-3-319-98113-0_13.
21. Ion, Kreuter, Nergiz A.E., Patel, Saxena, Seth, Raykova, Shanahan and Yung, On deploying secure computing: Private intersection-sum-with-cardinality, in: 2020 IEEE European Symposium on Security and Privacy (EuroS&P), IEEE, 2020, pp. 370–389. doi:10.1109/EuroSP48549.2020.00031.
22. Ion, Kreuter, Nergiz, Patel, Saxena, Seth, Shanahan and Yung, Private intersection-sum protocol with applications to attributing aggregate ad conversions, Cryptology ePrint Archive, 2017.
23. Ishai, Kilian, Nissim and Petrank, Extending oblivious transfers efficiently, in: Annual International Cryptology Conference, Springer, 2003, pp. 145–161.
24. Kissner and Song, Private and threshold set-intersection, Tech. rep., Carnegie-Mellon Univ Pittsburgh Pa Dept Of Computer Science, 2004.
25. Kolesnikov, Kumaresan, Rosulek and Trieu, Efficient batched oblivious PRF with applications to private set intersection, in: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 818–829. doi:10.1145/2976749.2978381.
26. Kolesnikov, Matania, Pinkas, Rosulek and Trieu, Practical multi-party private set intersection from symmetric-key techniques, in: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 1257–1272. doi:10.1145/3133956.3134065.
27. Yin, Cheng, Feng, Liu and Zhou, Unbalanced private set intersection cardinality protocol with low communication cost, Future Generation Computer Systems 102 (2020), 1054–1061. doi:10.1016/j.future.2019.09.022.
28. Miao, Patel, Raykova, Seth and Yung, Two-sided malicious security for private intersection-sum with cardinality, in: Annual International Cryptology Conference, Springer, 2020, pp. 3–33.
29. Mohassel and Sadeghian, How to hide circuits in MPC an efficient framework for private function evaluation, in: Annual International Conference on the Theory and Applications of Cryptographic Techniques, Springer, 2013, pp. 557–574.
30. Motwani and Raghavan, Randomized Algorithms, Cambridge University Press, 1995.
31. Naor and Pinkas, Efficient oblivious transfer protocols, in: SODA, Vol. 1, 2001, pp. 448–457.
32. Nevo, Trieu and Yanai, Simple, fast malicious multiparty private set intersection, in: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, 2021, pp. 1151–1165. doi:10.1145/3460120.3484772.
33. Niu, Wang and Song, Privacy-preserving statistical computing protocols for private set intersection, International Journal of Intelligent Systems 37(12) (2022), 10118–10139. doi:10.1002/int.22420.
34. Pagh and Rodler F.F., Cuckoo hashing, in: Algorithms – ESA 2001: 9th Annual European Symposium Århus, Denmark, August 28–31, 2001, Proceedings, Springer, 2001, pp. 28–31.
35. Pinkas, Rosulek, Trieu and Yanai, PSI from PaXoS: Fast, malicious private set intersection, in: Annual International Conference on the Theory and Applications of Cryptographic Techniques, Springer, 2020, pp. 739–767.
36. Pinkas, Schneider, Tkachenko and Yanai, Efficient circuit-based PSI with linear communication, in: Advances in Cryptology–EUROCRYPT 2019: 38th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Darmstadt, Germany, May 19–23, 2019, Proceedings, Part III, Vol. 38, Springer, 2019, pp. 122–153.
37. Pinkas, Schneider, Weinert and Wieder, Efficient circuit-based PSI via cuckoo hashing, in: Advances in Cryptology–EUROCRYPT 2018: 37th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Tel Aviv, Israel, April 29-May 3, 2018, Proceedings, Part III, Vol. 37, Springer, 2018, pp. 125–157. doi:10.1007/978-3-319-78372-7_5.
38. Pinkas, Schneider and Zohner, Scalable private set intersection based on OT extension, ACM Transactions on Privacy and Security (TOPS) 21(2) (2018), 1–35. doi:10.1145/3154794.
39. Rabin M.O., How to exchange secrets with oblivious transfer, Cryptology ePrint Archive, 2005.
40. Raghuraman and Rindal, Blazing fast PSI from improved OKVS and subfield VOLE, in: Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022, pp. 2505–2517. doi:10.1145/3548606.3560658.
41. Rindal and Schoppmann, VOLE-PSI: Fast OPRF and circuit-PSI from vector-OLE, in: Annual International Conference on the Theory and Applications of Cryptographic Techniques, Springer, 2021, pp. 901–930.
42. Trieu, Yanai and Gao, Multiparty private set intersection cardinality and its applications, Cryptology ePrint Archive, 2022.
43. Vos, Conti and Erkin, Fast multi-party private set operations in the star topology from secure ANDs and ORs, Cryptology ePrint Archive, 2022.
44. Vos, Conti and Erkin, SoK: Collusion-resistant multi-party private set intersections in the semi-honest model, Cryptology ePrint Archive, 2023.
