Universal optimality and robust utility bounds for metric differential privacy 1

Abstract

We study the privacy-utility trade-off in the context of metric differential privacy. Ghosh et al. introduced the idea of universal optimality to characterise the “best” mechanism for a certain query that simultaneously satisfies (a fixed) ε-differential privacy constraint whilst at the same time providing better utility compared to any other ε-differentially private mechanism for the same query. They showed that the Geometric mechanism is universally optimal for the class of counting queries. On the other hand, Brenner and Nissim showed that outside the space of counting queries, and for the Bayes risk loss function, no such universally optimal mechanisms exist. Except for the universal optimality of the Laplace mechanism, there have been no generalisations of these universally optimal results to other classes of differentially-private mechanisms.

In this paper, we use metric differential privacy and quantitative information flow as the fundamental principle for studying universal optimality. Metric differential privacy is a generalisation of both standard (i.e., central) differential privacy and local differential privacy, and it is increasingly being used in various application domains, for instance in location privacy and in privacy-preserving machine learning. Similar to the approaches adopted by Ghosh et al. and Brenner and Nissim, we measure utility in terms of loss functions, and we interpret the notion of a privacy mechanism as an information-theoretic channel satisfying constraints defined by ε-differential privacy and a metric meaningful to the underlying state space. Using this framework we are able to clarify Nissim and Brenner’s negative results by (a) that in fact all privacy types contain optimal mechanisms relative to certain kinds of non-trivial loss functions, and (b) extending and generalising their negative results beyond Bayes risk specifically to a wide class of non-trivial loss functions.

Our exploration suggests that universally optimal mechanisms are indeed rare within privacy types. We therefore propose weaker universal benchmarks of utility called privacy type capacities. We show that such capacities always exist and can be computed using a convex optimisation algorithm. Further, we illustrate these ideas on a selection of examples with several different underlying metrics.

Keywords

Universal optimality metric differential privacy privacy-utility trade-off

1. Introduction

One of the main challenges in privacy is to devise mechanisms that protect the sensitive information contained in the data, while at the same time preserving an acceptable degree of utility. Differential privacy [14,15], one of the most successful frameworks for privacy protection, allows access to a dataset via querying the dataset through an interface controlled by the data curator, which ensures privacy by adding controlled random noise to the result of the query before reporting it. In this setting, the trade-off with utility is determined by how the noise is calibrated: a “bad” calibration may cause a lot of utility loss without necessarily improving privacy. Of course, the ideal mechanisms are the Pareto optimal ones, i.e., those that offer the best utility for a given level of privacy.

There are many notions and measures of utility, depending on the goals and the means of the data consumers. The goals are usually formalised in terms of loss functions, and an important class of consumers is the so-called Bayesian consumers, who may have some prior knowledge about the data and are able to make the most out of the reported answer, in the sense of deriving (via Bayesian inference, using their prior) the information that minimises the loss with respect to the true answer.

In the context of the Bayesian notion of utility, an important desideratum for a mechanism is to be optimal for all consumers, no matter what prior they hold. This is in line with the property of differential privacy, which does not depend on the prior knowledge of the attacker. This independence from the prior is appealing because when we design mechanisms we don’t know what kind of knowledge attackers and consumers may possess now and acquire over time.

To better understand the issues that arise in this context, let us introduce some notations.

Let $C$ be a class of (Bayesian) consumers, where each consumer is identified by a prior π and a loss function ℓ, and let $D$ be a class of ε-differentially private mechanisms. A mechanism $M \in D$ probabilistically transforms the true answer (input) into a reported answer (output, observation),2

²
This functionality of the mechanism corresponds to the so-called “oblivious” model, in which the noise depends only on the result of the query and not on the original dataset.

and it is called universally optimal wrt.

C

if M provides the best utility for every consumer in

C

compared with any other

M^{'} \in D

. “Best utility” for a consumer wrt. a mechanism M is the minimal value of the consumer’s loss function over all possible remappings from outputs to inputs, i.e.,

\begin{matrix} (1) & U_{ℓ} (π, M) = min_{r} \sum_{x : X} π_{x} \sum_{y : Y} M_{x, y} ℓ (r (y), x) \end{matrix}

for inputs X, observations Y and remapping function r. The universally optimal mechanism M is one for which

U_{ℓ} (π, M) ⩽ U_{ℓ} (π, M^{'})

for all other mechanisms

M^{'} \in D

, priors π and all loss functions ℓ within the class

C

Ghosh et al. [20] found that a universally optimal mechanism exists for the class $D$ of counting query mechanisms and the class $C$ of monotonic consumers, i.e., those described by a “monotone” loss function (see Def. 9 ahead). Conversely, Brenner and Nissim [8] showed that no such optimal mechanism exists for the monotone class of consumers with respect to various queries. Moreover, they proved that for the specific loss function $ℓ_{bin}$ , the optimal mechanisms never exist outside the linear structures of inputs corresponding to counting queries.

Both of the above results are tied to monotone loss functions, which are functions that capture the idea that guesses ‘closer’ to the secret should incur less loss than guesses ‘further away’. In the differential privacy literature, these kinds of loss functions are frequently used to model utility. For example, the functions $\begin{array}{r} ℓ_{1} (w, x) : = | w - x | and ℓ_{2} (w, x) : = {(w - x)}^{2} \end{array}$ are commonly used to describe consumers whose expected loss depends on the absolute difference ( $ℓ_{1}$ ) or squared difference ( $ℓ_{2}$ ) of guesses from inputs. The Bayes risk loss function is another popular monotonic loss function that describes a consumer whose goal is to guess the input in one try.

However, we argue that there are cases where we are interested in modelling consumers described by non-monotone loss functions. For example, consider the following scenarios:

(The Weary Traveler)3

This example is taken from [3].

A weary traveler wishes to accurately guess the time that the last RER-B train will depart Lozère for Paris. Clearly, guessing 2 minutes too late is very different from guessing 2 minutes too early. We would then model his loss using an asymmetric function which penalises guesses too late more than guesses too early.

(The Real-Estate Maverick) An agent wishes to guess the parity (odd or even) of house numbers released privately from a dataset.4

⁴

In some places, the side of the street on which a house is positioned – as determined by the parity of its street number – can make a significant difference in its sale value.

The agent chooses a property loss function5

⁵

No pun intended.

with 2 guesses (odd/even) which assign values of 0 when guessing correctly and 1 when guessing incorrectly.

Both of the above examples describe non-monotone loss functions which are not covered by the results of [20] or [8], but which may represent scenarios of interest in other domains. As further examples, consumers whose goals can be modelled as property inferences are typically captured by non-monotone loss functions. Also, the non-monotone $ℓ_{nib}$ loss function (introduced later in this paper) is crucial in computing capacities for classes of privacy mechanisms. The goal of this paper, then, is to study universal optimality in full generality, for Bayesian consumers over any class of loss functions.

The aforementioned results on optimality are in the context of the so-called central model of differential privacy, which assumes that the noise is added by a trusted curator. In recent years, however, the local model [13] has become more and more popular, also due to the interest of large companies such as Google and Apple. In the local model, the noise is added directly by the data producer, so the microdata are already obfuscated. Consequently, this model is more robust wrt. potential security breaches, and there is no need for a trusted third party.

Recently, it has been shown that both the central and local models can be unified under a generalised definition of differential privacy known as d-privacy [9], which describes a metric privacy type. d-privacy is increasingly being used in various application domains, for instance in location privacy, where it is known under the name of geo-indistinguishability [6], and in privacy-preserving machine learning [17].

Definition 1 (Metric privacy type).

Given a metric space $(X, d)$ , we say that a mechanism $M : X \to D Y$ satisfies d-privacy if for all $x, x^{'} \in X$ , the following constraint holds: $\begin{matrix} (2) & M (x) (Y) ⩽ e^{d (x, x^{'})} M (x^{'}) (Y), \end{matrix}$ where we denote $M (x) (Y)$ the probability that M’s output is contained in the subset $Y \subseteq Y$ . We call the class of d-private mechanisms on $X$ the metric privacy type associated to $(X, d)$ which we denote by $T_{X, d}$ .

It is easy to see that d-privacy subsumes both the central and the local model, both of which include a notion of distance implicitly in their definitions (the “adjacency” relation). For example, the metric privacy type for mechanisms acting on statistical outputs (eg. “counting queries” in the central model) is $T_{N, d_{2}}$ where $d_{2}$ is the Euclidean distance and $N$ the natural numbers.6

⁶
When viewed as a mechanism from datasets to reported answers, as is customary in central differential privacy, d would be the Hamming distance.

For local differential privacy, it is

T_{X, d_{D}}

for any inputs

X

, where

d_{D}

is the Discrete metric assigning distance 1 to all

x \neq x^{'} \in X

. Also note that d-privacy is strictly more general than local differential privacy, as it is able to express the distance between inputs and therefore to tune the noise accordingly. This can be useful in applications in machine learning, where appropriate metrics can be used to simplify the design of a privacy mechanism [26].

1.1. Contributions

Our contributions are as follows:

Using principles from Quantitative Information Flow, we study optimality for metric differential privacy in full generality; that is, d-private mechanisms that provide the best utility over arbitrary classes of loss functions. This extends the study by Ghosh et al. [20] and Brenner and Nissim [8] which considered optimality only for the class of monotone loss functions and the mechanisms corresponding to counting and sum queries.

We provide a complete characterisation of optimality within a privacy type for finite $X$ . This characterisation is independent of the workflow (oblivious, local or otherwise) in which the privacy type arises. This result is significant in that it provides a finite class of mechanisms that characterise the space.

We show that every privacy type contains universally optimal mechanisms (for non-trivial loss functions). This indicates that the impossibility result of Brenner and Nissim does not apply to all loss function classes; we provide a tightening of their result, clarifying that the impossibility result for sum queries holds for a class of strictly monotonic consumers.

We extend the qualitative study of optimality to a quantitative study, finding robust capacity bounds for universally optimal mechanisms within a privacy type. These capacity results provide a tight quantitative bound on the maximum leakage – a measure of the usefulness of a mechanism within a privacy type. In practice, for example, such bounds provide benchmarks for the experimental evaluation of implemented mechanisms.

1.1.1. Extensions to previous work

This work is an extended version of a previously published conference paper [19] and contains the following additions:

Properties of kernel mechanisms are described (Section 3) as well as an example (Section 3.3) demonstrating that kernel representations are non-unique.

A new section (Section 4) explaining the relationship between counting and sum queries in previous works on optimality, and our metric differential privacy setting.

A loss function algebra (Section 5.2) which explains how properties of loss functions flow through to properties of refinement. This is what enables larger classes of loss functions for which optimality holds to be constructed from simpler loss functions.

A full proof for Thm. 15 for completeness.

A worked example (Section 6) showing how to construct a universally ℓ-optimal mechanism for a non-trivial loss function given a metric privacy type. This clarifies how to use our results to construct loss functions for which optimality holds in a given setting.

New illustrations (Fig. 2, Fig. 6) which clarify important technical details.

1.2. Related work

The problem of universal optimality for arbitrary privacy types, as modelled by d-privacy, has not yet been studied in full generality. Chatzikokolakis et al. [9] showed that optimal mechanisms for “sum queries” can be constructed in the oblivious setting by considering a Manhattan metric, rather than a Hamming metric, on the original datasets. Recent work by Fernandes et al. [18] considers universal optimality of the Laplace mechanism in the continuous setting, using the same QIF framework as in our paper, although our work focuses on a more general problem of universal optimality restricted to discrete spaces. Other optimality results have been found in non-Bayesian settings. Gupte et al. [21] proved a universal optimality result for a type of risk-averse consumer using a minimax formulation. Aharya et al. [1] prove an optimality result for binary mechanisms similar to our result of Cor. 18, but the focus of their work is on minimax consumers. Kairouz et al. [22] proved optimality for “staircase mechanisms” using f-divergence as a utility measure, while Koufogiannis et al. [23] showed that the Laplace mechanism is optimal for the mean-squared error. Finally, Asi and Duchi [12] studied near-instance optimality, designed for particular dataset instances, in contrast with our study of optimality over mechanisms. It is important to note that the notions of utility induced by the above non-Bayesian consumers are simpler than ours, and their trade-off with privacy is more direct. Also, the question of universality wrt. the prior does not arise. Furthermore, most of the results assume heavy restrictions on the noise functions.

1.3. Organisation of the paper

We begin in Section 2 with a summary of the results needed from Quantitative Information Flow including how to formulate the Universal optimality problem under the same assumption of “Bayesian consumers” as used by Ghosh et al. and Brenner and Nissim. In Section 3 we introduce privacy types based on a metric d and a set of secrets $X$ ; given these ingredients we provide a complete characterisation of the mechanisms of that type. In Section 4 we explain how the counting and sum query results of Ghosh et al. and Brenner and Nissim respectively can be formulated in our metric differential privacy setting. In Section 5 we provide a thorough investigation of utility in terms of loss functions and how there is a close relationship between the complexity of the loss function and the structure of the underlying privacy mechanisms from which we can deduce both when universal properties exist and when they do not depending on the loss function (eg. its complexity) and the composition of the privacy mechanisms. In Section 6 we work through an example showing how to use our results to find a universally ℓ-optimal mechanism. In Section 7 we introduce privacy type capacities and show that they provide a benchmark for computing how much useful information can be extracted from a privacy mechanism. Finally in Section 8 we provide a range of examples that illustrate these ideas.

Where possible we have sketched proofs in the main body of the paper. Full proofs can be found in the appendix.

2. Quantitative information flow for privacy mechanisms

A probabilistic channel C takes an input $x \in X$ and outputs an “observation” $y \in Y$ according to a distribution $D Y$ . In the discrete case, such channels are $X \times Y$ matrices C whose row-x, column-y element $C_{x, y}$ is the probability that input x will produce output y. The x-th row $C_{x, -}$ is thus a discrete distribution in $D Y$ . We write $X \to D Y$ for the channel type.

Given a prior distribution $π : D X$ on $X$ , the channel C can be applied to π to create a joint distribution J in $D (X \times Y)$ , written $π ⊳ C$ and where $J_{x, y} : = π_{x} C_{x, y}$ . For that J, the left-marginal $\sum_{y} J_{x, y}$ (denoted $J_{x, Σ}$ ) gives the prior π again, i.e. the probability that the input was x — thus $π_{x} = J_{x, Σ}$ . The right marginal $J_{Σ, y}$ is the probability that the output is y, given both π and C. The y-posterior distribution on $X$ is the conditional probability that the input was x if that y was output: it is the y-th column (of the joint J) divided by the marginal probability of that y, that is $J_{-, y} / J_{Σ, y}$ (provided the marginal is not zero).

If we fix π and C, and use the conventional abbreviation $p_{X Y}$ for the resulting joint distribution $(π ⊳ C)$ , then the usual notations for the above are $p_{X}$ for left marginal $(= π)$ and $p_{X} (x)$ for its value $π_{x}$ at a particular x, with $p_{Y}$ and $p_{Y} (y)$ similarly for the right marginal. Then $p_{X | y} (x)$ is the posterior probability of the original observation’s being x when y has been observed. Further, we can write just $p (x)$ and $p (y)$ and $p (x | y)$ when context makes the (missing) subscripts clear.

2.1. Bayesian adversaries and generalised entropy

QIF assumes that adversaries are Bayesian: they are equipped with a prior $π : D X$ over secrets $X$ and use their knowledge of the channel $C : X \to D Y$ to maximise their advantage after making an observation. This is modelled in full generality using the “g-leakage framework” [4]. In detail, a gain function $g : W \times X \to R_{⩾ 0}$ models the gain to an adversary who takes an action $w \in W$ when the value of the secret is $x \in X$ . Focussing on actions rather than observations naturally encompasses the idea of “remapping” observations to their associated most likely secret value. Moreover, g-leakage gives a measurement for how successful are (privacy) mechanisms at communicating useful information by comparing the “prior” and “posterior” g-vulnerabilities as follows. For gain function g the (prior) vulnerability wrt. the prior π (for a consumer/adversary with this information) is the maximum gain over all possible actions: $V_{g} (π) : = {max}_{w : W} \sum_{x : X} π_{x} g (w, x)$ ; it quantifies the adversary’s success in the scenario defined by π and g. The expected (posterior) vulnerability is the adversary’s expected gain after observing y is $V_{g} [π ⊳ C] : = \sum_{y : Y} p (y) V_{g} (p (X | y))$ , or equivalently: $\begin{matrix} (3) & V_{g} [π ⊳ C] = \sum_{y : Y} max_{w : W} \sum_{x : X} π_{x} C_{x, y} g (w, x) . \end{matrix}$ The greater the difference in the prior/posterior vulnerability, the better is the adversary able to use the transmitted information to infer the value of the secret. (See Section 2.3 below.)

In this paper we use an equivalent formulation of leakage in terms of loss functions and “generalised entropies”, which we also call ℓ-uncertainties. In particular, a loss function $ℓ : W \times X \to R$ defines a generalised entropy or prior ℓ-uncertainty $U_{ℓ} : D X \to R$ as follows: $\begin{matrix} (4) & U_{ℓ} (π) : = min_{w : W} \sum_{x : X} π_{x} ℓ (w, x), \end{matrix}$ with the formulation for expected loss through a channel defined similarly using minimisation (compare (3)): $\begin{matrix} (5) & U_{ℓ} [π ⊳ C] = \sum_{y : Y} min_{w : W} \sum_{x : X} π_{x} C_{x, y} ℓ (w, x) . \end{matrix}$

We will often use g for gain and ℓ for loss, but either can be used in formulating a vulnerability ( $V_{g} / V_{ℓ}$ ) or ℓ-uncertainty ( $U_{g} / U_{ℓ}$ ); the leakage theory based on losses or gains is equivalent [3].

Comparing now the formulation for expected loss given in Eqn (5) with the Ghosh et al. formulation given by Eqn (1), we find that the latter can be expressed as a special case of the former.

Lemma 1.
The expected utility loss in Eqn ( 1 ) can be expressed using the posterior ℓ-uncertainty defined in Eqn ( 5 ).
Proof.
We reason as follows: $\begin{array}{c} U_{ℓ} (π, M) \\ = min_{r} \sum_{x \in X} π_{x} \sum_{y \in Y} M_{x, y} ℓ (r (y), x) “Eqn (1)” \\ = min_{r} \sum_{y \in Y} \sum_{x \in X} π_{x} M_{x, y} ℓ (r (y), x) “Reorganising” \\ = \sum_{y \in Y} min_{r} \sum_{x \in X} π_{x} M_{x, y} ℓ (r (y), x) “Min r is over each y \\ = \sum_{y \in Y} min_{w} \sum_{x \in X} π_{x} M_{x, y} ℓ (w, x) “Letting w = r (y) ” \\ = U_{ℓ} [π ⊳ M] “Eqn (5)” \end{array}$ □

Note that in the second-last step we employ a mapping from remaps $r (y)$ to guesses w. We can think of this as simply a relabelling of observations y to guesses w, and we can always ‘duplicate’ guesses w so that there is a one-one correspondence between guesses and inputs.

Since our more general formulation utility for Bayesian consumers subsumes the Ghosh et al. formulation from the literature, we will adopt Eqn (5) as our measure for utility in this work. This also allows us to use QIF reasoning in our study of optimality.
2.2. Hyper-distributions

Equation (3) indicates that the actual output $Y$ can be abstracted, leaving the leakage properties of a channel in the context of a prior to be determined by a hyper-distribution (or simply “hyper”). A hyper is a distribution of distributions on $X$ , having type $D D X$ or $D^{2} X$ . Given a channel C and prior π, we write $[π ⊳ C]$ for the hyper whose support is the set of posterior distributions $p (X | y)$ on $X$ and which assigns the corresponding marginal $p (y)$ to each. We usually denote by $δ^{y}$ the posterior $p (X | y)$ and by $α_{y}$ its corresponding marginal.

Fig. 1.

A channel C and its corresponding hyper $Δ_{C}$ produced by the action of the uniform distribution.

An example of a channel and hyper is shown in Fig. 1. The hyper $Δ_{C}$ is produced by pushing the uniform prior through the channel C to produce a joint distribution, which is then marginalised along its columns to produce the posteriors $δ^{y}$ (the columns in $Δ_{C}$ ) and the corresponding y-marginals $α_{y}$ (the labels beneath each column). The prior π can be recovered by averaging the posteriors $δ_{y}$ via the marginals $α_{y}$ , revealing a significant correspondence between channels and hypers:

Lemma 2 (Cor. 4.8 from [3]).

For any channel C and prior π, it holds that $π = \sum_{i} α_{i} δ^{i}$ where $[π ⊳ C] = \sum_{i} α_{i} [δ^{i}]$ .

We can take convex combinations of hypers, eg., $\sum_{i} λ_{i} Δ_{i}$ by distributing the convex coefficients $λ_{i}$ to the outer probabilities of the corresponding $Δ_{i}$ and then combining common posteriors so that, for example, the inner $δ^{y}$ is assigned the outer $\sum_{k} λ_{k} a_{Δ_{k}}$ where $a_{Δ_{k}}$ is the outer assigned to $δ^{y}$ in $Δ_{k}$ . We can also take convex combinations of channels in correspondence with hypers as follows: for convex coefficients $λ_{i}$ we have $\begin{matrix} C : = \sum_{i} λ_{i} C_{i} if [π ⊳ C] = \sum_{i} λ_{i} [π ⊳ C_{i}] . \end{matrix}$ That is, the convex sum of channels can be computed by converting channels to hypers, performing a convex combination, and recovering the resulting channel. It can also be computed directly on channels by applying each $C_{i}$ with probability $λ_{i}$ yielding a channel containing all of the $λ_{i}$ -scaled columns of each the $C_{i}$ . The convex sum can be interpreted as external probabilistic choice; when it involves only 2 channels we usually write ${C_{1}}_{p} \oplus C_{2}$ to mean $p C_{1} + (1 - p) C_{2}$ . See [3, Ch 4,8] for more detailed discussion.

2.3. Robustness measures: Refinement and channel capacities

The study of ℓ-uncertainties7

⁷
In the QIF literature these are dually described as g-vulnerabilities for gain function g.

has given rise to an elegant theory of refinement: we say that channel A refines channel B, written

B ⊑ A

, to mean that A is safer than B; alternatively when the ℓ-uncertainty of B is no greater than that of A for any prior and loss function.

Definition 2 (Refinement).

For channels A, B, we say that A refines B, written $B ⊑ A$ if and only if $U_{ℓ} [π ⊳ B] ⩽ U_{ℓ} [π ⊳ A]$ for all $π : D X$ and loss functions ℓ.

While Equation (2) describes refinement as a leakage relation between channels,8

⁸
It is sufficient to use posterior vulnerability to compare channels for leakage, since the prior vulnerability cancels out.

it is unclear how to verify whether refinement holds in practice. The following theorem establishes an important relationship between leakage characteristics and structural characteristics of channels which allows refinement to be efficiently verified.

Theorem 3 (Coriaceous, [24]).

For channels A, B, we have that $\begin{matrix} B ⊑ A iff there exists a channel P such that B P = A \end{matrix}$ where $B P$ is matrix multiplication.

The relationship between Thm. 3 and Equation (2) can be seen as a consequence of the data processing inequality: the channel P of Thm. 3 corresponds to a “postprocessing” step; when applied to B, the resulting channel A cannot leak more information (to any Bayesian adversary) than does the original channel B and therefore cannot be more vulnerable to attack than B.9

⁹
The data processing inequality doesn’t follow automatically; it follows here because our measures are information-theoretic.

Therefore, postprocessing always suppresses information flow. The channel

I

with a single column of 1’s is maximal in the refinement order and leaks nothing at all i.e.

U_{ℓ} [π ⊳ I] = U_{ℓ} (π)

Refinement is a strong universal utility property of channels (and therefore mechanisms modelled as channels). Refinement between two channels A and B means that for all priors and all gain (loss) functions channel A will always have more gain (less loss) when compared to the corresponding scenario for channel B. We shall see therefore that when A, B are members of the same privacy type then a refinement relation between them implies a universal optimality property between the two. We find, however, that in general there is no mechanism that anti-refines all mechanisms within the type, and therefore the universal optimality property can only hold for a subset of specific gain/loss functions.

2.3.1. Refinement of hypers

There is a second technique for structural verification of refinement which will be crucial to the study of universal optimality. This time instead of focussing on channels, we will use the hypers $[π ⊳ A]$ , $[π ⊳ B]$ induced by the application of Bayes’ rule on channels A, B via prior π. Structural refinement of hypers has a compelling geometric interpretation which we now describe.

We can depict a hyper as a weighted sum of vectors, where we identify each posterior $δ^{y}$ in the support of the hyper as a point in $R^{| X |}$ . A “refining Earth Move” [25] from hyper $[π ⊳ B]$ to hyper $[π ⊳ A]$ exists when each posterior $δ^{y}$ of $[π ⊳ A]$ is realised as a convex combination of posteriors of $[π ⊳ B]$ (i.e. an “Earth Move” from $[π ⊳ B]$ to each $δ^{y}$ ) which exactly matches the marginals of $[π ⊳ B]$ to produce the corresponding marginals on $[π ⊳ A]$ . It turns out that such an Earth Move exists precisely when $B ⊑ A$ . We then write: $\begin{matrix} (6) & B ⊑ A iff [π ⊳ B] ⊑ [π ⊳ A] \end{matrix}$ where π is any full support prior and $[π ⊳ B] ⊑ [π ⊳ A]$ precisely when the aforementioned Earth Move exists. Further, if the posteriors of $[π ⊳ B]$ are linearly independent (as vectors), the refining Earth Move exists whenever the posteriors of $[π ⊳ A]$ lie inside the convex hull of the posteriors of $[π ⊳ B]$ . An example depicting refinement of hypers is shown in Fig. 2.

Fig. 2.

Refinement of hypers. The posteriors of hyper $Δ_{1}$ are the orange points $p_{1}$ and $p_{2}$ and the blue point $p_{5}$ . The posteriors of hyper $Δ_{2}$ are the 4 orange points $p_{1}$ , $p_{2}$ , $p_{3}$ , $p_{4}$ . Their associated weights are given at the top-left of the diagram. We can check that each hyper averages to the uniform distribution, denoted by υ, contained in the convex hulls of their posteriors. There is a refining Earth move from $Δ_{2}$ to $Δ_{1}$ : the probability masses on $p_{1}$ and $p_{2}$ stay in place (the ‘trivial’ Earth move), but the probability masses of $\frac{1}{6}$ on points $p_{3}$ and $p_{4}$ move to the point $p_{5}$ (shown in the diagram), for a total Earth move of $\frac{1}{3}$ which corresponds to the weight of $p_{5}$ in $Δ_{1}$ . This gives that $Δ_{2} ⊑ Δ_{1}$ and therefore the corresponding channels are also in refinement.

Note that refinement on hypers is a true partial order. That is, channels which are “equivalent” under the (channel) refinement preorder collapse to the same hyper under any π. This greatly simplifies the study of channels in the optimality context.

2.3.2. Capacities

Finally, we note that an important robustness measure in QIF is the notion of capacity. Capacity is a measure of the worst-case leakage of a channel, quantified over all gain/loss functions, or all priors, or both.10

¹⁰
We express capacities using gain functions as not all capacities can be expressed dually using loss function formulations.

We recall that the leakage of the channel – a measure of its accuracy – can be computed using the difference between the posterior and prior vulnerabilities. The additive g-leakage is

| V_{g} [π] - V_{g} [π ⊳ A] |

, and the multiplicative g-leakage is

V_{g} [π ⊳ A] / V_{g} [π]

. The corresponding channel capacities take the maximal difference over all priors and

\underline{G X}

, the

0 / 1

-bounded vulnerabilities.11

¹¹

I.e. $0 ⩽ V_{g} [π] ⩽ 1$ .

Specifically the multiplicative and additive capacities respectively are:

\begin{matrix} (7) & {ML}^{\times} (C) : = max_{π, g} (V_{g} [π ⊳ C] / V_{g} [π]) \end{matrix}

and

\begin{matrix} (8) & {ML}^{+} (C) : = max_{π, g} | V_{g} [π] - V_{g} [π ⊳ C] | . \end{matrix}

It is easy to see that

1 ⩽ {ML}^{\times} (C) ⩽ | X |

and

0 ⩽ {ML}^{+} (C) ⩽ 1

. The larger either capacity, the more accurate the channel. For example,

{ML}^{+} (I) = 0

, and

{ML}^{\times} (I) = 1

consistent with it transmitting no information at all. Capacities provide an alternative robust bound on the ability of any mechanism in a privacy type to deliver accurate information since it represents a tight upper bound on the maximum leakage of any channel in the type. We show how to compute capacities for the whole type either by proving a universal optimality result or using the characterisation given in Section 3 below.

We shall show that irrespective of whether a type can support universal optimality results, the capacities are well-defined and there exist mechanisms that achieve the capacity. Such mechanisms represent the most efficient implementations that respect the privacy threshold whilst at the same time delivering the most information about the utility.

2.4. Universal optimality and refinement

Using the QIF notion of posterior ℓ-uncertainty as our utility measure (recall Eqn (5)), there is a natural connection between utility and refinement which extends to universal optimality. In its most general form, we define universal optimality as follows:

Definition 3 (Universal optimality).

Given a class of mechanisms $T$ we say that $M \in T$ is universally optimal (wrt. $T$ ) if $U_{ℓ} [π ⊳ M] ⩽ U_{ℓ} [π ⊳ M^{'}]$ for all $M^{'} \in T$ , all loss functions ℓ and all priors $π \in D X$ .

Using Eqn 2, this says that $M \in T$ is universally optimal whenever $M^{'} ⊑ M$ for all $M^{'} \in T$ . This definition appears to be too strong; indeed most mechanisms –even within the same privacy type– are not related by refinement [10]. Surprisingly, we find that there is a class $T$ for which this strong notion of universal optimality holds (Cor. 18). However, we also find that this class is unique (Thm. 20) and universal optimality is too strong in general.

Therefore, following Ghosh et al. we study a weaker order defined by specific loss functions:

Definition 4.
(Universal ℓ-optimality) Given a loss function ℓ and a class of mechanisms $T$ we say that $M \in T$ is universally ℓ-optimal (wrt. $T$ ) if $U_{ℓ} [π ⊳ M] ⩽ U_{ℓ} [π ⊳ M^{'}]$ for all $M^{'} \in T$ and priors $π \in D X$ .

The universal optimality definition adopted by Ghosh et al. is, using Def. 4, universal ℓ-optimality for all “monotone” loss functions ℓ (see Def. 9 ahead). We can think of Def. 4 as describing a type of restricted refinement which holds for all priors but only for loss functions $ℓ \in L$ . This idea of refinement restricted to loss function classes has not to our knowledge been previously studied, and is the focus of Section 5.
3. Characterising metric privacy types

In this section we study mechanisms within a privacy type, finding a new characterisation of the type requiring only a finite set of distinguished mechanisms which we call kernel mechanisms. This characterisation captures the leakage properties of the privacy type, reducing the study of optimality to the study of kernel mechanisms.

3.1. Privacy types as hyper-distributions

Following [2], we write a mechanism M of privacy type $T_{X, d}$ as a channel $C : X \to D Y$ satisfying $C_{x, y} ⩽ e^{d (x, x^{'})} C_{x^{'}, y}$ for all $x, x^{'} \in X$ and $y \in Y$ . i.e., as a set of constraints on each column of the channel. Importantly, there is a correspondence between C and hyper-distributions as follows:

Lemma 4.
C is d -private iff $δ_{x}^{y} - e^{d (x, x^{'})} δ_{x^{'}}^{y} ⩽ 0$ for every $δ^{y}$ in the support of $[υ ⊳ C]$ where $υ \in D X$ is the uniform distribution.

Applying Lem. 4 to each pair x, $x^{'}$ yields a feasible region of hypers (defined by their posteriors $δ^{y}$ ) enclosed by hyperplanes –therefore convex– which we call “the space of $T_{X, d}$ hypers”. Figure 3 illustrates one such space.

Observe that although Lem. 4 specifies the uniform distribution, Eqn (6) says that this restriction does not apply to channel refinement. In other words, we can reason about channel refinement (over all priors) by reasoning in the space of hypers wrt. the uniform distribution. This differentiates our approach from standard convex optimisation formulations of the differential privacy constraints on channels [7] which require quantification over all priors.

Fig. 3.
The space of $T_{X, d_{2}}$ hypers on $X = {x_{1}, x_{2}, x_{3}}$ for $ε = ln 2$ and $d_{2} (x_{1}, x_{2}) = 1$ , $d_{2} (x_{2}, x_{3}) = 1$ , $d_{2} (x_{1}, x_{3}) = 2$ . The coloured hyperplanes are the linear constraints which intersect the (triangular) simplex in $R^{3}$ , forming the convex region shown by the dotted lines. Every point inside this region represents a $ln 2 \cdot d_{2}$ -private posterior, and any $ln 2 \cdot d_{2}$ -mechanism (channel) corresponds to a convex sum of posteriors that average to the uniform prior υ.
3.2. A complete characterisation of privacy types

The convex space above describes the complete set of potential posteriors which can be found in hyper-distributions $[υ ⊳ C]$ . Using Lem. 2, we can construct valid channels from a set of posteriors, provided that they contain the uniform distribution and can be averaged to the uniform prior. Moreover, Lem. 4 says that if the posteriors satisfy the d-privacy constraints, then so will the corresponding channel. (Notice this in Fig. 1: the ratio between elements in each column of C matches the corresponding ratios in $Δ_{C}$ ). This means we can construct d-private channels from sets of d-private posteriors. We use this fact now to examine optimality via hypers.

We distinguish 2 types of hypers (and their corresponding mechanisms) formed from vertices in the space of $T_{X, d}$ hypers:

Definition 5 (Vertex Mechanism/Hyper).

Let $V \in T_{X, d}$ be a mechanism with corresponding hyper $Δ = [υ ⊳ V]$ . We say that V is a vertex mechanism (and Δ a vertex hyper) if the inners in $⌈ Δ ⌉$ are vertices in the space of $T_{X, d}$ hypers.

Vertex mechanisms are of particular interest because of their refinement properties: they are the minimal elements in the refinement order for their privacy type (that is, they have no anti-refinements which satisfy the privacy constraints). Put another way, it means that every $T_{X, d}$ mechanism is a refinement of a vertex mechanism. (See Appendix Section A.1 for details). However, (potentially) infinitely many exist in any privacy type. We will instead make use of particular vertex mechanisms that we call kernel mechanisms.

Definition 6 (Kernel Mechanism/Hyper).

Let $K \in T_{X, d}$ have corresponding hyper $Δ = [υ ⊳ K]$ . We say that K is a kernel mechanism and Δ a kernel hyper if K is a vertex mechanism, and if the inners in $⌈ Δ ⌉$ are linearly independent (as vectors).

Kernel mechanisms have a number of important properties that flow from their corresponding kernel hypers. The first property tells us that kernel mechanisms are irreducible.

Property 1.
If K is a kernel mechanism then there is no kernel mechanism $K^{'}$ st. $⌈ Δ_{K^{'}} ⌉ \subset ⌈ Δ_{K} ⌉$ where $Δ_{K}$ , $Δ_{K^{'}}$ are hypers corresponding to mechanisms K, $K^{'}$ respectively.

The next property says that kernel mechanisms generate all of the vertex mechanisms.
Property 2.
Any vertex mechanism can be written (non-uniquely) as a convex sum of kernel mechanisms. Conversely, any convex sum of kernel mechanisms is a vertex mechanism.

Our final property says that kernel mechanisms are not related by refinement.
Property 3.
If K, $K^{}$ are kernel mechanisms then $K ⋢ K^{}$ and $K^{*} ⋢ K$ .

Observe that although Property 2 says that kernel mechanisms generate the space of vertex mechanisms, the representation of a vertex hyper as a convex sum of kernel hypers is not necessarily unique (see Section 3.3 below). Importantly, every vertex mechanism can be expressed as a convex sum of kernel mechanisms, and these generate all $T_{X, d}$ mechanisms. This leads to the main result of this section.
Theorem 5 (Fundamental Characterisation of $T_{X, d}$ Mechanisms).

Every $T_{X, d}$ mechanism is a refinement of a convex sum of kernel mechanisms.

The usefulness of this characterisation is that we can now explore questions about optimality using a finite set of kernel mechanisms as a basis for generating classes of universally optimal mechanisms.

Note that our notion of privacy types plays a role similar to that of the privacy constraint graph in [8], but it is strictly more general. First of all, the privacy (constraint) graph is defined only for the central model, while, as explained in the introduction, the privacy types of d-privacy subsume and generalise both the central and the local model. Furthermore, from a technical point of view, privacy graphs can be expressed using a metric, but the vice-versa is not true, since in privacy graphs there is no notion of distance between nodes.

3.3. Kernel representations are not unique: An example

Although our characterisation in terms of kernel hypers is a strong property, it turns out that the kernel representation for a mechanism is not unique. The following counter-example shows how this can occur. Consider the following hyper:

We can check that this is a valid hyper – its inners are valid distributions, its outer is a distribution and the inners average (via the outer) to the uniform prior. It is also a vertex hyper in the space of $ln 2 \cdot d_{D}$ -private hypers where $d_{D}$ is the Discrete metric. This space consists of the following kernel hypers:

We can express the hyper Δ in the following ways: $\begin{aligned} Δ = \frac{1}{3} H_{1} + \frac{1}{3} H_{2} + \frac{1}{3} H_{3} \\ Δ = \frac{4}{9} R_{1} + \frac{5}{9} R_{2} \end{aligned}$ i.e. As 2 different convex sums of kernel hypers. And therefore the same holds for the corresponding vertex and kernel mechanisms.

4. The metric differential privacy context for counting and sum queries

In order to compare different mechanisms for utility, they must be able to be applied to the same privacy problem. This privacy scenario thereby defines the privacy type of mechanisms within which we seek an optimal one.

The optimality results of [8,20] are situated in the context of oblivious mechanisms. We recall the definition of an oblivious mechanism as follows:

Definition 7 (Oblivious Mechanism).

Given a metric $d_{X}$ on $X$ , an oblivious mechanism $K : X \to D Z$ is a $d_{X}$ -private mechanism which can be decomposed into a query $f : X \to Y$ and a probabilistic mechanism $H : Y \to D Z$ st. $K = H \circ f$ . That is, the mechanism H depends only on the inputs $Y$ and not on the original dataset $X$ .

The $d_{X}$ -privacy of the overall system is determined by the composition $H \circ f$ , whereas the utility of the system depends on H alone, since utility is a measure of how much the output z reveals and the true query answer $f (x)$ . In summary, we reason about privacy on the composed mechanism K but we reason about utility on the mechanism H. This relationship is depicted in Fig. 4.

Fig. 4.

Oblivious $d_{X}$ -private mechanisms K are composed of a deterministic function f and a $d_{Y}$ -private mechanism H. We reason about privacy using the composition $K = H \circ f$ but we reason about utility using H alone.

The question that arises in the metric differential privacy setting is how to design the mechanism H so as to induce the appropriate $d_{X}$ -privacy guarantee on the overall mechanism K. We recall first the idea of a metric induced by a graph.

Definition 8 (Induced metric).

Let $X$ , $Y$ be discrete and let $d_{X}$ be a metric on $X$ . Given a surjective function $f : X \to Y$ , define the graph of f as follows: treat each $y \in Y$ as a node and for each $x, x^{'} \in X$ draw an edge between $f (x)$ and $f (x^{'})$ with weight $d_{X} (x, x^{'})$ . Define $d_{Y} (y, y^{'})$ as the minimum path weight from y to $y^{'}$ . Then $d_{Y}$ is called the metric induced by the graph of f.

The following now holds:

Lemma 6 ([9]).

Let $f : X \to Y$ and $d_{X}$ , $d_{Y}$ metrics on $X$ , $Y$ such that $d_{Y}$ is the metric induced by the graph of f. Then f is 1-Lipschitz wrt $d_{X}$ , $d_{Y}$ and $H : Y \to D Z$ satisfies $d_{Y}$ -privacy if and only if $H \circ f$ satisfies $d_{X}$ -privacy.

Lem. 6 provides important conditions under which it is safe to reason about the utility of mechanisms H in order to guarantee $d_{X}$ -privacy on $H \circ f$ .

Returning to universal optimality, the universally optimal mechanism, if it is to be found, is one such mechanism H as depicted in Fig. 4 – and the privacy type of such mechanisms are the ones which induce, via f, the desired $d_{X}$ -privacy guarantees on K. Lem. 6 tells us that this privacy type is the set of $d_{Y}$ -private mechanisms, where $d_{Y}$ is the metric induced by f and the metric on K. Moreover, the following result shows how these are related by refinement.

Theorem 7 ([10]).

Let M, $M^{'}$ be mechanisms. If $M ⊑ M^{'}$ and $M^{'}$ is d -private then M is also d -private.

Putting these together, we deduce that – in the oblivious context – the candidate mechanisms H for optimality are all d-private, for some metric d induced by the metric $d_{X}$ on $H \circ f$ (Lem. 6). And the universally optimality mechanism (if it exists) is the minimal element in the “anti-refinement chain” induced by the relation ⊑ (Thm. 7).

For the weaker notions of universal $L$ -optimality we can still safely reason using refinement, since if M is universally $L$ -optimal for some class $L$ then it cannot be the case that there is some $M^{'}$ such that $M^{'} ⊏ M$ (strict refinement), otherwise $M^{'}$ would also have better utility for $L$ .

4.1. Relationship to “counting” and “sum” queries

The results on optimal mechanisms which motivate our study are based on the traditional model of differential privacy and oblivious mechanisms: there, the space of inputs is databases of individuals, and privacy is measured using the Hamming metric between databases – thus a Hamming distance of 1 describes databases which differ in one row: a single individual. The space of outputs is determined by a query f which takes the input dataset and outputs some real-value; in the case of “counting queries”, the output is always a natural number, whereas in the case of “sum queries” it could be any real value.12

¹²
Other queries of interest, such as “average” fall under our study of “sum” queries, so that in fact the sum and counting queries cover all of the cases of interest for statistical datasets.

The noise-adding mechanism, operating on the query output, is optimal if it allows a consumer to learn the most information about the query result – whatever information that consumer is seeking (modelled by her loss function) – out of all the mechanisms which guarantee some fixed level of (differential) privacy on the original databases.

The optimality result of Ghosh et al. [20] applies specifically to counting queries; we generalise this result now to the metric differential privacy setting. Using Def. 8 we can define more precisely what we mean by “counting queries”.

Lemma 8.

Let $(X, d_{H})$ be a metric space where $d_{H}$ denotes the Hamming distance. A counting query is a 1-Lipschitz map $f : (X, d_{H}) \to ({0 \dots n}, d_{2})$ , where $n = | X |$ and $d_{2}$ is the Euclidean distance. Moreover, $d_{2}$ is the induced metric on ${0 \dots n}$ wrt f, $d_{H}$ .

And now the optimality result(s) of Ghosh et al. can be restated in the metric-based setting:

Theorem 9 (Theorem 3.2, Corollary 3.3 in [20]).

Let $X = [0, 1, \dots, n]$ and let $d_{2}$ be the Euclidean metric on X. Then the Geometric mechanism on X is universally $L_{d}^{m}$ -optimal for the privacy type $T_{X, d_{2}}$ where $L_{d_{2}}^{m}$ is the class of monotone loss functions on $d_{2}$ .

Our discussion of monotone loss functions is yet to come; we note that the result above applies equally to quantised reals ${0, q, 2 q, \dots}$ by simply scaling the domain [9].

Thm. 9 allows us to generalise the optimality result for any oblivious context in which $d_{2}$ is the induced metric on a (scaled) domain of ${0 \dots N}$ . We will study this privacy type in Section 8.1.

Likewise, we can generalise the impossibility result of Brenner and Nissim [8] to metrics. Their theorem also applies in the context of monotone loss functions but this time for sum queries, which generalise to the discrete metric $d_{D}$ :13

¹³
Recall that the discrete metric assigns distance 1 to all pairs of distinct elements.

Lemma 10.

Let $(X, d_{H})$ be a metric space. A sum query is a 1-Lipschitz map $f : (X, d_{H}) \to ({0 \dots N}, d_{D})$ . Moreover, $d_{D}$ is the induced metric on ${0 \dots N}$ wrt f, $d_{H}$ .

The impossibility of Brenner and Nissim was framed in the context of the Ghosh et al. result, intending to show that universal optimality is unique in the Euclidean context. Their result can now be reframed as follows:

Theorem 11 ([8]).

Let $X$ be a domain of secrets and $d_{D}$ be the Discrete metric on $X$ . Then there are no universally $L_{d}^{m}$ -optimal mechanisms for the privacy type $T_{X, d_{D}}$ where $L_{d}^{m}$ is the class of monotone loss functions.

While the metric d for the loss function class in Thm. 11 was not made explicit, the result was proven by showing that there are no universally $ℓ_{bin}$ -optimal mechanisms in the type $T_{X, d_{D}}$ for the loss function $ℓ_{bin}$ , which is in both $L_{d_{2}}^{m}$ and $L_{d_{D}}^{m}$ . Observe however that this leaves open the possibility that there exist universally optimal mechanisms for loss functions within the classes $L_{d_{2}}^{m}$ and $L_{d_{D}}^{m}$ . We will explore this idea in the next sections.

5. Utility for Bayesian consumers

In Section 3 we showed that optimality for a privacy type can be studied using its associated kernel mechanisms. In this section we study loss functions, which (as mentioned in Section 2) are used to determine the average accuracy (utility) of a mechanism with respect to a Bayesian consumer. We will show, using a duality between optimal loss function classes and optimal mechanisms (Thm. 15), how optimality results for a loss function ℓ can be extended to optimality results for other mechanisms (using the structure of the underlying (kernel) mechanisms), or extended to larger classes of loss functions (using an optimality-preserving algebra).

5.1. Classes of loss functions

A loss function defines a generalised entropy of type $D X \to R$ . Such entropies are concave (and continuous), i.e. $U_{ℓ} (π_{1} +_{p} π_{2}) ⩾ p U_{ℓ} (π_{1}) + (1 - p) U_{ℓ} (π_{2})$ . There are many loss functions that have been identified as useful for analysing information flow properties, including accuracy of mechanisms.

Definition 9 (Monotone, strictly monotone and trivial loss functions).

Loss functions are said to be monotone ind denoted $L_{d}^{m}$ , when they have the form $ℓ (w, x) = m (d (x, α (w)))$ , where m is a monotone, non-decreasing function on the reals, and $α : W \to W$ is an injective function. The strict monotone subclass has m a strictly increasing function. The trivial loss functions $L^{⋆}$ are independent of $W$ , so have the form $f (x)$ for some real-valued function f. We have the following relationships between these loss functions: $\begin{matrix} L^{⋆} \subseteq L_{d}^{m^{+}} \subseteq L_{d}^{m} \subseteq L . \end{matrix}$

Observe that the trivial loss functions are independent of the underlying metric and cannot be used to measure any non-trivial information flow property because they correspond to adversarial settings where the adversary does not have a choice of actions. The associated entropies of trivial loss functions, namely $U_{ℓ}$ correspond to linear functions in $D X$ , i.e. $U_{ℓ} (p \times π_{1} + (1 - p) \times π_{2}) = p \times U_{ℓ} (π_{1}) + (1 - p) \times U_{ℓ} (π_{2})$ , and in fact $U_{ℓ} [υ ⊳ M] = U_{ℓ} [υ]$ whatever the mechanism M. Non-trivial loss functions –corresponding to non-planar generalised entropies– are critical for measuring information flow properties as they characterise how much an adversary is able to use partial information leaks to further his intent. For example, $ℓ_{nib}$ (defined in Eqn (26)) can be seen to be non-trivial as depicted in Fig. 5 at right: the planar regions represent the action taken by the adversary to minimise his loss relative to his (current) knowledge of the secret as represented by eg. a posterior distribution.

Fig. 5.

$U_{ℓ_{nib}} (π)$ for $X = {x_{1}, x_{2}, x_{3}}$ . We can see that $ℓ_{nib}$ is non-trivial because $U_{ℓ_{nib}}$ is not planar.

Ghosh et al. [20] were the first to identify some important structures in the relation between loss functions and privacy mechanisms. They identified the class of monotone loss functions relative to the privacy type $T_{N, d_{2}}$ . In their formulation the function α acts as a “remapping” to select the most likely input for a given observation made as part of the privacy mechanism (here represented as a channel).

Finally we note that we shall see examples later that show that the consideration of whether a loss function is included in the monotone set is sensitive to the underlying metric.

5.2. Loss function algebra

We begin our exploration with an examination of properties of loss functions which will allow the construction of larger classes of functions for which optimality holds.

For $a, b ⩾ 0$ and loss functions ℓ, $ℓ^{'}$ we define: $\begin{array}{c} (9) & (ℓ * a) (w, x) : = ℓ (w, x) \times a \\ (10) & (ℓ + a) (w, x) : = ℓ (w, x) + a \\ (11) & (ℓ - a) (w, x) : = ℓ (w, x) - a \\ (12) & (ℓ * a + ℓ^{'} * b) ((w_{1}, w_{2}), x) : = ℓ (w_{1}, x) \times a + ℓ^{'} (w_{2}, x) \times b \end{array}$

Note that we can extend Defs (10), (11) to arbitrary functions $q : X \to R$ : $\begin{aligned} (13) & (ℓ + q) (w, x) : = ℓ (w, x) + q (x) \end{aligned}$

The following results are easy to show; firstly for prior ℓ-uncertainties we have, for any prior $π : D X$ and $a, b ⩾ 0$ : $\begin{array}{c} (14) & U_{ℓ * a} (π) = U_{ℓ} (π) \times a \\ (15) & U_{ℓ + a} (π) = U_{ℓ} (π) + a \\ (16) & U_{ℓ - a} (π) = U_{ℓ} (π) - a \\ (17) & U_{ℓ * a + ℓ^{'} * b} (π) = U_{ℓ} (π) \times a + U_{ℓ^{'}} (π) \times b \end{array}$

And for functions $q : X \to R$ : $\begin{aligned} (18) & U_{ℓ + q} (π) & = U_{ℓ} (π) + \sum_{x} π_{x} q (x) \end{aligned}$

These properties flow through to posterior ℓ-uncertainties; for any prior $π : D X$ , $a, b ⩾ 0$ and channel C we have: $\begin{array}{c} (19) & U_{ℓ * a} [π ⊳ C] = U_{ℓ} [π ⊳ C] \times a \\ (20) & U_{ℓ + a} [π ⊳ C] = U_{ℓ} [π ⊳ C] + a \\ (21) & U_{ℓ - a} [π ⊳ C] = U_{ℓ} [π ⊳ C] - a \\ (22) & U_{ℓ * a + ℓ^{'} * b} [π ⊳ C] = U_{ℓ} [π ⊳ C] \times a + U_{ℓ^{'}} [π ⊳ C] \times b \end{array}$

And again for functions $q : X \to R$ : $\begin{aligned} (23) & U_{ℓ + q} [π ⊳ C] & = U_{ℓ} [π ⊳ C] + \sum_{x} π_{x} q (x) \end{aligned}$

We can now deduce some useful refinement properties. In the following we write $M ⊑_{ℓ} M^{*}$ to mean $U_{ℓ} [π ⊳ M] ⩽ U_{ℓ} [π ⊳ M^{*}]$ for all $π \in D X$ .

Property 4.
If $M ⊑_{ℓ} M^{}$ then $M ⊑_{(ℓ a)} M^{}$ for $a ⩾ 0$ .
Property 5.
If $M ⊑_{ℓ} M^{}$ then $M ⊑_{(ℓ + a)} M^{}$ for $a ⩾ 0$ .
Property 6.
If $M ⊑_{ℓ} M^{}$ then $M ⊑_{(ℓ - a)} M^{}$ for $a ⩾ 0$ .
Property 7.
If $M ⊑_{ℓ} M^{}$ and $M ⊑_{ℓ^{'}} M^{}$ then $M ⊑_{(ℓ a + ℓ^{'} * b)} M^{}$ for $a, b ⩾ 0$ .
Property 8.
If $M ⊑_{ℓ} M^{}$ then $M ⊑_{(ℓ + q)} M^{}$ for $q : X \to R$ .

The above properties allow us to reason about optimality on larger classes of loss functions by leveraging optimality results for known classes.

Finally, we present the following important result which explains why it is always safe to reason about loss functions wrt the action of the uniform distribution.14
¹⁴
This result is known in its dual form via gain functions [3].

We first define for prior $π : D X$ and loss function ℓ: $\begin{aligned} (24) & (π ⋆ ℓ) (w, x) : = π_{x} ℓ (w, x) \end{aligned}$

We now have the following: Lemma 12.
For any loss function ℓ, channel M and prior π,* $\begin{matrix} U_{ℓ} [π ⊳ M] = n \times U_{π ⋆ ℓ} [υ ⊳ M] \end{matrix}$ where υ is the uniform prior and $n = | X |$ .

5.3. Examples of significant loss functions

The function $ℓ_{bin}$ is a loss function commonly used to measure the ability of an adversary to guess the secret: note that its corresponding entropy $U_{ℓ_{bin}}$ is referred to as Bayes’ risk. Letting $W = X$ , we define: $\begin{matrix} (25) & ℓ_{bin} (x, w) : = 0 if (x = w) else 1 . \end{matrix}$ Its dual is defined $\begin{matrix} (26) & ℓ_{nib} (x, w) : = 1 if (x = w) else 0 . \end{matrix}$ Observe that $ℓ_{bin} \in L_{d_{2}}^{m}$ , but $ℓ_{bin} \notin L_{d_{2}}^{m^{+}}$ , whilst $ℓ_{bin} \in L_{d_{D}}^{m^{+}}$ . Meanwhile $ℓ_{nib}$ is not monotone in any metric, but it is non-trivial and we shall see that it is significant for computing channel capacities.

When $W$ and $X$ are subsets of a Euclidean space, we can define an average loss function: $\begin{matrix} (27) & ℓ_{Avg} (x, w) : = d_{2} (w, x) . \end{matrix}$ Note that $ℓ_{Avg}$ is strictly monotone in the type $T_{N, d_{2}}$ ; the corresponding entropy $U_{ℓ_{Avg}}$ computes the average distance from the true value of the secret.

Finally we also consider general non-monotone loss functions, where ℓ is an arbitrary real-valued loss function. Although these have not yet been studied for privacy mechanisms, they offer significant insights because they can express generic properties for studying property inferences [3,5], and they can provide quantitative robustness measures due to the “miracle theorems” which quantify channel capacity. We recall these important results as follows.

Theorem 13 (Multiplicative Miracle theorem [3]).

The following 15

¹⁵
Note that multiplicative capacity is defined by maximising, hence $V_{g}$ .

holds for all mechanisms M, non-negative gain functions g and priors π:

\begin{matrix} V_{g} [π ⊳ M] / V_{g} [π] ⩽ V_{ℓ_{nib}} [υ ⊳ M] / V_{ℓ_{nib}} [υ], \end{matrix}

where

V_{ℓ_{nib}} [υ ⊳ M] = 1 - U_{ℓ_{bin}} [υ ⊳ M]

Theorem 14 (Additive Miracle theorem [3]).

The following holds for all mechanisms M, 0/1-bounded vulnerabilities $V_{g}$ and priors π: $\begin{matrix} V_{g} [π ⊳ M] - V_{g} [π] ⩽ 1 - | X | U_{ℓ_{nib}} [υ ⊳ M] . \end{matrix}$

Observe that Thm. 13 and Thm. 14 can give robust quantitative bounds on general information leakage properties (including utility) for privacy types discussed above. We will illustrate this for the Euclidean and Discrete metrics in Section 8.

5.4. Which mechanisms have non-trivial ℓ-optimal loss functions?

One way to understand the ℓ-optimality problem within a type is to make the association of the loss function with an adversary (or consumer as in Ghosh et al. [20]) explicit. Recall that the loss function formalises an adversary’s intent by describing the cost of taking action w when the secret is x. This means that a loss function is actually universally optimal for mechanism M exactly when the consumer (formalised by ℓ) can make better use of the information leaked by M compared with any other mechanism in the privacy type. These adversaries prefer M over all other mechanisms in the privacy type, independent of their prior knowledge. Thus rather than starting with a loss function ℓ and asking whether it has an ℓ-optimal mechanism, instead given a mechanism M ask which consumers (loss functions) would prefer M over all others. We study that idea next.

Definition 10 (Utility set).

Let $M \in T_{X, d}$ . Define $\begin{matrix} L_{M}^{T_{X, d}} : = {ℓ \in L | M is universally ℓ -optimal within T_{X, d}} \end{matrix}$

Notice that $L_{M}^{T_{X, d}}$ is closed under addition and multiplication by a non-negative generalised scalar, where addition is defined: $\begin{matrix} (28) & (ℓ_{1} + ℓ_{2}) (x, (w_{1}, w_{2})) : = ℓ_{1} (x, w_{1}) + ℓ_{2} (x, w_{2}), \end{matrix}$ and generalised scalar multiplication is defined for (non-negative) real-valued function v: $\begin{matrix} (29) & (v \times ℓ) (x, w) : = v (x) \times ℓ (x, w) . \end{matrix}$

Observe also that $L_{M}^{T_{X, d}}$ always contains $L^{⋆}$ the set of “trivial” loss functions, and the utility set for the trivial mechanism that leaks no information at all consists only of trivial loss functions.

More interestingly it turns out that the structure of a privacy mechanism is reflected in the structure of the corresponding utility sets. There are two important constructors for mechanisms where we see this relationship clearly. The first is the external probabilistic choice: for channels $M, M^{'} \in T_{X, d}$ we write $M_{p} \oplus M^{'}$ as the mechanism obtained by applying M with probability p and $M^{'}$ with probability $(1 - p)$ . This yields a channel containing the columns of M scaled by p together with the columns of $M^{'}$ scaled by $(1 - p)$ , whose columns therefore clearly satisfy the same differentially private constraints as M, $M^{'}$ and so $M_{p} \oplus M^{'} \in T_{X, d}$ .

The second is restriction of a mechanism in $T_{X, d}$ to one in $T_{X^{'}, d}$ where $X^{'} \subset X$ . We write $M ↓ X^{'}$ for the restriction of the mechanism $M \in T_{X, d}$ to operate only on the subset of secrets $X^{'}$ obtained by removing rows of M corresponding to the secrets $X - X^{'}$ . Notice that $M ↓ X^{'}$ retains the original metric d because the differentially-private constraints are only evaluated on the rows corresponding to secrets in $X^{'}$ .

Correspondingly in the space of loss functions we have the following constructions. Denote by $ℓ ↓ X^{'}$ the restriction of the loss function ℓ to some subset $X^{'}$ of its secrets, defined by $(ℓ ↓ X^{'}) (w, x) : = ℓ (w, x)$ for $x \in X^{'}$ . Conversely we can lift a loss function to a superset of secrets $X$ , denoted $ℓ ↑ X$ ; it is obtained by “padding” the loss function with 0’s. That is, $(ℓ ↑ X) (w, x) : = ℓ (w, x)$ when $x \in X \subseteq X$ and 0 when $x \notin X$ .

We observe now that utility sets have a structure that is dual to the structure of mechanisms.

Theorem 15 (ℓ-Optimal Duality).

Let $T_{X, d}$ and $T_{X, d}$ be privacy types where $X \subseteq X$ . For mechanisms $M, M^{'} \in T_{X, d}$ , we have the following.

$L^{⋆} \subseteq L_{M}^{T_{X, d}}$ ,

$M ⊑ M^{'}$ implies $L_{M}^{T_{X, d}} \supseteq L_{M^{'}}^{T_{X, d}}$ ,

$L_{I}^{T_{X, d}} = L^{⋆}$ , where $I$ is the trivial mechanism,

$L_{M ⨁_{p} M^{'}}^{T_{X, d}} = L_{M}^{T_{X, d}} \cap L_{M^{'}}^{T_{X, d}}$ for $0 < p < 1$ ,

$ℓ \in L_{M ↓ X}^{T_{X, d}}$ if and only if $ℓ ↑ X \in L_{M}^{T_{X, d}}$ .

Proof.
We prove each statement in turn.
$L^{⋆} \subseteq L_{M}^{T_{X, d}}$ . This follows since for $ℓ \in L^{⋆}$ we have that $U_{ℓ} (π, M) = U_{ℓ} (π)$ for any mechanism M. Therefore all mechanisms are ℓ-optimal for trivial loss functions.

$M ⊑ M^{'}$ implies $L_{M}^{T_{X, d}} \supseteq L_{M^{'}}^{T_{X, d}}$ . Let $ℓ \in L_{M^{'}}^{T_{X, d}}$ . We have that: $\begin{matrix} U_{ℓ} (π, M) ⩽ U_{ℓ} (π, M^{'}) ⩽ U_{ℓ} (π, M), \end{matrix}$ where the first inequality holds by the refinement assumption, and the second by the universal ℓ-optimality of $M^{'}$ Thus $U_{ℓ} (π, M) = U_{ℓ} (π, M^{'})$ and therefore $ℓ \in L_{M}^{T_{X, d}}$ .

$L_{I}^{T_{X, d}} = L^{⋆}$ , where $I$ is the trivial mechanism.

By Lem. 2 and Thm. 5 we can represent the hyper $[υ, K]$ as the sum $\sum_{i} a_{i} v_{i}$ , where $v_{i}$ are vertices in the convex space. Suppose that ℓ is not trivial and that $U_{ℓ} (υ) = \sum_{x} ℓ (x, w^{})$ , then at least one vertex v must have $U_{ℓ} (v) = \sum_{x} ℓ (x, w)$ , where $w \neq w^{}$ . Assume wlog. that $v = v_{1}$ . This means that: $\begin{array}{c} U_{ℓ} (υ, I) \\ = U_{ℓ} (υ) \\ = \sum_{i} a_{i} U_{ℓ} (υ) \\ = \sum_{i} a_{i} (\sum_{x} ℓ (x, w^{})) \\ > \sum_{i} a_{i} min_{w \in W} (\sum_{x} ℓ (x, w)) \\ = U_{ℓ} (υ, K), \end{array}$ thus $ℓ \notin L_{I}^{T_{X, d}}$ .

$L_{M ⨁_{p} M^{'}}^{T_{X, d}} = L_{M}^{T_{X, d}} \cap L_{M^{'}}^{T_{X, d}}$ for $0 < p < 1$ . Note that $U_{ℓ} (π, M ⨁_{p} M^{'}) = p \times U_{ℓ} (π, M) + (1 - p) \times U_{ℓ} (π, M^{'})$ . Thus if $ℓ \in L_{M}^{T_{X, d}} \cap L_{M^{'}}^{T_{X, d}}$ it must also be contained in $L_{M ⨁_{p} M^{'}}^{T_{X, d}}$ .

Conversely, if $ℓ \in L_{M}^{T_{X, d}} - L_{M^{'}}^{T_{X, d}}$ then there is some π such that $U_{ℓ} (π, M) < U_{ℓ} (π, M^{'})$ , in which case $U_{ℓ} (π, M) < U_{ℓ} (π, M_{p} \oplus M^{'})$ also, since $0 < p < 1$ .

$ℓ \in L_{M ↓ X}^{T_{X, d}}$ if and only if $ℓ ↑ X \in L_{M}^{T_{X, d}}$ . This follows since for $π \in D X$ we have $U_{ℓ} (π, M ↓ X) = U_{ℓ ↑ X} (π, M)$ . More generally $U_{ℓ ↑ X} (π, M) = α \times U_{ℓ} (π^{'}, M ↓ X)$ , where $α = π (X)$ and $π_{x}^{'} = π_{x} / α$ for $x \in X$ . □

Notice that Thm. 15(1) means that $L_{M}^{T_{X, d}}$ is always non-empty, containing at least the set of trivial loss functions. As might be expected, Thm. 15(2) implies that the more information leaked through a mechanism, the more loss functions it has in its utility set, with Thm. 15(3) showing that the trivial mechanism, which always implements differential privacy (since it leaks no information at all), is only universally ℓ-optimal for the set of trivial loss functions.

Next, Thm. 15(4) shows that when mechanisms are constructed using external probabilistic choice, the resulting utility set is the intersection of the utility sets of the components.

Finally, Thm. 15(5) provides an important method for discovering ℓ-optimal results by studying simpler privacy types with fewer secrets, but having the same underlying metric. We make use of Thm. 15(5) in Section 8 when we find universal optimality results in various metric spaces of interest.

A corollary to Thm. 15 is that the kernel mechanisms generate maximal utility sets.
Corollary 16.
If $M \in T_{X, d}$ is a universally ℓ-optimal mechanism then* $ℓ \in L_{K}^{T_{X, d}}$ for some kernel mechanism K in the type.
Proof.
Follows from Thm. 15(2) and (4) as Thm. 5 implies M refines a convex sum of kernel mechanisms. □

But even more, in cases where there is a unique kernel mechanism generating the privacy type, that mechanism satisfies universal optimality.
Theorem 17.
If $K \in T_{X, d}$ is the unique kernel mechanism then $L_{K}^{T_{X, d}} = L$ .
Proof.
From Thm. 5 we note that all mechanisms $M \in T_{X, d}$ must satisfy $K ⊑ M$ and so by the definition of ⊑ (e.g. (2) for loss functions) the result follows. □

It turns out that there is exactly one space for which a unique kernel mechanism exists, and from Thm. 17 this yields the following universal optimality result:
Corollary 18.
If $| X | = 2$ , and for any metricd, the following mechanism (represented as a channel) is universally optimal: $\begin{matrix} (30) & T = [\begin{array}{cc} k \cdot e^{d} & k \\ k & k \cdot e^{d} \end{array}], \end{matrix}$ where $k = \frac{1}{(1 + e^{d})}$ is a scaling factor to ensure rows sum to 1.
Proof.
The space of $T_{X, d}$ hypers on 2 inputs ${x_{1}, x_{2}}$ is an interval on the line $x_{1} + x_{2} = 1$ with only 2 vertices, corresponding to $x_{1} = e^{d} x_{2}$ and $x_{2} = e^{d} x_{1}$ . Since these are linearly independent, they must be the posteriors of a kernel hyper (cf. Def. 6) which is the only vertex hyper in the space. The result follows from Thm. 17. □

The above result is significant in that it holds for all loss functions, and is non-trivial in the sense that binary mechanisms are of interest in the privacy community. (eg., the original Random Response mechanism of Warner which is used in Google’s RAPPOR [16]). The optimal mechanism on 2 inputs is depicted in Fig. 6.

Fig. 6.
The universally optimal mechanism on 2 inputs for $d (x_{1}, x_{2}) = ln 2$ . The blue lines show the constraints defining the convex region (grey line segment). The region has exactly one mechanism defined by the 2 vertices (orange points).

We shall see that using restriction and Thm. 15 we will be able to build non-trivial loss functions contained in the utility sets for kernel mechanisms for any privacy type.
5.5. Robustness results

As mentioned earlier, Brenner and Nissim [8] studied the question of universal ℓ-optimality specifically for types described in terms of privacy constraint graphs, which for us can be described using metrics. For example, sum queries can be modelled as mechanisms in the privacy type $T_{X, d_{D}}$ . Whilst the negative results of Brenner and Nissim apply only to $ℓ_{bin}$ within their characterisation of privacy types in terms of constraint graphs, they leave open the more fundamental question about whether optimality for non-trivial loss functions exists more generally. The next theorem (partially) answers that question, showing that non-trivial ℓ-optimal mechanisms always exist, and so universal ℓ-optimality is more common than expected.

Theorem 19 (Universal optimality existence).

Given any privacy type $T_{X, d}$ , if $| X | > 1$ then there is a non-trivial monotone loss function ℓ and a mechanism $M \in T_{X, d}$ such that M is universally ℓ-optimal.

In particular Thm. 19 applies even when the underlying metric is $d_{D}$ . More significantly, once we have found basic non-trivial loss functions contained in a mechanism’s utility set, we can construct more complex loss functions, also in the utility set, by using addition (Eqn (28)) and scaling (Eqn (29)). Moreover, depending on the underlying metric many kinds of loss function can be significant for evaluating accuracy. In fact $ℓ_{bin}$ and $ℓ_{nib}$ are particularly significant since, if a corresponding universally optimal mechanism exists within a type, a robust benchmark for optimal loss exists (see Thm. 24).

5.6. Impossibility results

The universal impossibility results of Brenner and Nissim were studied for the monotone loss function class wrt. sum queries. In this section we generalise this result to arbitrary metrics and loss functions. Importantly, we tighten the impossibility, showing that it holds only for the class of strictly monotone loss functions. We note that our proof is more general and direct than Brenner and Nissim’s result as we are able to access generic properties of the QIF framework.

We begin with a fundamental impossibility result:

Theorem 20 (Impossibility of Universally Optimal Mechanisms).

There are no universally optimal mechanisms in $T_{X, d}$ for any d and for $| X | > 2$ .

Thm. 20 applies to all loss function classes, and thus is a strengthening of the Brenner and Nissim result in that it shows that the impossibility also applies to the counting query class of Ghosh et al.

In terms of the weaker universal ℓ-optimality, Brenner and Nissim’s impossibility result for the Bayes’ risk loss function $ℓ_{bin}$ shows that, at least in the Discrete domain, there exist loss functions (and therefore consumers) for which no mechanisms are universally optimal. However, as we have seen, this is not the case for all loss functions, nor is it the case for all monotone loss functions. The next result tightens the impossibility for sum queries, showing that it holds for the class of strictly monotone loss functions.

Theorem 21.
Let ℓ be a strictly monotone loss function on d for some metric d . Then there are no universally ℓ-optimal mechanisms in the privacy type $T_{X, d_{D}}$ over $n > 2$ inputs.

Thm. 21 applies to many loss functions of interest. For example: the average loss function $ℓ_{Avg}$ is strictly monotone in $d_{2}$ although it is non-monotone in $d_{D}$ ; $ℓ_{bin}$ is strictly monotone in $d_{D}$ but not strictly monotone in $d_{2}$ .

Curiously, Thm. 21 depends on strictly monotone loss functions wrt any metric d. To give some intuition about this result, the proof (detailed in Appendix Section A.2) shows that strict monotonicity (wrt anyd) implies that $ℓ ↓ {x_{1}, x_{2}}$ is non-trivial for every $x_{1} \neq x_{2}$ , and that kernel mechanisms in the $d_{D}$ -privacy space always contain a pair ${x_{1}, x_{2}}$ for which optimality only holds on trivial loss functions; viz. the result rests on the behaviour of $T_{X, d_{D}}$ mechanisms on 2 secrets.

This suggests that consumers who wish to differentiate over all pairs of secrets will not be well-served by mechanisms of the $T_{X, d_{D}}$ type. However, consumers who wish to distinguish between a particular pair of secrets can expect better utility, as optimal mechanisms exist in this case. (This follows from Thm. 19). This is really the nature of the discrete metric – it can distinguish between individual pairs of inputs, but not all pairs at the same time. This highlights the importance of understanding the metrics involved when designing mechanisms with the utility of consumers in mind.

Finally, the next corollary follows from the proof of Thm. 21 (detailed in Appendix Section A.2):
Corollary 22.
There are no universally $ℓ_{bin}$ -optimal mechanisms of type $T_{X, d_{D}}$ for $| X | > 2$ .

This is a strengthening of the result in [8] which proves the above for ε over some threshold. Our result holds for all ε.
6. Worked example: Constructing a universally ℓ-optimal mechanism

We now demonstrate how to use our results so far to construct a universally ℓ-optimal mechanism for a non-trivial monotone loss function ℓ. We choose the domain of 3 secrets $X = {x_{1}, x_{2}, x_{3}}$ and the Discrete metric $d_{D}$ . The following diagram depicts the space for $ε = ln 2$ .

The green points A, $A^{'}$ , B, $B^{'}$ comprise a vertex hyper $Δ_{V}$ , itself composed of the kernel hypers $Δ_{K_{A}} = A A^{'}$ and $Δ_{K_{B}} = B B^{'}$ in some convex combination (not yet specified). i.e. $Δ_{V} = λ Δ_{K_{A}} + (1 - λ) Δ_{K_{B}}$ where:

It is easy to check that $K_{A} ↓ {x_{1}, x_{3}} \equiv T$ and $K_{B} ↓ {x_{1}, x_{3}} \equiv T$ , where we write T for the universally optimal mechanism on 2 inputs (cf. Cor. 18) and $K_{A}$ , $K_{B}$ for the kernel mechanisms corresponding to hypers $Δ_{K_{A}}$ , $Δ_{K_{B}}$ respectively. Now, using loss function algebra, we show that $K_{A}$ , $K_{B}$ are universally ℓ-optimal for the following loss function:

We first construct $ℓ_{1} : {x_{1}, x_{3}} \times W \to R_{⩾ 0}$ as $ℓ_{1} (w_{1}, x_{1}) = ℓ_{1} (w_{2}, x_{3}) = 1$ and $ℓ_{1} (w_{1}, x_{2}) = ℓ_{1} (w_{2}, x_{1}) = 0$ . T is universally $ℓ_{1}$ -optimal (by Cor. 18), and therefore so are $K_{A} ↓ {x_{1}, x_{3}}$ and $K_{B} ↓ {x_{1}, x_{3}}$ (by channel equivalence). Therefore, writing: $ℓ_{2} = ℓ_{1} ↑ X$ , $ℓ_{3} = (ℓ_{2} * 3)$ , $ℓ_{4} = (ℓ_{3} + q (x))$ where $q (x_{1}) = q (x_{3}) = 0$ , $q (x_{2}) = 1$ , we reason: $\begin{array}{c} K_{A} ↓ {x_{1}, x_{3}}, K_{B} ↓ {x_{1}, x_{3}} are universally ℓ_{1} -optimal \\ \Rightarrow K_{A}, K_{B} are universally ℓ_{2} -optimal “Thm. 15(5)” \\ \Rightarrow K_{A}, K_{B} are universally ℓ_{3} -optimal “Prop. 4” \\ \Rightarrow K_{A}, K_{B} are universally ℓ_{4} -optimal “Prop. 8” \\ \Rightarrow λ K_{A} + (1 - λ) K_{B} is universally ℓ_{4} -optimal “Thm. 15(4)” \end{array}$ Finally, we note that $ℓ_{4} = ℓ$ and that ℓ is monotonic on $d_{D}$ using the mapping $α (w_{1}) = x_{3}$ , $α (w_{2}) = x_{1}$ . Therefore, for any choice of λ we find that the vertex mechanism V corresponding to hyper $Δ_{V}$ is universally ℓ-optimal. Note that it is also $ℓ_{2}$ , $ℓ_{3}$ -optimal – and there are many possible constructions for which optimality holds.

7. Universal privacy capacities

Section 5.6 teaches us that universal ℓ-optimality can be difficult to obtain for arbitrary privacy types. In spite of this it would still be helpful to have some understanding about the ability of the mechanism to transmit useful information whilst still satisfying its privacy constraints. We introduce a weaker universal property of called privacy type capacity which we show is always well-defined when there are only finitely-many secrets. The privacy type capacity is an upper bound on the leakage of any mechanism in the type, and therefore provides a universal benchmark for maximal accuracy.

Definition 11 (Privacy type capacity).

Given a privacy type $T_{X, d}$ we define multiplicative and additive privacy type capacities as follows: $\begin{matrix} (31) & {ML}^{\times} (T_{X, d}) : = sup_{C \in T_{X, d}} ({ML}^{\times} (C)) \end{matrix}$ and $\begin{matrix} (32) & {ML}^{+} (T_{X, d})) : = sup_{C \in T_{X, d}} ({ML}^{+} (C)) . \end{matrix}$

Theorem 23 (Privacy type capacity is well defined).

Given any privacy type $T_{X, d}$ , if $| X |$ is finite then both multiplicative and additive type capacities exist (Def. 11 ) and for each, there exists mechanisms in $T_{X, d}$ which achieve the capacity bound.

Proof.
From the fundamental characterisation Thm. 5 every mechanism in $T_{X, d}$ is a refinement of a convex sum of kernel mechanisms. When $| X |$ is finite there are only finitely many of these. This means that the maximal capacity amongst kernel mechanisms dominates the capacity of an convex sum of them, or refinement of them. The result now follows. □

The next result shows the relationship between type capacity and universal optimality results: since $ℓ_{bin}$ and $ℓ_{nib}$ are the gain functions that compute the capacity, if they belong to a mechanism’s associated set of universal optimality gain functions then they will achieve the privacy type bound.
Theorem 24 (Universal optimal capacity).

Given any privacy type $T_{X, d}$ , if $ℓ_{bin} \in L_{M}^{T}$ then ${ML}^{\times} (M) ⩾ {ML}^{\times} (M^{'})$ for any mechanism $M^{'} \in T_{X, d}$ . Similarly if $ℓ_{nib} \in L_{M}^{T}$ then ${ML}^{+} (M) ⩾ {ML}^{+} (M^{'})$ for any mechanism $M^{'} \in T_{X, d}$ .

The proof appeals to the additive and multiplicative miracle theorems [3] (restated at Thm. 13 and Thm. 14). Observe that these results give quantitative measurements of the accuracy of mechanisms within a whole privacy type.

In general, though, even when a privacy type does not contain any mechanism whose associated optimality loss function classes contain either $ℓ_{bin}$ or $ℓ_{nib}$ , we are still able to compute the capacity and discover a mechanism that achieves the capacity bounds using convex optimisation.

Lemma 25.
The mechanism that achieves the optimal capacity within a privacy type $T_{X, d}$ can be written as a channel with no more than $| X |$ columns.
Corollary 26.
The additive capacity for privacy type $T_{X, d}$ where $X = {x_{1}, \dots, x_{n}}$ is $1 - c$ where c is the optimal value for the linear optimisation problem described by: $\begin{array}{c} minimise m_{11} + \dots + m_{n n} \\ \sum_{j} m_{i j} = 1 1 ⩽ i ⩽ n \\ m_{i j} ⩽ m_{k j} \times e^{d (x_{i}, x_{k})} 1 ⩽ i, k ⩽ n \\ m_{i j} ⩾ 0 0 ⩽ i, j ⩽ n \end{array}$

Similarly we can compute the multiplicative capacity within a type.
Corollary 27.
The multiplicative capacity for privacy type $T_{X, d}$ where $X = {x_{1}, \dots, x_{n}}$ can be found as a linear optimisation problem described by: $\begin{array}{c} maximise m_{11} + \dots + m_{n n} \\ \sum_{j} m_{i j} = 1 1 ⩽ i ⩽ n \\ m_{i j} ⩽ m_{k j} \times e^{d (x_{i}, x_{k})} 1 ⩽ i, k ⩽ n \\ m_{i j} ⩾ 0 0 ⩽ i, j ⩽ n \end{array}$

Observe that for both capacities the optimisation problem contains $O (| X |^{2})$ constraints. In some situations this number can be reduced, depending on the properties of the metric.

Note that Cor. 26 and Cor. 27 are both examples of the optimisation formulations similar to that used by Ghosh et al. [20] for loss functions related to counting queries. Significant here though is that our results produce universal measurements for privacy type capacity as well as a simplification of how to compute the average minimal loss, yielding the simple linear objective function given here (rather than minimising over a convex function as in Eqn (1) or (5) which accounts for the “remapping” feature).
8. Examples of privacy types and their optimality results

In this section we focus on characterisations for some metric spaces of interest, including the two spaces addressed in the literature: $(N, d_{2})$ and $(X, d_{D})$ for which the results of Ghosh et al. and Brenner and Nissim hold respectively. We enumerate the kernel mechanisms in each of these spaces by computing the extreme points of the convex region of hypers, and then finding sets of (up to) n posteriors which average to the uniform distribution.

8.1. Privacy type defined by $(N, d_{2})$

This privacy type is the one studied by Ghosh et al., corresponding to “counting queries”. Although rich with mechanisms, the Geometric mechanism is the only well-known of this type. It has two instances – the (infinite) Geometric mechanism with outputs over the whole of $Z$ and the truncated Geometric mechanism, in which the output domain matches the input domain (typically a subsequence of $N$ ). These instances in fact have the same leakage properties, i.e., they are equivalent as channels and thus produce the same hyper-distribution [11].

Definition 12 (Geometric mechanism).

The α-geometric mechanism $G : X \to D Z$ has the following channel matrix: $\begin{matrix} G_{x, y} = \frac{1 - α}{1 + α} \cdot α^{d_{2} (x, y)} \end{matrix}$ where $α \in (0, 1]$ . This mechanism satisfies $ε \cdot d_{2}$ -privacy where $d_{2}$ is the Euclidean metric and $ε = - ln α$ .

Unsurprisingly (given its optimality properties), we find that the Geometric mechanism is a kernel mechanism.

Lemma 28.
The (infinite/truncated) Geometric mechanism is a kernel mechanism of privacy type $T_{N, d_{2}}$ .

Figure 7b depicts the space of $T_{N, d_{2}}$ mechanisms on 3 secrets. On n inputs, this space is constructed from $2 \times (n - 1)$ linear DP constraints resulting in $2^{n - 1}$ vertices – extreme points of the feasible region of posteriors of this type.

Figure 7a lists all kernel mechanisms for n up to 6.

Fig. 7.
Kernel mechanisms in the space of $d_{2}$ -private hypers.

Ghosh et al. showed that the Geometric mechanism is universally optimal for the class of monotone loss functions, meaning that it provides the most accuracy (within its type) to Bayesian consumers described by monotone loss functions. However this result does not provide any way to determine exactly how accurate the data release can be, even for this optimal mechanism. A QIF analysis is able to give robust measurements for the amount of leakage as expressed by gain or loss functions; combined with Thm. 24 we are now able to give robust measurements for the whole privacy class. In particular since $ℓ_{bin} \in L_{G}$ , we deduce that the multiplicative channel capacity of the Geometric dominates the multiplicative channel capacity for any mechanism in the whole privacy type. Going further we can compute the exact dominating channel capacity. (Refer to Fig. 7a for numerical examples.)
Corollary 29 (Dominating multiplicative capacity).

Let N be the size of $X$ and let $α = e^{- ε}$ , where ε is given by the definition of DP. Then for any mechanism $M \in T_{X, d_{2}}$ we have that its multiplicative capacity is no more than $\frac{(N (1 - α) + 2 α)}{(1 + α)}$ .

We also observe that the Geometric mechanism is the kernel mechanism which obtains the additive capacity for the mechanisms we computed (cf. Fig. 7a), suggesting that it may also be optimal for the non-monotone $ℓ_{nib}$ loss function.

8.2. Privacy type defined by $(X, d_{D})$

This privacy type is the one studied by Brenner and Nissim, corresponding to “sum queries”. For an input space $X$ of size n, the vertices of the convex region are points of intersection of $n - 1$ hyperplanes, corresponding to $n - 1$ distinct privacy constraints. Since there are only 2 possible distances under the discrete metric, each vertex can only contain 2 possible values, and so the number of possible vertices is given by $\sum_{i = 1}^{n - 1} (\binom{n}{i}) = 2^{n} - 2$ .16

¹⁶
Thinking of each vertex as having either the ‘high’ or the ‘low’ value, this is the number of combinations for each choice of ‘high’ values.

We present some of these mechanisms in Fig. 8a, including the well-known “Random Response” mechanism.

Definition 13 (Random Response mechanism).

The α-random response mechanism $R : X \to D X$ has the following channel matrix: $\begin{array}{c} R_{x, x} = \frac{1}{k} \\ R_{x, y} = \frac{α}{k} for x \neq y \end{array}$ where k is a normalisation term and $α \in (0, 1]$ . This mechanism satisfies $ε \cdot d_{D}$ -privacy where $d_{D}$ is the discrete metric and $ε = - ln α$ .

Fig. 8.

Kernel mechanisms in the space of $d_{D}$ -private hypers.

We note that the Random Response mechanism is a kernel mechanism. The space of mechanisms on 3 secrets for the Discrete metric is depicted in Fig. 8b.

Lemma 30.

The Random Response mechanism is a kernel mechanism in the privacy type $T_{X, d_{D}}$ .

Brenner and Nissim studied mechanisms of this privacy type, focussing on Bayesian consumers and $ℓ_{bin}$ , showing that no mechanism is universally optimal for $ℓ_{bin}$ . Their result demonstrates that there is no mechanism for the Discrete privacy type with the same universal optimality properties as the Geometric mechanism in the Euclidean privacy type. However Thm. 19 indicates that there are in fact mechanisms which are ℓ-optimal for some non-trivial (monotone) loss functions. The next example of a mechanism is in fact ℓ-optimal for a non-trivial monotone loss function. Consider the following hypers of type $T_{X, ln 2 \cdot d_{D}}$ on n inputs:

We use Thm. 15(5) as follows. Concentrating on inputs $X^{†} : = {x_{1}, x_{n}}$ , we see that $K_{A} ↓ X^{†} = K_{B} ↓ X^{†} = T$ (where T is the universally optimal mechanism from Cor. 18). Since T is the unique kernel mechanism in a 2-element state space, we have that both are $ℓ_{e}$ -optimal for loss function $ℓ_{e} (w, x)$ which has two actions, labelled by $X^{†}$ , and is 0 whenever $w \neq x$ , and 1 if $w = x$ .17

¹⁷

In fact both $K_{A} ↓ X^{†}$ , $K_{B} ↓ X^{†}$ are ℓ-optimal for all loss functions on $X^{†}$ .

From Fig. 5 we can see immediately that

ℓ_{e}

is non-trivial, thus by Thm. 15(5) we see that both

K_{A}

K_{B}

are

ℓ_{e} ↑ X

-optimal, as well as all probabilistic combinations

K_{A} ⨁_{p} K_{B}

defined above.

While we can compute a capacity for the whole Euclidean type from the universal optimality of the Geometric, this is not possible on the Discrete space owing to the impossibility results for $ℓ_{bin}$ . We can use instead the characterisation of its structure directly to compute dominating additive and multiplicative capacities giving a tight quantitative upper bound for the accuracy of any mechanisms in the type. The dominating multiplicative capacity is given by the random response R, and the dominating additive capacity is given by its dual $R^{\circ}$ as follows. Let $ε = - ln α = ln β$ , and $k = 1 + (| X | - 1) e^{- ε}$ and $m = 1 + (| X | - 1) e^{ε}$ , then: $\begin{matrix} \begin{array}{lll} R_{i j} & = & 1 / k, if i = j \\ = & α / k, otherwise \end{array} and \begin{array}{lll} R_{i j}^{\circ} & = & 1 / m, if i = j \\ = & β / m, otherwise . \end{array} \end{matrix}$

The dual $R^{\circ}$ has the minimum elements on the diagonal whereas the Random Response R has its maxima there. (Recall Fig. 8b for illustration.)

Corollary 31 (Dominating capacities).

For any M in the Discrete privacy type, we have ${ML}^{\times} (M) ⩽ | X | / (1 + (| X | - 1) e^{- ε})$ and ${ML}^{+} (M) ⩽ 1 - 1 / (1 + (| X | - 1) e^{ε})$ .

Proof.
From Thm. 5, M must be a refinement of a convex sum of vertex mechanisms. From the set of vertices we can see that any inner δ in the support of $[υ ⊳ M]$ must satisfy $1 / m ⩽ δ_{x} ⩽ 1 / k$ . The result now follows from the definition of R and $R^{\circ}$ and the miracle theorems Thm. 13 and Thm. 14. □
8.3. Privacy type defined by $(X, d_{2})$ on grids

This privacy type is described by the Euclidean distance $d_{2}$ on $n \times n$ grids, as might be used in geo-location privacy, for example. Here we find that there are a large number of vertices on even the $2 \times 2$ space, making computing all of the kernel mechanisms expensive. Nevertheless, using Cor. 26 and Cor. 27 we can efficiently compute the capacities even when the kernel mechanisms cannot be enumerated. Figure 9a illustrates some results for small grids.

Fig. 9.

The Euclidean privacy type on grids.

In addition, optimality results in this space can be obtained for computed kernel mechanisms using Thm. 15(5) by employing the same technique illustrated in Section 8.2.

8.4. Privacy type on the Hamming cube

This privacy type is defined by the Hamming distance $d_{H}$ on bitstrings. As with other examples, the dimensions of this type is the number of rows in the mechanism, i.e., $2^{n}$ for bitstrings of length n, and the vertices are the extreme points of the feasible region of posteriors. Here, even on bitstrings of length 3 the number of kernel mechanisms exceeds 29000, although we again are able to compute capacities for higher dimensions. These are illustrated in Fig. 10a.

Fig. 10.

The privacy type defined by the Hamming cube.

Figure 11 depicts a kernel mechanism K and its associated hyper $Δ_{K}$ for the Hamming cube on 3 secrets. We now show how to construct a loss function ℓ for which K is universally ℓ-optimal.

Observe (Fig. 11) that for $X = {000, 100, 110, 111}$ , the mechanism $K ↓ X$ is exactly the Geometric mechanism G (for $ε = ln 2$ ). In other words, $ℓ_{bin}$ is in the utility set $L_{G}^{T_{N, d_{2}}}$ which coincides with $L_{G}^{T_{X, d_{H}}}$ (noting that the Hamming metric is linear on the secrets X). Therefore, from Thm. 15(5) we have that $ℓ_{bin} ↑ X$ is in $L_{K}^{T_{X, d_{H}}}$ , using the fact that $K ↓ X = G$ . Therefore K is universally $ℓ_{bin} ↑ X$ -optimal. And in fact K will be universally $ℓ ↑ X$ -optimal for any monotone loss function ℓ defined on X. Such a loss function would be useful to a consumer who is only interested in learning the inputs X and finds no value in the remaining inputs $X ∖ X$ .

Note that $X = {000, 100, 110, 111}$ corresponds to the orange path depicted in Fig. 10b for which we observed that the Hamming metric is linear.

Fig. 11.

A kernel hyper ( $Δ_{K}$ ) and corresponding kernel mechanism (K) on bitstrings of length 3 wrt the Hamming distance $d_{H}$ with $ε = ln 2$ . K is universally ℓ-optimal for many loss functions derived from the optimality of the geometric mechanism on 4 secrets.

9. Conclusion

We have extended the notion of universal optimality to d-privacy. Our main technique is to use the properties of hyper-distributions and refinement provided by QIF, enabling a geometric interpretation of a class of privacy mechanisms and a characterisation of privacy types in terms of refinement. A principal contribution is to clarify the negative results of Brenner and Nissim, extending them beyond $ℓ_{bin}$ to other strictly monotonic mechanisms, and removing the connection to ε. Our study of the duality relationship between mechanisms and loss functions provides techniques for finding new optimality results in new domains beyond sum and counting queries, as illustrated by our example of mechanisms over the hamming cube. Finally, where the underlying domain is very complex, making it unlikely to find universal optimality results, we introduce the notion of privacy type capacity and show how to compute it for both multiplicative and additive capacities. With these results, we provide strong benchmarks for leakage measurements in domains where full formal analysis might be difficult.

Footnotes

Appendix

References

Acharya ,

Bonawitz ,

Kairouz ,

Ramage and

Sun , Context aware local differential privacy, in: International Conference on Machine Learning, PMLR, 2020, pp. 52–62.

M.S.

Alvim ,

M.E.

Andrés ,

Chatzikokolakis and

Palamidessi , On the relation between differential privacy and quantitative information flow, in: Automata, Languages and Programming – 38th International Colloquium, ICALP 2011, Proceedings, Part II, Zurich, Switzerland, July 4–8, 2011, 2011, pp. 60–76. doi:10.1007/978-3-642-22012-8_4.

M.S.

Alvim ,

Chatzikokolakis ,

McIver ,

Morgan ,

Palamidessi and

Smith , The Science of Quantitative Information Flow, Springer, 2019.

M.S.

Alvim ,

Chatzikokolakis ,

Palamidessi and

Smith , Measuring information leakage using generalized gain functions, in: 2012 IEEE 25th Computer Security Foundations Symposium, IEEE, 2012, pp. 265–279. doi:10.1109/CSF.2012.26.

M.S.

Alvim ,

Fernandes ,

McIver and

G.H.

Nunes , On privacy and accuracy in data releases (invited paper), in: 31st International Conference on Concurrency Theory, CONCUR 2020 (Virtual Conference), Vienna, Austria, September 1–4, 2020,

Konnov and

Kovács , eds, LIPIcs, Vol. 171, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020, pp. 1–1118. doi:10.4230/LIPIcs.CONCUR.2020.1.

M.E.

Andrés ,

N.E.

Bordenabe ,

Chatzikokolakis and

Palamidessi , Geo-indistinguishability: Differential privacy for location-based systems, in: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, 2013, pp. 901–914.

N.E.

Bordenabe ,

Chatzikokolakis and

Palamidessi , Optimal geo-indistinguishable mechanisms for location privacy, in: Proc. CCS, 2014, pp. 251–262.

Brenner and

Nissim , Impossibility of differentially private universally optimal mechanisms, in: 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, IEEE, 2010, pp. 71–80. doi:10.1109/FOCS.2010.13.

Chatzikokolakis ,

M.E.

Andrés ,

N.E.

Bordenabe and

Palamidessi , Broadening the scope of differential privacy using metrics, in: International Symposium on Privacy Enhancing Technologies Symposium, Springer, 2013, pp. 82–102. doi:10.1007/978-3-642-39077-7_5.

10.

Chatzikokolakis ,

Fernandes and

Palamidessi , Comparing systems: Max-case refinement orders and application to differential privacy, in: Proc. CSF, IEEE Press, 2019.

11.

Chatzikokolakis ,

Fernandes and

Palamidessi , Refinement orders for quantitative information flow and differential privacy, Journal of Cybersecurity and Privacy1(1) (2021), 40–77. doi:10.3390/jcp1010004.

12.

H.A.J.C.

Duchi , Near Instance-Optimality in Differential Privacy, 2020, 2020, arXiv:2005.10630v1.

13.

J.C.

Duchi ,

M.I.

Jordan and

M.J.

Wainwright , Local privacy and statistical minimax rates, in: Proceedings of the 54th Annual IEEE Symposium on Foundations of Computer Science (FOCS), IEEE Computer Society, 2013, pp. 429–438. doi:10.1109/FOCS.2013.53.

14.

Dwork, Differential privacy, in: 33rd International Colloquium on Automata, Languages and Programming (ICALP 2006),

Bugliesi,

Preneel,

Sassone and

Wegener, eds, Lecture Notes in Computer Science, Vol. 4052, Springer, 2006, pp. 1–12. ISBN 3-540-35907-9. doi:10.1007/11787006_1.

15.

Dwork ,

Mcsherry ,

Nissim and

Smith , Calibrating noise to sensitivity in private data analysis, in: Proceedings of the Third Theory of Cryptography Conference (TCC),

Halevi and

Rabin , eds, Lecture Notes in Computer Science, Vol. 3876, Springer, 2006, pp. 265–284. doi:10.1007/11681878_14.

16.

Ú.

Erlingsson ,

Pihur and

Korolova , RAPPOR: Randomized aggregatable privacy-preserving ordinal response, in: Proc. CCS, 2014, pp. 1054–1067.

17.

Fernandes ,

Dras and

McIver , Generalised differential privacy for text document processing, in: Principles of Security and Trust – 8th International Conference, POST 2019, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019, Proceedings, Prague, Czech Republic, April 6–11, 2019, 2019, pp. 123–148. doi:10.1007/978-3-030-17138-4_6.

18.

Fernandes ,

McIver and

Morgan , The Laplace mechanism has optimal utility for differential privacy over continuous queries, in: 36th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2021, Rome, Italy, June 29–July 2, 2021, IEEE, 2021, pp. 1–12. doi:10.1109/LICS52264.2021.9470718.

19.

Fernandes ,

McIver ,

Palamidessi and

Ding , Universal optimality and robust utility bounds for metric differential privacy, in: 35th IEEE Computer Security Foundations Symposium, CSF 2022, Haifa, Israel, August 7–10, 2022, IEEE, 2022, pp. 348–363. doi:10.1109/CSF54842.2022.9919647.

20.

Ghosh ,

Roughgarden and

Sundararajan , Universally utility-maximizing privacy mechanisms, SIAM Journal on Computing41(6) (2012), 1673–1693. doi:10.1137/09076828X.

21.

Gupte and

Sundararajan , Universally optimal privacy mechanisms for minimax agents, in: Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2010, pp. 135–146. doi:10.1145/1807085.1807105.

22.

Kairouz ,

Oh and

Viswanath , Extremal mechanisms for local differential privacy, The Journal of Machine Learning Research17(1) (2016), 492–542.

23.

Koufogiannis ,

Han and

G.J.

Pappas , Optimality of the laplace mechanism in differential privacy, 2015, arXiv preprint arXiv:1504.00065.

24.

McIver ,

Morgan ,

Smith ,

Espinoza and

Meinicke , Abstract channels and their robust information-leakage ordering, in: Principles of Security and Trust – Third International Conference, POST 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Proceedings, Grenoble, France, April 5–13, 2014,

Abadi and

Kremer , eds, Lecture Notes in Computer Science, Vol. 8414, Springer, 2014, pp. 83–102. doi:10.1007/978-3-642-54792-8_5.

25.

S.T.

Rachev and

Rüschendorf , Mass Transportation Problems: Volume I: Theory, Vol. 1, Springer Science & Business Media, 1998.

26.

Weggenmann and

Kerschbaum , Differential privacy for directional data, in: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, CCS’21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 1205–1222. ISBN 9781450384544. doi:10.1145/3460120.3484734.

Universal optimality and robust utility bounds for metric differential privacy 1

Abstract

Keywords

1. Introduction

2 This functionality of the mechanism corresponds to the so-called “oblivious” model, in which the noise depends only on the result of the query and not on the original dataset.

6 When viewed as a mechanism from datasets to reported answers, as is customary in central differential privacy, d would be the Hamming distance.

1.1.1. Extensions to previous work

1.2. Related work

1.3. Organisation of the paper

2. Quantitative information flow for privacy mechanisms

2.1. Bayesian adversaries and generalised entropy

2.3. Robustness measures: Refinement and channel capacities

7 In the QIF literature these are dually described as g-vulnerabilities for gain function g.

8 It is sufficient to use posterior vulnerability to compare channels for leakage, since the prior vulnerability cancels out.

9 The data processing inequality doesn’t follow automatically; it follows here because our measures are information-theoretic.

10 We express capacities using gain functions as not all capacities can be expressed dually using loss function formulations.

Definition 3 (Universal optimality).

3.1. Privacy types as hyper-distributions

Definition 5 (Vertex Mechanism/Hyper).

Definition 6 (Kernel Mechanism/Hyper).

3.3. Kernel representations are not unique: An example

4. The metric differential privacy context for counting and sum queries

Definition 7 (Oblivious Mechanism).

Lemma 6 ([9]).

Theorem 7 ([10]).

4.1. Relationship to “counting” and “sum” queries

12 Other queries of interest, such as “average” fall under our study of “sum” queries, so that in fact the sum and counting queries cover all of the cases of interest for statistical datasets.

13 Recall that the discrete metric assigns distance 1 to all pairs of distinct elements.

5. Utility for Bayesian consumers

5.1. Classes of loss functions

Definition 9 (Monotone, strictly monotone and trivial loss functions).

Theorem 13 (Multiplicative Miracle theorem [3]).

15 Note that multiplicative capacity is defined by maximising, hence V g .

5.4. Which mechanisms have non-trivial ℓ-optimal loss functions?

Definition 10 (Utility set).

Theorem 15 (ℓ-Optimal Duality).

Theorem 19 (Universal optimality existence).

5.6. Impossibility results

Theorem 20 (Impossibility of Universally Optimal Mechanisms).

7. Universal privacy capacities

Definition 11 (Privacy type capacity).

Theorem 23 (Privacy type capacity is well defined).

8.1. Privacy type defined by ( N , d 2 )

Definition 12 (Geometric mechanism).

8.2. Privacy type defined by ( X , d D )

16 Thinking of each vertex as having either the ‘high’ or the ‘low’ value, this is the number of combinations for each choice of ‘high’ values.

Footnotes

Appendix

References

²
This functionality of the mechanism corresponds to the so-called “oblivious” model, in which the noise depends only on the result of the query and not on the original dataset.

⁶
When viewed as a mechanism from datasets to reported answers, as is customary in central differential privacy, d would be the Hamming distance.

⁷
In the QIF literature these are dually described as g-vulnerabilities for gain function g.

⁸
It is sufficient to use posterior vulnerability to compare channels for leakage, since the prior vulnerability cancels out.

⁹
The data processing inequality doesn’t follow automatically; it follows here because our measures are information-theoretic.

¹⁰
We express capacities using gain functions as not all capacities can be expressed dually using loss function formulations.

¹²
Other queries of interest, such as “average” fall under our study of “sum” queries, so that in fact the sum and counting queries cover all of the cases of interest for statistical datasets.

¹³
Recall that the discrete metric assigns distance 1 to all pairs of distinct elements.

¹⁵
Note that multiplicative capacity is defined by maximising, hence $V_{g}$ .

8.1. Privacy type defined by $(N, d_{2})$

8.2. Privacy type defined by $(X, d_{D})$

¹⁶
Thinking of each vertex as having either the ‘high’ or the ‘low’ value, this is the number of combinations for each choice of ‘high’ values.