The growing availability of personal genomics services has heightened concerns about genomic privacy. Individuals may wish to withhold sensitive genotypes that contain critical health-related information when sharing their data with such services. A straightforward solution that masks only the sensitive genotypes does not ensure privacy, because of the correlation structure within the genome. Here, we develop an information-theoretic mechanism for masking sensitive genotypes which ensures that no information about the sensitive genotypes is leaked. We also propose an efficient algorithmic implementation of our mechanism for genomic data governed by hidden Markov models. Our work is a step towards more rigorous control of privacy in genomic data sharing.
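To make the correlation-leakage point concrete, here is a minimal sketch, not the paper's mechanism: under a hypothetical two-state hidden Markov model over three genotype positions (all parameters below are illustrative assumptions), masking the middle position still lets an observer compute a posterior over it from the two revealed neighbors via the standard forward-backward recursion.

```python
import numpy as np

# Toy illustration of the leakage problem (not the paper's mechanism):
# a 2-state HMM over three genotype positions with hypothetical parameters.
A  = np.array([[0.9, 0.1],
               [0.1, 0.9]])   # hidden-state transition probabilities
B  = np.array([[0.8, 0.2],
               [0.2, 0.8]])   # emission: P(genotype | hidden state)
pi = np.array([0.5, 0.5])     # initial hidden-state distribution

obs = [0, None, 0]            # the middle position is masked (sensitive)

def likelihood(o):
    # Masked positions contribute no evidence (likelihood identically 1).
    return B[:, o] if o is not None else np.ones(2)

def forward(obs):
    alpha = pi * likelihood(obs[0])
    msgs = [alpha]
    for o in obs[1:]:
        alpha = (alpha @ A) * likelihood(o)
        msgs.append(alpha)
    return msgs

def backward(obs):
    beta = np.ones(2)
    msgs = [beta]
    for o in reversed(obs[1:]):
        beta = A @ (likelihood(o) * beta)
        msgs.append(beta)
    return list(reversed(msgs))

# Posterior over the masked genotype, given only the revealed neighbors.
a, b = forward(obs), backward(obs)
state_post = a[1] * b[1]
state_post /= state_post.sum()
geno_post = state_post @ B    # P(masked genotype | revealed genotypes)
print(geno_post)              # noticeably skewed away from the 50/50 prior
```

In this toy example the posterior over the masked genotype is pulled well away from its uniform prior by the revealed flanking positions, which is precisely the leakage that naive masking fails to prevent.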
Local differential privacy (LDP) enables private data sharing and analytics without the need for a trusted data collector. Error-optimal primitives (e.g., for estimating means and item frequencies) under LDP have been well studied. For analytical tasks such as range queries, however, the best known error bound depends on the domain size of the private data, which is potentially prohibitive. This deficiency is inherent, as LDP enforces the same level of indistinguishability between any pair of private data values for each data owner. In this paper, we utilize an extension of $\epsilon$-LDP called Metric-LDP or $E$-LDP, where a metric $E$ defines heterogeneous privacy guarantees for different pairs of private data values and thus provides a more flexible knob than $\epsilon$ does to relax LDP and tune utility-privacy trade-offs. We show that, under such privacy relaxations, significant gains in utility can be achieved for analytical workloads such as linear counting, multi-dimensional range counting, and quantile queries. In particular, for range queries under $E$-LDP where the metric $E$ is the $L_1$-distance function scaled by $\epsilon$, we design mechanisms whose errors are independent of the domain size; instead, their errors depend on the metric $E$, which specifies at what granularity the private data is protected. We believe that the primitives we design for $E$-LDP will be useful in developing mechanisms for other analytical tasks, and will encourage the adoption of LDP in practice.
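To illustrate why an $L_1$-metric guarantee can remove the dependence on the domain size, the following is a minimal sketch, assuming the classical truncated geometric mechanism rather than the mechanisms designed in the paper: pre-truncation its output probabilities satisfy $|\log \Pr[y \mid x] - \log \Pr[y \mid x']| \le \epsilon\,|x - x'|$, i.e., $E$-LDP with $E(x, x') = \epsilon\,|x - x'|$, while the noise it adds has magnitude $O(1/\epsilon)$ regardless of the domain size.

```python
import numpy as np

rng = np.random.default_rng(0)

def geometric_mechanism(x, eps, domain_size):
    """Two-sided geometric (discrete Laplace) noise, clipped to the domain.
    Pre-clipping, Pr[y | x] is proportional to exp(-eps * |y - x|), so for
    any pair x, x':  |log Pr[y|x] - log Pr[y|x']| <= eps * |x - x'|,
    which is E-LDP with E(x, x') = eps * |x - x'|; clipping is
    post-processing and preserves the guarantee."""
    p = 1.0 - np.exp(-eps)                        # success prob. of each geometric
    noise = rng.geometric(p) - rng.geometric(p)   # symmetric, Pr proportional to exp(-eps*|k|)
    return int(np.clip(x + noise, 0, domain_size - 1))

# Each user perturbs its value locally; the noise scale is O(1/eps),
# independent of domain_size.
reports = [geometric_mechanism(v, eps=0.5, domain_size=10_000)
           for v in (3, 17, 9_999)]
```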
In this paper, we study an estimation problem under the local differential privacy (LDP) framework: there is an ordered list of $d$ values (e.g., real numbers); a set of $n$ users, each of whom observes an element of this list, with every value in the list observed by at least one user; and an untrusted server that wants to estimate the values the users possess, without learning (in the sense of LDP) the actual value each user has or its index in the list. Towards this, we propose two LDP estimation schemes: the first assumes that the server knows the number of users observing each value; the second handles the general scenario, in which the server does not have this prior information. We show that the minimax risk decreases with the total number of users under a very mild condition on the number of users observing each value.
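For reference, the constraint referred to by "in the sense of LDP" is the standard one, stated here only as a reminder rather than as a contribution of the paper: each user applies a local randomizer $Q$ such that, for every pair of inputs $x, x'$ and every output set $S$,
$$\Pr[Q(x) \in S] \;\le\; e^{\epsilon}\, \Pr[Q(x') \in S],$$
so the server must form its estimates from reports whose distributions are pointwise within a factor $e^{\epsilon}$ of one another.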
We derive the optimal differential privacy (DP) parameters of a mechanism that satisfies a given level of Rényi differential privacy (RDP). Our result is based on the joint range of two $f$-divergences that underlie the approximate and the Rényi variants of differential privacy. We apply our result to the moments accountant framework for characterizing the privacy guarantees of stochastic gradient descent. Compared to the state of the art, our bounds may allow about 100 more stochastic gradient descent iterations for training deep learning models under the same privacy budget.
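For context, a standard (generally non-optimal) conversion, stated here only as a reference point, is: if a mechanism satisfies $(\alpha, \epsilon)$-RDP, then for every $\delta \in (0, 1)$ it also satisfies
$$\Bigl(\epsilon + \tfrac{\log(1/\delta)}{\alpha - 1},\; \delta\Bigr)\text{-DP}.$$
Deriving the exactly optimal DP parameters, as done here via the joint range of the underlying $f$-divergences, can only tighten this kind of bound.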
This paper studies the tradeoff between privacy and utility in a single-trial, multi-terminal guessing (estimation) framework, using a system model inspired by index coding. There are $n$ independent discrete sources at a data curator, and there are $m$ legitimate users and one adversary, each with some side information about the sources. The data curator broadcasts a distorted function of the sources to the legitimate users, which is also overheard by the adversary. In terms of utility, each legitimate user wishes to perfectly reconstruct some of the unknown sources and to attain a certain gain in estimation correctness for the remaining unknown sources. In terms of privacy, the data curator wishes to minimize the maximal leakage: the worst-case guessing gain of the adversary in estimating any target function of its unknown sources after receiving the broadcast data. Given these system settings, we derive fundamental lower bounds on the maximal leakage to the adversary, inspired by the notion of a confusion graph and by performance bounds for the index coding problem. We also detail a greedy privacy-enhancing mechanism, inspired by the agglomerative clustering algorithms used for the information bottleneck and privacy funnel problems.
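As a rough illustration of the quantities involved (our own sketch under simplifying assumptions, not the paper's construction): for a privatization channel given as a row-stochastic matrix $P_{Y|X}$, the unconditioned maximal leakage is $\log \sum_{y} \max_{x} P_{Y|X}(y \mid x)$, and an agglomerative heuristic in the spirit described above repeatedly merges output symbols of the broadcast to shrink this quantity. The adversary's side information and the legitimate users' utility constraints, which the paper's mechanism must respect, are ignored here.

```python
import numpy as np

def maximal_leakage(channel):
    """Maximal leakage L(X -> Y) = log sum_y max_x P(y|x) (in nats) for a
    channel given as a |X| x |Y| row-stochastic matrix; assumes every input
    symbol has positive probability."""
    return float(np.log(channel.max(axis=0).sum()))

def greedy_merge(channel, target_outputs):
    """Toy agglomerative heuristic (not the paper's algorithm): repeatedly
    merge the pair of output symbols whose merge gives the smallest maximal
    leakage, until only target_outputs outputs remain. A real mechanism must
    also enforce the legitimate users' reconstruction and estimation
    requirements, which are ignored here."""
    ch = channel.astype(float).copy()
    while ch.shape[1] > target_outputs:
        best_leak, best_ch = None, None
        for i in range(ch.shape[1]):
            for j in range(i + 1, ch.shape[1]):
                merged = np.delete(ch, j, axis=1)
                merged[:, i] = ch[:, i] + ch[:, j]   # merge outputs i and j
                leak = maximal_leakage(merged)
                if best_leak is None or leak < best_leak:
                    best_leak, best_ch = leak, merged
        ch = best_ch
    return ch

# Example: broadcasting 4 distinct messages for 4 source realizations leaks
# log(4) nats; merging outputs reduces the leakage (at some cost in utility).
P = np.eye(4)
print(maximal_leakage(P))                    # log 4
print(maximal_leakage(greedy_merge(P, 2)))   # log 2 after merging to 2 outputs
```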