The problem of secret-key based authentication under privacy and storage constraints on the source sequence is considered. The identifier measurement channels during authentication are assumed to be controllable via a cost-constrained action sequence. Single-letter inner and outer bounds for the key-leakage-storage-cost regions are derived for a generalization of a classic two-terminal key agreement model with an eavesdropper that observes a sequence correlated with the sequences observed by the legitimate terminals. The additions to the model are that the encoder observes a noisy version of a remote source, and that the noisy output and the remote source output, together with an action sequence, are given as inputs to the measurement channel at the decoder. Thus, correlation is introduced between the noise components on the encoder and decoder measurements. The model with a secret key generated by an encoder is extended to the randomized models, where a secret key is embedded into the encoder. The results are relevant for several user and device authentication scenarios, including physical and biometric identifiers with multiple measurements that provide diversity and multiplexing gains. To illustrate the behavior of the rate region, achievable (secret-key rate, storage rate, cost) tuples are given for binary identifiers and measurement channels that can be represented as a mixture of binary symmetric subchannels. The gains from using an action sequence, such as a large secret-key rate at a significantly small hardware cost, are illustrated to motivate the use of low-complexity transform-coding algorithms with cost-constrained actions.
With the explosive development of big data, it is necessary to sort the data according to their importance or priorities. Sources with different importance levels can be modeled by multilevel diversity coding systems (MDCS). Another trend in future communication networks, such as 5G wireless networks and the Internet of Things, is that users may obtain their data from all available sources, even from devices belonging to other users. The privacy of data then becomes a crucial issue. In a recent work by Li \etal, the secure asymmetric MDCS (S-AMDCS) with wiretap channels was investigated, where the wiretapped messages do not leak any information about the sources (\ie~perfect secrecy). It was shown that superposition (source-separate coding) is not optimal for the general S-AMDCS, and the exact full secure rate region was proved for a class of S-AMDCS. A bound on the key size of the secure rate region was also provided. As a further step on the S-AMDCS problem, this paper mainly focuses on characterizing the key size. Specifically, the constraints on the key size of the superposition secure rate region are proved, and a counterexample is found to show that the bound on the key size of the exact secure rate region provided by Li \etal~is not tight. In contrast, tight necessary and sufficient constraints on the secrecy key size of the counterexample, which is the four-encoder S-AMDCS, are proved.
Suppose we are given a large number of sequences on a given alphabet, and an adversary is interested in identifying (de-anonymizing) a specific target sequence based on its patterns. Our goal is to thwart such an adversary by obfuscating the target sequence through artificial (but small) distortions applied to its values. A key point here is that we would like to make no assumptions about the statistical model of such sequences. This is in contrast to the existing literature, where assumptions (e.g., Markov chains) are made regarding such sequences to obtain privacy guarantees. We relate this problem to a set of combinatorial questions on sequence construction, based on which we are able to obtain provable guarantees. This problem is relevant to important privacy applications: from fingerprinting webpages visited by users through anonymous communication systems to linking communicating parties on messaging applications to inferring activities of users of IoT devices.
A user generates $n$ independent and identically distributed data random variables with a probability mass function (pmf) that must be guarded from a querier. The querier must recover, with a prescribed accuracy, a given function of the data from each of $n$ independent and identically distributed user-devised query responses. The user chooses the data pmf and the random query responses to maximize distribution privacy, as gauged by the divergence between the pmf and the querier’s best estimate of it based on the $n$ query responses. A general lower bound on distribution privacy is provided, together with, for the case of binary-valued functions, upper and lower bounds that converge to this general bound as $n$ grows. Explicit strategies for the user and querier are identified.
We investigate the framework of privacy amplification by iteration, recently proposed by Feldman et al., through an information-theoretic lens. We demonstrate that differential privacy guarantees of iterative mappings can be determined by a direct application of contraction coefficients derived from strong data processing inequalities for $f$-divergences. In particular, by generalizing Dobrushin's contraction coefficient for total variation distance to an $f$-divergence known as the $E_\gamma$-divergence, we derive tighter bounds on the differential privacy parameters of the projected noisy stochastic gradient descent algorithm with hidden intermediate updates.
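To make the link between the $E_\gamma$-divergence and differential privacy concrete, here is a minimal Python sketch. It is not taken from the paper. It assumes the standard definition for discrete distributions, $E_\gamma(P\|Q) = \sum_x \max\{P(x) - \gamma Q(x), 0\}$ with $\gamma \ge 1$, which reduces to total variation distance at $\gamma = 1$. It also uses the well-known equivalence that a mechanism is $(\varepsilon, \delta)$-differentially private exactly when $E_{e^{\varepsilon}}$ between its output distributions on any two neighboring inputs is at most $\delta$. The function names `e_gamma` and `satisfies_dp` are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence


def e_gamma(p: Sequence[float], q: Sequence[float], gamma: float) -> float:
    """E_gamma(P||Q) = sum_x max(p(x) - gamma * q(x), 0) for discrete pmfs.

    Assumed standard definition (also called hockey-stick divergence);
    gamma = 1 gives the total variation distance.
    """
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same alphabet")
    return sum(max(pi - gamma * qi, 0.0) for pi, qi in zip(p, q))


def satisfies_dp(p: Sequence[float], q: Sequence[float],
                 eps: float, delta: float) -> bool:
    """Check the (eps, delta)-DP condition for ONE pair of neighboring inputs.

    p and q are the mechanism's output pmfs on the two inputs; the
    condition is checked in both directions.
    """
    gamma = math.exp(eps)
    return e_gamma(p, q, gamma) <= delta and e_gamma(q, p, gamma) <= delta


# Illustrative example (not from the paper): binary randomized response
# that reports the true bit with probability 0.75.
p_out_given_0 = [0.75, 0.25]
p_out_given_1 = [0.25, 0.75]

tv = e_gamma(p_out_given_0, p_out_given_1, 1.0)   # total variation distance
hs = e_gamma(p_out_given_0, p_out_given_1, 3.0)   # gamma = e^{ln 3}
```

In this toy example the likelihood ratio is at most 3, so the divergence vanishes at $\gamma = 3$, consistent with randomized response being $(\ln 3, 0)$-differentially private. A full privacy check would take the maximum over all neighboring input pairs; the sketch handles a single pair only.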