[…] an algorithm is differentially private if an observer seeing its output cannot tell if a particular individual's information was used in the computation. Differential privacy is often discussed in the context of identifying individuals whose information may be in a database. Although it does not directly refer to identification and reidentification attacks, differentially private algorithms provably resist such attacks. wikipedia
chat.openai
Calibrating Noise to Sensitivity in Private Data Analysis is a paper by Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith, first published at the Third Theory of Cryptography Conference (TCC) in 2006 and later in the Journal of Privacy and Confidentiality. The paper presents a technique for protecting the privacy of individuals whose data is used in statistical analyses. The technique is based on adding random noise to query answers in an amount proportional to the "sensitivity" of the query being made. The sensitivity of a query is a measure of how much the query's output can change when a single individual's data in the input is changed. The authors propose a framework for determining the appropriate amount of noise to add based on this sensitivity, and they show that the technique provides "differential privacy" for the individuals whose data is being used while still allowing useful analysis of the data.
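As a concrete illustration of the noise-calibration idea, here is a minimal Python sketch of adding Laplace noise scaled to a query's sensitivity; the function name, the toy database, and the parameter values are illustrative assumptions, not taken from the paper.

import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    # Return the true answer perturbed with Laplace noise of scale sensitivity/epsilon.
    rng = np.random.default_rng() if rng is None else rng
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query ("how many rows satisfy a predicate?") changes by at most 1
# when one individual's row is added or removed, so its sensitivity is 1.
database = [0, 1, 1, 0, 1, 1, 1]    # hypothetical toy data
true_count = sum(database)          # the "true answer"
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(noisy_count)

Smaller epsilon means a larger noise scale and hence stronger privacy at the cost of accuracy.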
[…]
The two versions of "Calibrating Noise to Sensitivity in Private Data Analysis" by Dwork, McSherry, Nissim and Smith are the same work: the 2006 version is a conference paper published by Springer in the proceedings of the Third Theory of Cryptography Conference (TCC), while the 2016 version is a journal paper published in the Journal of Privacy and Confidentiality. The main differences are the publication venue and publisher; the 2016 version is an updated journal version of the 2006 conference paper.
~
DWORK, Cynthia, MCSHERRY, Frank, NISSIM, Kobbi and SMITH, Adam, 2006. Calibrating noise to sensitivity in private data analysis. In: Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings. Springer. p. 265–284.
DWORK, Cynthia, MCSHERRY, Frank, NISSIM, Kobbi and SMITH, Adam, 2016. Calibrating noise to sensitivity in private data analysis. Journal of Privacy and Confidentiality. Vol. 7, no. 3, p. 17–51.
A preliminary version of this work appeared in the proceedings of TCC 2006 (Dwork et al., 2006b).
We continue a line of research initiated in Dinur and Nissim (2003); Dwork and Nissim (2004); Blum et al. (2005) on privacy-preserving statistical databases.
Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the so-called true answer is the result of applying f to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user.
Previous work focused on the case of noisy sums, in which f = ∑_i g(x_i), where x_i denotes the i-th row of the database and g maps database rows to [0, 1]. We extend the study to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f. Roughly speaking, this is the amount that any single argument to f can change its output. The new analysis shows that for several particular applications substantially less noise is needed than was previously understood to be the case.
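In symbols (a standard rendering of the definition, whose notation may differ cosmetically from the paper's): for a query f mapping databases to R^d, the sensitivity and the noisy-release mechanism can be written in LaTeX as

\Delta f = \max_{D, D' \text{ differing in one row}} \lVert f(D) - f(D') \rVert_1

\mathcal{K}_f(D) = f(D) + (Y_1, \ldots, Y_d), \qquad Y_i \sim \mathrm{Lap}\!\left(\Delta f / \varepsilon\right)

Adding independent Laplace noise with scale Δf/ε to each coordinate of the true answer yields ε-differential privacy.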
The first step is a very clean definition of privacy, now known as differential privacy, and a measure of its loss. We also provide a set of tools for designing and combining differentially private algorithms, permitting the construction of complex differentially private analytical tools from simple differentially private primitives.
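One standard combining tool of this kind is sequential composition: running an ε1-differentially private computation followed by an ε2-differentially private one is (ε1 + ε2)-differentially private. A minimal Python sketch under that assumption (the helper functions and toy data are illustrative, not from the paper):

import numpy as np

rng = np.random.default_rng()

def private_count(db, predicate, epsilon):
    # epsilon-DP count of rows satisfying `predicate`; a count has sensitivity 1.
    true_count = sum(1 for row in db if predicate(row))
    return true_count + rng.laplace(scale=1.0 / epsilon)

def private_sum(db, clip, epsilon):
    # epsilon-DP sum of values clipped to [0, clip]; clipping bounds the sensitivity by `clip`.
    true_sum = sum(min(max(x, 0.0), clip) for x in db)
    return true_sum + rng.laplace(scale=clip / epsilon)

# Sequential composition: releasing both answers consumes epsilon_1 + epsilon_2
# of the overall privacy budget (here 0.3 + 0.7 = 1.0).
db = [0.2, 0.9, 0.4, 1.3, 0.7]      # hypothetical toy data
count = private_count(db, lambda x: x > 0.5, epsilon=0.3)
total = private_sum(db, clip=1.0, epsilon=0.7)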
Finally, we obtain separation results showing the increased value of interactive statistical release mechanisms over non-interactive ones.
chat.openai
The paper "Calibrating Noise to Sensitivity in Private Data Analysis" by Dwork, McSherry, Nissim and Smith defines privacy, or differential privacy, as a measure of the amount of information revealed about a single individual in a dataset when that individual's data is included or excluded from the dataset. The paper introduces a new measure of privacy loss called "Sensitivity" which is defined as the maximum amount of change in the output of a query function when a single individual's data is included or excluded from the dataset. They propose to add noise to the query function's output to protect privacy, and the amount of noise added is calibrated to the sensitivity of the query function to protect the privacy of individuals. The paper also presents a mathematical proof of the privacy guarantees provided by this method, and it provides a formal definition of differential privacy. They also discuss how to balance privacy and utility by choosing the appropriate level of noise to add to the query function's output.