Thursday, June 23, 2016

What is Differential Privacy?

Andy Greenberg (Hacker News, MacRumors):

“We believe you should have great features and great privacy,” Federighi told the developer crowd. “Differential privacy is a research topic in the areas of statistics and data analytics that uses hashing, subsampling and noise injection to enable…crowdsourced learning while keeping the data of individual users completely private. Apple has been doing some super-important work in this area to enable differential privacy to be deployed at scale.”

Differential privacy, translated from Apple-speak, is the statistical science of trying to learn as much as possible about a group while learning as little as possible about any individual in it. With differential privacy, Apple can collect and store its users’ data in a format that lets it glean useful notions about what people do, say, like and want. But it can’t extract anything about a single, specific one of those people that might represent a privacy violation. And neither, in theory, could hackers or intelligence agencies.
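To make that concrete: here is a minimal sketch of the classic Laplace mechanism, the textbook way to answer a counting query with differential privacy. The function name, the epsilon value, and the numbers are illustrative and mine; they say nothing about how Apple actually implements it.

```python
import random

def noisy_count(true_count, epsilon):
    """Answer a counting query with Laplace(1/epsilon) noise added.

    A count changes by at most 1 when any one person is added or removed
    (sensitivity 1), so this satisfies epsilon-differential privacy for a
    single query. Illustrative only, not Apple's mechanism.
    """
    scale = 1.0 / epsilon
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_count + noise

# The aggregate stays useful (the noise is tiny next to 10,000 users)...
print(noisy_count(10000, epsilon=0.1))
# ...but any one person's presence changes the answer's distribution by at
# most a factor of e^0.1, so the output says very little about that person.
```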

Matthew Green (Hacker News):

To make a long story short, it sounds like Apple is going to be collecting a lot more data from your phone. They’re mainly doing this to make their services better, not to collect individual users’ usage habits. To guarantee this, Apple intends to apply sophisticated statistical techniques to ensure that this aggregate data -- the statistical functions it computes over all your information -- don’t leak your individual contributions. In principle this sounds pretty good. But of course, the devil is always in the details.

[…]

The total allowed leakage is often referred to as a “privacy budget”, and it determines how many queries will be allowed (and how accurate the results will be). The basic lesson of DP is that the devil is in the budget. Set it too high, and you leak your sensitive data. Set it too low, and the answers you get might not be particularly useful.
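One way to picture the budget is as an accountant that charges every query against a fixed total and refuses to answer once it is spent. The class below is a toy under simple sequential composition (total epsilon is just the sum of per-query epsilons), not any real system's API:

```python
import random

class PrivacyBudget:
    """Toy accountant: sequential composition, total epsilon = sum of query epsilons."""

    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def noisy_count(self, true_count, epsilon):
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted; refusing to answer")
        self.remaining -= epsilon
        scale = 1.0 / epsilon  # counting query, sensitivity 1
        return true_count + (random.expovariate(1.0 / scale)
                             - random.expovariate(1.0 / scale))

budget = PrivacyBudget(total_epsilon=1.0)
for _ in range(10):
    print(budget.noisy_count(10000, epsilon=0.1))  # ten modestly noisy answers
budget.noisy_count(10000, epsilon=0.1)             # RuntimeError: budget spent
```

Set total_epsilon too high and an analyst can keep asking until the noise averages away; set it too low and each answer is mostly noise, which is exactly the trade-off Green describes.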

[…]

A much more promising approach is not to collect the raw data at all. This approach was recently pioneered by Google to collect usage statistics in their Chrome browser. The system, called RAPPOR, is based on an implementation of the 50-year-old randomized response technique.
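Randomized response itself fits in a few lines. This is the textbook coin-flip version, not RAPPOR's actual encoding, which hashes each value into a Bloom filter and layers a permanent and a per-report randomization on top:

```python
import random

def randomized_response(truth):
    """Flip a coin: heads, report the truth; tails, report a second coin flip."""
    if random.random() < 0.5:
        return truth
    return random.random() < 0.5

def estimate_rate(reports):
    """Unbiased estimate of the true 'yes' rate across the population.

    P(report yes) = 0.75*p + 0.25*(1 - p) = 0.25 + p/2, so p = 2*P(yes) - 0.5.
    """
    observed = sum(reports) / len(reports)
    return 2 * observed - 0.5

truths = [random.random() < 0.3 for _ in range(100000)]   # true rate is ~30%
reports = [randomized_response(t) for t in truths]        # what the server sees
print(estimate_rate(reports))                             # close to 0.30
# Any single report is deniable: a "yes" may just be the second coin flip.
```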

[…]

The main challenge with randomized response systems is that they can leak data if a user answers the same question multiple times.
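A quick simulation, reusing the coin-flip scheme above, shows why: ask one user the same question a thousand times and the fraction of "yes" reports converges to 0.75 or 0.25, giving the true answer away. That is the problem RAPPOR's memoized "permanent" randomized response is meant to address.

```python
import random

def randomized_response(truth):
    if random.random() < 0.5:
        return truth
    return random.random() < 0.5

# One user whose true answer is "yes", asked the same question 1,000 times.
reports = [randomized_response(True) for _ in range(1000)]
print(sum(reports) / len(reports))   # ~0.75: the majority vote unmasks this user
```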

Bruce Schneier:

What we know about anonymization is that it’s much harder than people think, and it’s likely that this technique will be full of privacy vulnerabilities. (See, for example, the excellent work of Latanya Sweeney.) As expected, security experts are skeptical.

Theoretical Antagonist:

Microsoft Research the birth place of Differential Privacy has abandoned it.

[…]

In order to assess the accuracy/utility/privacy trade-off correctly, one must possess enough information about the queries expected and how much privacy budget should be allocated towards them. Keeping track of queries and budget consumed is even more difficult. Certain tasks require answers to be computed as exactly as possible. The impact of answering such queries, and the resulting depletion of the privacy budget, has never been examined in the literature.

[…]

There have been few attempts at rigorously evaluating privacy budgets and parameters such as epsilon. However, they have only shown Differential Privacy to be impractical.

[…]

You will encounter strong resistance, and in some cases outright bullying, from vocal supporters of Differential Privacy. These people will likely be completely unfamiliar with your area of application and existing practices. They will barge into public meetings proclaiming that Differential Privacy is the only real solution, and that researchers who have spent years working with data in that field should be ignored, even when several papers have called for further study of Differential Privacy before adoption in practice.

Rutgers:

Moritz Hardt presents a “Tutorial on Differential Privacy in the Theory Community” at the DIMACS Workshop on Recent Work on Differential Privacy Across Computer Science, held Wednesday, October 24th 2012 at the Computing Research & Education Building on Busch Campus of Rutgers University. This event was presented under the auspices of the DIMACS Special Focus on Information Sharing and Dynamic Analysis and the DIMACS Special Focus on Cybersecurity.

Update (2016-06-27): Juli Clover:

There’s been a lot of confusion about differential privacy and what it means for end users, leading Recode to write a piece that clarifies many of the details of differential privacy.

First and foremost, as with all of Apple’s data collection, there is an option to opt out of sharing data with the company. Differential data collection is entirely opt in and users can decide whether or not to send data to Apple.

1 Comment

"As expected, security experts are skeptical." -- misrepresentative title, since only one security expert (Matthew Green) is cited. In fact, Matthew Green seems to be about the only person cited in a lot of these things. There are other security experts out there. Here's at least one:

Adam Shostack: http://emergentchaos.com/archives/2016/06/the-evolution-of-apples-differential-privacy.html

"Microsoft Research the birth place of Differential Privacy has abandoned it."

Nonsense. Cynthia Dwork (at MSR) has publications dated 2016 that are still focused on differential privacy:

https://scholar.google.com/citations?hl=en&user=y2H5xmkAAAAJ&view_op=list_works&sortby=pubdate
