NeuralHash Implementation and Collision
Joseph Cox et al. (Slashdot, Hacker News, Reddit):
On Wednesday, GitHub user AsuharietYgvar published details of what they claim is an implementation of NeuralHash, a hashing technology in the anti-CSAM system announced by Apple at the beginning of August. Hours later, someone else claimed to have been able to create a collision, meaning he tricked the system into giving two different images the same hash.
In a statement to Motherboard, Apple said that the version of NeuralHash that Ygvar reverse-engineered is not the same as the final implementation that will be used with the CSAM system.
[…]
Matthew Green, who teaches cryptography at Johns Hopkins University and who has been a vocal critic of Apple’s CSAM system, told Motherboard that if collisions “exist for this function,” then he expects “they’ll exist in the system Apple eventually activates.”
“Of course, it’s possible that they will re-spin the hash function before they deploy,” he said. “But as a proof of concept, this is definitely valid,” he said of the information shared on GitHub.
“Early tests show that it can tolerate image resizing and compression, but not cropping or rotations.”
Like every other perceptual image hash. It’ll also have collisions. Keep in mind that the matching is fuzzy (you have to allow some wrong bits).
It’s not hard at all to attack such a hash to make it produce false positives.
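To make the “fuzzy” part concrete, here’s a minimal sketch of how perceptual-hash matching is typically done: count the differing bits (Hamming distance) and accept anything under a threshold. The 96-bit length matches the reverse-engineered NeuralHash output; the threshold value is illustrative, since Apple hasn’t published one.

```python
# Minimal sketch of fuzzy perceptual-hash matching via Hamming distance.
# The 96-bit (12-byte) hash length matches the reverse-engineered
# NeuralHash output; the threshold is illustrative, not Apple's.

def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the number of differing bits between two equal-length hashes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def is_match(hash_a: bytes, hash_b: bytes, max_bits_wrong: int = 8) -> bool:
    """Treat two hashes as a match if they differ in at most a few bits."""
    return hamming_distance(hash_a, hash_b) <= max_bits_wrong

# Example: two 96-bit hashes differing by a single bit still "match".
h1 = bytes.fromhex("59a34eabe31910abfb06f308")
h2 = bytes.fromhex("59a34eabe31910abfb06f309")
print(hamming_distance(h1, h2))  # 1
print(is_match(h1, h2))          # True
```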
Say I am law enforcement and I want access to your photos. I send you >30 messages with non-CSAM but colliding images. Your phone now thinks you have CSAM and grants Apple access to your data.
Then I just have to subpoena Apple for the data they already have, and I have your photos.
Meanwhile the people who actually have CSAM just have to add a frame to their images to completely neuter the system.
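As a rough illustration of why a frame defeats this kind of hash, here’s a sketch using the open-source imagehash library’s pHash as a stand-in (the production NeuralHash isn’t public, and photo.jpg is a placeholder path):

```python
# Sketch of the "add a frame" evasion, with imagehash's pHash standing
# in for NeuralHash. "photo.jpg" is a placeholder path.
from PIL import Image, ImageOps
import imagehash

original = Image.open("photo.jpg")
framed = ImageOps.expand(original, border=32, fill="black")  # add a 32px frame

h_original = imagehash.phash(original)
h_framed = imagehash.phash(framed)

# Resizing and compression barely move the hash, but a frame shifts the
# entire image content relative to the grid the hash is computed on, so
# the Hamming distance typically jumps well past any match threshold.
print(h_original - h_framed)  # imagehash overloads "-" as Hamming distance
```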
A lot rests on how much we can trust Apple’s human reviewers.
Also, apparently Apple’s neural network, by virtue of having 200+ (!) layers, runs into floating-point rounding issues and actually produces wildly different hashes on different hardware (a 9-bit difference between an iPad and an M1 Mac!). That’s... garbage. That’s 9 bits of match noise.
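For scale, 9 bits out of a 96-bit hash looks like this (the hex values below are invented to differ by exactly 9 bits; the actual device outputs weren’t published):

```python
# Illustration of the reported cross-device discrepancy. These values
# are made up to differ by exactly 9 bits; they are not real outputs.
ipad_hash = int("59a34eabe31910abfb06f308", 16)
m1_hash   = int("59a34eabe31910abfb06f2f7", 16)
print(bin(ipad_hash ^ m1_hash).count("1"))  # 9 bits of disagreement
```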
[…]
Actually, how does this even work at all? You have to do fuzzy matching of perceptual image hashes like NeuralHash. But after that they’re doing PSI (private set intersection) crypto that would seem to be incompatible with fuzzy matching, and at no point do they talk about this.
This is not a thing. This cannot mathematically be a thing. There is no way to design a perceptual image hash to always result in the same hash when the image is altered in small ways. This is trivial to prove.
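The incompatibility being pointed at: PSI-style protocols compare cryptographically transformed values, and such transforms only match on exact equality. A sketch, with SHA-256 standing in for whatever blinding step the protocol applies:

```python
# Why fuzzy matching and exact-match crypto are in tension: one wrong
# bit in the perceptual hash yields a completely unrelated transformed
# value, so "close" hashes look no more alike than random ones after
# the transform. SHA-256 stands in for the protocol's blinding step.
import hashlib

h1 = bytes.fromhex("59a34eabe31910abfb06f308")
h2 = bytes.fromhex("59a34eabe31910abfb06f309")  # differs from h1 by one bit

print(hashlib.sha256(h1).hexdigest())
print(hashlib.sha256(h2).hexdigest())  # shares no structure with the line above
```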
This was a bad idea from the start, and Apple never seemed to consider the adversarial context of the system as a whole, and not just the cryptography.
In a call with reporters regarding the new findings, Apple said its CSAM-scanning system had been built with collisions in mind, given the known limitations of perceptual hashing algorithms. In particular, the company emphasized a secondary server-side hashing algorithm, separate from NeuralHash, the specifics of which are not public. If an image that produced a NeuralHash collision were flagged by the system, it would be checked against the secondary system and identified as an error before reaching human moderators.
[…]
But actually generating that alert would require gaining access to the NCMEC hash database, generating more than 30 colliding images, and then smuggling all of them onto the target’s phone.
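Piecing together Apple’s description, the server-side gate presumably looks something like the sketch below. All names are hypothetical, and both hash functions are placeholders, since neither the production NeuralHash nor the secondary algorithm is public:

```python
# Hypothetical sketch of the two-stage gate Apple described. Both hash
# functions are placeholders (truncated MD5/SHA-1), NOT the real
# algorithms.
import hashlib

def neural_hash(image_bytes: bytes) -> bytes:
    return hashlib.md5(image_bytes).digest()[:12]   # placeholder only

def secondary_hash(image_bytes: bytes) -> bytes:
    return hashlib.sha1(image_bytes).digest()[:12]  # placeholder only

THRESHOLD = 30  # matches required before anything is surfaced

def should_alert(images, neural_db, secondary_db) -> bool:
    # Stage 1: perceptual-hash matches against the known database.
    flagged = [img for img in images if neural_hash(img) in neural_db]
    if len(flagged) <= THRESHOLD:
        return False
    # Stage 2: an adversarial NeuralHash collision is unlikely to also
    # collide under an independent, secret hash, so it gets dropped
    # here before any human moderator sees it.
    confirmed = [img for img in flagged if secondary_hash(img) in secondary_db]
    return len(confirmed) > THRESHOLD
```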
Update (2021-08-21): See also: Hacker News.
I’m not convinced that this secondary system was originally part of the design, since it wasn’t discussed in the original specification.
The Apple system dedupes photos, but burst shots are semantically different photos of the same subject, and an unlucky match on a burst shot could lead to multiple match events on the back end if the system isn’t implemented to defend against that.
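One possible defense, sketched with a hypothetical event format, is to count distinct matched database entries rather than raw match events:

```python
# Sketch of the burst-shot concern: if the backend counts raw match
# events, one burst that collides with a single database entry could
# burn through the ~30-match threshold at once. Counting distinct
# matched entries (hypothetical field name) is one possible defense.
def distinct_match_count(match_events) -> int:
    return len({event["matched_db_hash"] for event in match_events})

events = [{"matched_db_hash": "59a34eabe31910abfb06f308"}] * 5  # one burst
print(distinct_match_count(events))  # 1, not 5
```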
We wrote the only peer-reviewed publication on how to build a system like Apple’s — and we concluded the technology was dangerous. We’re not concerned because we misunderstand how Apple’s system works. The problem is, we understand exactly how it works.
Brad Dwyer (via Hacker News):
In order to test things, I decided to search the publicly available ImageNet dataset for collisions between semantically different images.
[…]
There were 2 examples of actual collisions between semantically different images in the ImageNet dataset.
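The search itself is straightforward to sketch: hash every image, bucket filenames by hash value, and keep any bucket holding more than one file. Here compute_hash is a placeholder for running the reverse-engineered model:

```python
# Sketch of a natural-collision search over an image dataset.
# compute_hash is a placeholder for the reverse-engineered NeuralHash.
from collections import defaultdict

def find_collisions(image_paths, compute_hash):
    buckets = defaultdict(list)
    for path in image_paths:
        buckets[compute_hash(path)].append(path)
    # Keep only hashes shared by two or more distinct images.
    return {h: paths for h, paths in buckets.items() if len(paths) > 1}
```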
Update (2021-09-08): thishashcollisionisnotporn.com (via Hacker News):
Given that it’s possible to generate a false positive, it is also possible to deliberately create images that match a given hash. So, for example, someone who wants to get another person in trouble can send them innocent-looking images (like images of kittens) and manipulate those images to match a hash of known CSAM.
This site is a proof of concept for collision attacks. The images of the kittens are manipulated to match the hash of the image of the dog (59a34eabe31910abfb06f308). As a result, all images shown on this page share the same hash. When these images are both hashed with the Apple NeuralHash algorithm, they return the same hash.
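The published collision attacks reportedly worked by gradient descent against the reverse-engineered model. A heavily simplified sketch of the idea, with a tiny random CNN standing in for NeuralHash:

```python
# Simplified sketch of a targeted collision attack, assuming white-box
# access. The tiny random CNN below is a stand-in for NeuralHash; the
# published attacks reportedly ran similar optimization against the
# reverse-engineered ONNX model. We nudge the source image until the
# pre-threshold logits land on the target hash bits, while a pixel
# penalty keeps it looking like the original.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(                  # stand-in network, not NeuralHash
    torch.nn.Conv2d(3, 8, kernel_size=5, stride=4), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
    torch.nn.Linear(8, 96),                   # 96 logits -> 96 hash bits
)

kitten = torch.rand(1, 3, 360, 360)           # placeholder "innocent" image
target_bits = (torch.rand(96) > 0.5).float()  # hash we want to collide with
target_sign = target_bits * 2 - 1             # map {0,1} -> {-1,+1}

adv = kitten.clone().requires_grad_(True)
opt = torch.optim.Adam([adv], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    logits = model(adv).squeeze(0)
    # Hinge loss: push every logit past a small margin on the target side.
    hash_loss = torch.relu(0.1 - target_sign * logits).mean()
    pixel_loss = (adv - kitten).pow(2).mean()  # stay close to the original
    (hash_loss + 0.1 * pixel_loss).backward()
    opt.step()

# Fraction of hash bits that now agree with the target.
print(((model(adv).squeeze(0) > 0).float() == target_bits).float().mean())
```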