{"id":33418,"date":"2021-08-18T16:55:54","date_gmt":"2021-08-18T20:55:54","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=33418"},"modified":"2021-09-08T11:23:00","modified_gmt":"2021-09-08T15:23:00","slug":"neuralhash-implementation-and-collision","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2021\/08\/18\/neuralhash-implementation-and-collision\/","title":{"rendered":"NeuralHash Implementation and Collision"},"content":{"rendered":"<p><a href=\"https:\/\/www.vice.com\/en\/article\/wx5yzq\/apple-defends-its-anti-child-abuse-imagery-tech-after-claims-of-hash-collisions\">Joseph Cox et al.<\/a> (<a href=\"https:\/\/apple.slashdot.org\/story\/21\/08\/18\/1755223\/apples-neuralhash-algorithm-has-been-reverse-engineered\">Slashdot<\/a>, <a href=\"https:\/\/news.ycombinator.com\/item?id=28218391\">Hacker<\/a> <a href=\"https:\/\/news.ycombinator.com\/item?id=28219068\">News<\/a>, <a href=\"https:\/\/www.reddit.com\/r\/MachineLearning\/comments\/p6hsoh\/p_appleneuralhash2onnx_reverseengineered_apple\/\">Reddit<\/a>):<\/p>\n<blockquote cite=\"https:\/\/www.vice.com\/en\/article\/wx5yzq\/apple-defends-its-anti-child-abuse-imagery-tech-after-claims-of-hash-collisions\"><p>On Wednesday, GitHub user AsuharietYgvar <a href=\"https:\/\/github.com\/AsuharietYgvar\/AppleNeuralHash2ONNX\">published details<\/a> of what they claim is an implementation of <a href=\"https:\/\/www.apple.com\/child-safety\/pdf\/CSAM_Detection_Technical_Summary.pdf\">NeuralHash<\/a>, a hashing technology in the anti-CSAM system <a href=\"https:\/\/techcrunch.com\/2021\/08\/05\/apple-icloud-photos-scanning\/\">announced<\/a> by Apple at the beginning of August. Hours later, someone else <a href=\"https:\/\/github.com\/AsuharietYgvar\/AppleNeuralHash2ONNX\/issues\/1#issue-973388387\">claimed to have been able to create a collision<\/a>, meaning he tricked the system into giving two different images the same hash.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/www.macrumors.com\/2021\/08\/18\/apple-explains-neuralhash-collisions-not-csam-system\/\">Juli Clover<\/a>:<\/p>\n<blockquote cite=\"https:\/\/www.macrumors.com\/2021\/08\/18\/apple-explains-neuralhash-collisions-not-csam-system\/\"><p>In a statement to <em><a href=\"https:\/\/www.vice.com\/en\/article\/wx5yzq\/apple-defends-its-anti-child-abuse-imagery-tech-after-claims-of-hash-collisions\">Motherboard<\/a><\/em>, Apple said that the version of the NeuralHash that Yvgar reverse-engineered is not the same as the final implementation that will be used with the CSAM system.<\/p><p>[&#8230;]<\/p><p>Matthew Green, who teaches cryptography at Johns Hopkins University and who has been a vocal critic of Apple&rsquo;s CSAM system, told <em>Motherboard<\/em> that if collisions &ldquo;exist for this function,&rdquo; then he expects &ldquo;they&rsquo;ll exist in the system Apple eventually activates.&rdquo;<\/p><p>&ldquo;Of course, it&rsquo;s possible that they will re-spin the hash function before they deploy,&rdquo; he said. &ldquo;But as a proof of concept, this is definitely valid,&rdquo; he said of the information shared on GitHub.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/marcan42\/status\/1427896137696960513\">Hector Martin<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/marcan42\/status\/1427896137696960513\"><p>&ldquo;Early tests show that it can tolerate image resizing and compression, but not cropping or rotations.&rdquo;<\/p><p>Like every other perceptual image hash. It&rsquo;ll also have collisions. Keep in mind that the matching is fuzzy (you have to allow some wrong bits).<\/p><p>It&rsquo;s not hard at all to attack such a hash to make it produce false positives.<\/p><p>Say I am law enforcement and I want access to your photos. I send you &gt;30 messages with non-CSAM but colliding images. Your phone now thinks you have CSAM and grants Apple access to your data.<\/p><p>Then I just have to subpoena Apple for the data they already have, and I have your photos.<\/p><p>Meanwhile the people who actually have CSAM just have to add a frame to their images to completely neuter the system.<\/p><\/blockquote>\n<p>A lot rests on how much we can trust Apple&rsquo;s human reviewers.<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/marcan42\/status\/1427896137696960513\">\n<p>Also, apparently Apple&rsquo;s neural network, by virtue of having 200+ (!) layers and due to floating point rounding issues, actually produces wildly different hashes on different hardware (9 bits difference between iPad and M1 Mac!). That&rsquo;s... garbage. That&rsquo;s 9 bits of match noise.<\/p>\n<p>[&#8230;]<\/p>\n<p>Actually, how does this even work <em>at all<\/em>? You <em>have<\/em> to do fuzzy matching of perceptual image hashes like NeuralHash. But they&rsquo;re doing some PSI crypto stuff after that that would seem to be incompatible with it, and at <em>no<\/em> point do they talk about this.<\/p><p>This is not a thing. This cannot mathematically be a thing. There is no way to design a perceptual image hash to always result in the <em>same<\/em> hash when the image is altered in small ways. This is trivial to prove.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/www.schneier.com\/blog\/archives\/2021\/08\/apples-neuralhash-algorithm-has-been-reverse-engineered.html\">Bruce Schneier<\/a>:<\/p>\n<blockquote cite=\"https:\/\/www.schneier.com\/blog\/archives\/2021\/08\/apples-neuralhash-algorithm-has-been-reverse-engineered.html\"><p>This was a bad idea from the start, and Apple never seemed to consider the adversarial context of the system as a whole, and not just the cryptography.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/www.theverge.com\/2021\/8\/18\/22630439\/apple-csam-neuralhash-collision-vulnerability-flaw-cryptography\">Russell Brandom<\/a>:<\/p>\n<blockquote cite=\"https:\/\/www.theverge.com\/2021\/8\/18\/22630439\/apple-csam-neuralhash-collision-vulnerability-flaw-cryptography\"><p>In a call with reporters regarding the new findings, Apple said its CSAM-scanning system had been built with collisions in mind, given the known limitations of perceptual hashing algorithms. In particular, the company emphasized a secondary server-side hashing algorithm, separate from NeuralHash, the specifics of which are not public. If an image that produced a NeuralHash collision were flagged by the system, it would be checked against the secondary system and identified as an error before reaching human moderators.<\/p>\n<p>[&#8230;]<\/p>\n<p>But actually generating that alert would require access to the NCMEC hash database, generating more than 30 colliding images, and then smuggling all of them onto the target&rsquo;s phone. <\/p><\/blockquote>\n\n<p>Previously:<\/p>\n<ul>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2021\/08\/05\/scanning-icloud-photos-for-child-sexual-abuse\/\">Scanning iCloud Photos for Child Sexual Abuse<\/a><\/li>\n<\/ul>\n\n<p id=\"neuralhash-implementation-and-collision-update-2021-08-21\">Update (2021-08-21): See also: <a href=\"https:\/\/news.ycombinator.com\/item?id=28225706\">Hacker News<\/a>.<\/p>\n\n<p><a href=\"https:\/\/www.schneier.com\/blog\/archives\/2021\/08\/more-on-apples-iphone-backdoor.html\">Bruce Schneier<\/a>:<\/p>\n<blockquote cite=\"https:\/\/www.schneier.com\/blog\/archives\/2021\/08\/more-on-apples-iphone-backdoor.html\"><p>I&rsquo;m not convinced that this secondary system was originally part of the design, since it wasn&rsquo;t discussed in the original specification.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/twitter.com\/SarahJamieLewis\/status\/1428082934393688066\">Sarah Jamie Lewis<\/a>:<\/p>\n<blockquote cite=\"https:\/\/twitter.com\/SarahJamieLewis\/status\/1428082934393688066\"><p>The Apple system dedupes photos, but burst shots are semantically <em>different<\/em> photos with the same subject - and an unlucky match on a burst shot could lead to multiple match events on the back end if the system isn&rsquo;t implemented to defend against that.<\/p><\/blockquote>\n\n<p><a href=\"https:\/\/www.washingtonpost.com\/opinions\/2021\/08\/19\/apple-csam-abuse-encryption-security-privacy-dangerous\/\">Jonathan Mayer<\/a>:<\/p>\n<blockquote cite=\"https:\/\/www.washingtonpost.com\/opinions\/2021\/08\/19\/apple-csam-abuse-encryption-security-privacy-dangerous\/\">\n<p>We wrote the only peer-reviewed publication on how to build a system like Apple&rsquo;s &mdash; and we concluded the technology was dangerous. We&rsquo;re not concerned because we misunderstand how Apple&rsquo;s system works. The problem is, we understand exactly how it works.<\/p>\n<\/blockquote>\n\n<p><a href=\"https:\/\/blog.roboflow.com\/nerualhash-collision\/\">Brad Dwyer<\/a> (via <a href=\"https:\/\/news.ycombinator.com\/item?id=28236102\">Hacker News<\/a>):<\/p>\n<blockquote cite=\"https:\/\/blog.roboflow.com\/nerualhash-collision\/\"><p>In order to test things, I decided to search the publicly available ImageNet dataset for collisions between semantically different images.<\/p>\n<p>[&#8230;]<\/p>\n<p>There were 2 examples of actual collisions between semantically different images in the ImageNet dataset.<\/p><\/blockquote>\n\n<p id=\"neuralhash-implementation-and-collision-update-2021-09-08\">Update (2021-09-08): <a href=\"https:\/\/thishashcollisionisnotporn.com\/\">thishashcollisionisnotporn.com<\/a> (via <a href=\"https:\/\/news.ycombinator.com\/item?id=28305394\">Hacker News<\/a>):<\/p>\n<blockquote cite=\"https:\/\/thishashcollisionisnotporn.com\/\">\n<p>Given that it&rsquo;s possible to generate a false positive, it is also possible to deliberately create images that match a given hash. So, for example, someone who wants to get another person in trouble can send them innocent-looking images (like images of kittens) and manipulate those images to match a hash of known CSAM.<\/p>\n<p>This site is a proof of concept for collision attacks. The images of the kittens are manipulated to match the hash of the image of the dog (59a34eabe31910abfb06f308). As a result, all images shown on this page share the same hash. When these images are both hashed with the Apple NeuralHash algorithm, they return the same hash.<\/p>\n<\/blockquote>","protected":false},"excerpt":{"rendered":"<p>Joseph Cox et al. (Slashdot, Hacker News, Reddit): On Wednesday, GitHub user AsuharietYgvar published details of what they claim is an implementation of NeuralHash, a hashing technology in the anti-CSAM system announced by Apple at the beginning of August. Hours later, someone else claimed to have been able to create a collision, meaning he tricked [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2021-08-18T20:55:58Z","apple_news_api_id":"0f1707d4-e9dd-44ab-a691-780642a9f9d2","apple_news_api_modified_at":"2021-09-08T15:23:04Z","apple_news_api_revision":"AAAAAAAAAAAAAAAAAAAACA==","apple_news_api_share_url":"https:\/\/apple.news\/ADxcH1OndRKumkXgGQqn50g","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[2],"tags":[289,1351,2106,619,31,2078,30,2077,74,355,71],"class_list":["post-33418","post","type-post","status-publish","format-standard","hentry","category-technology","tag-algorithm","tag-artificial-intelligence","tag-child-sexual-abuse-material-csam","tag-graphics","tag-ios","tag-ios-15","tag-mac","tag-macos-12","tag-opensource","tag-privacy","tag-programming"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/33418","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=33418"}],"version-history":[{"count":8,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/33418\/revisions"}],"predecessor-version":[{"id":33555,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/33418\/revisions\/33555"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=33418"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=33418"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=33418"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}