Thursday, February 23, 2017

SHA-1 Collision

Google (Hacker News):

Today, 10 years after of SHA-1 was first introduced, we are announcing the first practical technique for generating a collision. This represents the culmination of two years of research that sprung from a collaboration between the CWI Institute in Amsterdam and Google. We’ve summarized how we went about generating a collision below. As a proof of the attack, we are releasing two PDFs that have identical SHA-1 hashes but different content.

For the tech community, our findings emphasize the necessity of sunsetting SHA-1 usage. Google has advocated the deprecation of SHA-1 for many years, particularly when it comes to signing TLS certificates. As early as 2014, the Chrome team announced that they would gradually phase out using SHA-1. We hope our practical attack on SHA-1 will cement that the protocol should no longer be considered secure.


This attack required over 9,223,372,036,854,775,808 SHA1 computations. This took the equivalent processing power as 6,500 years of single-CPU computations and 110 years of single-GPU computations.


The SHAttered attack is 100,000 faster than the brute force attack that relies on the birthday paradox. The brute force attack would require 12,000,000 GPU years to complete, and it is therefore impractical.


Basically, each PDF contains a single large (421,385-byte) JPG image, followed by a few PDF commands to display the JPG. The collision lives entirely in the JPG data - the PDF format is merely incidental here. Extracting out the two images shows two JPG files with different contents (but different SHA-1 hashes since the necessary prefix is missing). Each PDF consists of a common prefix (which contains the PDF header, JPG stream descriptor and some JPG headers), and a common suffix (containing image data and PDF display commands).

The header of each JPG contains a comment field, aligned such that the 16-bit length value of the field lies in the collision zone. Thus, when the collision is generated, one of the PDFs will have a longer comment field than the other. After that, they concatenate two complete JPG image streams with different image content - File 1 sees the first image stream and File 2 sees the second image stream. This is achieved by using misalignment of the comment fields to cause the first image stream to appear as a comment in File 2 (more specifically, as a sequence of comments, in order to avoid overflowing the 16-bit comment length field). Since JPGs terminate at the end-of-file (FFD9) marker, the second image stream isn’t even examined in File 1 (whereas that marker is just inside a comment in File 2).

I think SHAttered overstates the impact on Git. Linus Torvalds (2005, via Joe Groff):

I really hate theoretical discussions.

The fact is, a lot of crap engineering gets done because of the question ”what if?”. It results in over-engineering, often to the point where the end result is quite a lot measurably worse than the sane results.

You are literally arguing for the equivalent of “what if a meteorite hit my plane while it was in flight - maybe I should add three inches of high-tension armored steel around the plane, so that my passengers would be protected”.


And the thing is, if somebody finds a way to make sha1 act as just a complex parity bit, and comes up with generating a clashing object that actually makes sense, then going to sha256 is likely pointless too - I think the algorithm is basically the same, just with more bits. If you’ve broken sha1 to the point where it’s that breakable, then you’ve likely broken sha256 too.

He’s being criticized for saying this, but (so far) it looks like he was actually right.

Linus Torvalds (Hacker News):

Put another way: I doubt the sky is falling for git as a sourcecontrol management tool. Do we want to migrate to another hash? Yes. Is it “game over” for SHA1 like people want to say? Probably not.

I haven’t seen the attack details, but I bet

(a) the fact that we have a separate size encoding makes it much harder to do on git objects in the first place

(b) we can probably easily add some extra sanity checks to the opaque data we do have, to make it much harder to do the hiding of random data that these attacks pretty much always depend on.

Previously: MD5 Collision.

Update (2017-02-24): See also: Subversion (ArsTechnica), Mercurial, Bruce Schneier in 2005 and now.

Update (2017-03-09): See also: Linus Torvalds (Hacker News), Jon Gilmore (via Zaki Manian), Matthew Green.

Update (2017-03-16): See also: Linus Torvalds (via Reddit).

Update (2019-05-17): Thomas Peyrin:

Our paper on chosen-prefix collision attack for SHA-1 is out. TL;DR: computing such collision is very practical, for a reasonable cost. More results coming soon. Remove SHA-1 now if you still implement it for any digital signature/certificate use.

Comments RSS · Twitter

Leave a Comment