Wednesday, June 7, 2006

Bentley Overflow Bug

Joshua Bloch, whose departure can’t have been good for Sun, reports that most divide-and-conquer algorithms—including the binary search in Jon Bentley’s excellent Programming Pearls—are broken because of integer overflow bugs (via Tim Bray):

We programmers need all the help we can get, and we should never assume otherwise. Careful design is great. Testing is great. Formal methods are great. Code reviews are great. Static analysis is great. But none of these things alone are sufficient to eliminate bugs: They will always be with us. A bug can exist for half a century despite our best efforts to exterminate it. We must program carefully, defensively, and remain ever vigilant.

Good advice, but since all of the above failed in some of the simplest and most widely distributed code ever written, what hope do we have? Well, when static analysis is useless and humans can’t be trusted, it sounds like a job for the language. As the example illustrates, Java already does this with array bounds checking, and plenty of languages (even some compiled ones) automatically promote integers to avoid overflow problems.

3 Comments RSS · Twitter

What is it I don't understand about Java bitwise operators? Bloch says that this is an acceptable fix for the bug:

The bug is in this line: 6: int mid =(low + high) / 2;… So what's the best way to fix the bug? Here's one way: 6: int mid = low + ((high - low) / 2);Probably faster, and arguably as clear is: 6: int mid = (low + high) >>> 1;In C and C++ (where you don't have the >>> operator), you can do this: 6: mid = ((unsigned) (low + high)) >> 1;

Both of the last two solutions still add "low" and "high" together, so they could still overflow. The last one assumes that casting them to unsigned is enough, and it is unless someone already made "low" and "high" unsigned for other reasons.

The middle one just seems to assume that right-shifting (low + high) to the right by one bit with zero fill works, but that's not immediately obvious to me. Hmm…I guess that as long as the overflow can't be more than one bit, and adding two values together it can't be more than one bit, then right-shifting while clearing the high bit is exactly the right thing to do. It's just clearing the sign bit on the result, basically.

But that presumes that (low + high) does the right bitwise arithmetic and doesn't do any fancy formatting. That is, in 16-bit land, that adding 65535 and 65535 gives 0xFFFE with an overflow bit. If that's not *defined*, then the last solution won't work in all cases either. If it is defined to behave that way, it seems rather clever.

Matt,

In the second solution, there is overflow, but it's predictable. The sum can only require one extra bit, and the specification says that the low-order bits will be correct and that the sign bit will be flipped. It's using the sign bit to store the overflow, essentially.

I think you're right that things wouldn't be so simple in the second two solutions if "low" and "high" were unsigned to begin with. In that case, I think it's not clear what the high bit should be after adding and shifting because you don't know whether there was overflow.

The folks at Lambda weigh in.

Leave a Comment