Saturday, March 19, 2016

C Undefined Behavior in SQLite

John Regehr (Hacker News):

SQLite likes to use — but not dereference — pointers to heap blocks that have been freed. It did this at quite a few locations.
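As a rough illustration of why this counts as undefined behavior: in C, a pointer's value becomes indeterminate once the block it points to is freed, so even comparing or copying it afterwards is not strictly allowed. The sketch below is not SQLite code. The `slots` table and the `slot_add` and `slot_release` helpers are hypothetical names, used only to show one way of doing all pointer comparisons before the call to free().

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical registry of heap blocks; illustrative only, not SQLite's. */
#define SLOT_COUNT 4
void *slots[SLOT_COUNT];

/* Remember a block in the first empty slot.
   Returns the slot index, or -1 if p is NULL or the table is full. */
int slot_add(void *p) {
    if (p == NULL) return -1;
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (slots[i] == NULL) {
            slots[i] = p;
            return i;
        }
    }
    return -1;
}

/* Forget and free a block. Every comparison against p happens BEFORE
   free(p); after the call, p's value is indeterminate and is not used.
   Returns the slot that held p, or -1 if p was NULL or not registered. */
int slot_release(void *p) {
    int found = -1;
    if (p == NULL) return -1;
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (slots[i] == p) {
            slots[i] = NULL;
            found = i;
            break;
        }
    }
    free(p);
    return found;
}
```

The design choice is simply ordering: clear every stored copy of the pointer first, then free, so nothing is left that could be compared later.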

[…]

At least one uninitialized read that we found was potentially harmful, though we couldn’t make it behave unpredictably.

[…]

SQLite’s vdbe struct has a member called aMem that uses 1-based array indexing. To avoid wasting an element, this array is initialized like this[…]
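The elided snippet is not reproduced here. As a general illustration only: C does not allow forming a pointer to before the first element of an array, even if it is never dereferenced, so 1-based indexing is usually safer when the offset is applied to the index rather than to the base pointer. The `Cell` and `CellArray` types and the `cell_at` accessor below are hypothetical and are not how SQLite is written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cell type and container; names are illustrative only. */
typedef struct {
    int value;
} Cell;

typedef struct {
    Cell  *cells;   /* points at the real first element of the storage */
    size_t count;   /* number of elements in the storage */
} CellArray;

/* Map a 1-based index onto 0-based storage. The base pointer is never
   moved outside the array; only the index is adjusted. */
Cell *cell_at(CellArray *a, size_t i) {
    assert(i >= 1 && i <= a->count);
    return &a->cells[i - 1];
}
```

No element is wasted, and callers still write `cell_at(&arr, 1)` for the first cell.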

[…]

SQLite had a place where it called memset() with an invalid pointer and another calling memcpy() with a null pointer. In both cases the length argument was zero, so the calls were otherwise harmless.
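For context, C11's rules for the string functions require valid pointer arguments even when the length is zero, which is why such calls count as undefined behavior despite doing nothing in practice. A minimal sketch of one way to avoid the issue follows. The wrapper names `copy_bytes` and `fill_bytes` are hypothetical, not SQLite functions.

```c
#include <string.h>
#include <stddef.h>

/* Copy n bytes, skipping the library call entirely when n is zero so
   that null or otherwise invalid pointers never reach memcpy(). */
void *copy_bytes(void *dst, const void *src, size_t n) {
    if (n > 0) {
        memcpy(dst, src, n);
    }
    return dst;
}

/* Fill n bytes with c, with the same zero-length guard for memset(). */
void *fill_bytes(void *dst, int c, size_t n) {
    if (n > 0) {
        memset(dst, c, n);
    }
    return dst;
}
```

The guard costs one branch per call. When n is nonzero, the pointers must still be valid, exactly as with the underlying functions.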

Nathan Kurz:

One might wonder why they didn’t just cast the return value to (void), which is a traditional and clearer way of signifying that the return value is intentionally being ignored. It’s because the GCC maintainers don’t believe that the end user should be allowed to do so, and don’t really care what other tools do or have done[…]
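To make the complaint concrete, here is a sketch that assumes the warning in question is the one produced by GCC's `warn_unused_result` attribute, which GCC does not silence for a plain `(void)` cast. The `MUST_CHECK` macro, `flush_log`, and `best_effort_flush` are hypothetical names. The workaround shown (store the result, then discard the variable) is one common idiom, not necessarily the one the quoted code used.

```c
/* Use the attribute only on compilers that understand GNU attributes. */
#if defined(__GNUC__)
#  define MUST_CHECK __attribute__((warn_unused_result))
#else
#  define MUST_CHECK
#endif

/* Counts calls so the effect of the example can be observed. */
int flush_calls = 0;

/* Hypothetical function whose result callers are expected to check. */
MUST_CHECK int flush_log(void) {
    flush_calls++;
    return 0;
}

/* Deliberately ignore the result. Writing "(void)flush_log();" would
   still warn under GCC, so the value is stored first and the variable
   is then discarded. */
void best_effort_flush(void) {
    int rc = flush_log();
    (void)rc;
}
```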

Richard Hipp:

Prof. Regehr did not find problems with SQLite. He found constructs in the SQLite source code which under a strict reading of the C standards have “undefined behaviour”, which means that the compiler can generate whatever machine code it wants without it being called a compiler bug. That’s an important finding. But as it happens, no modern compilers that we know of actually interpret any of the SQLite source code in an unexpected or harmful way. We know this, because we have tested the SQLite machine code – every single instruction – using many different compilers, on many different CPU architectures and operating systems and with many different compile-time options. So there is nothing wrong with the sqlite3.so or sqlite3.dylib or winsqlite3.dll library that is happily running on your computer. Those files contain no source code, and hence no UB.

The point of Prof. Regehr’s post (as I understand it) is that the C programming language has evolved to contain such byzantine rules that even experts find it difficult to write complex programs that do not contain UB.

John Regehr:

Richard, I think I can characterize your position as something like “If the code compiles and works today, then by definition it’s not buggy.” I think you should recognize that this is a somewhat extreme position, or at least pretty far towards one end of a spectrum.

Richard Hipp:

John, my views are distorted by 6 years of relentless focus on MC/DC. In that context, UB is like compiler warnings or compiler bugs – issues that should be dealt with but which are not existential threats to the project since they all occur upstream from the point of verification.

When working on other (normal) projects (example: Fossil) where the point of verification is the logical correctness of the source code, then I completely agree that UB should be religiously avoided, since it occurs downstream from the verification point.
