The Search for a Faster CRC32
With the assistance of Linux’s perf utility, we found that most of our CPU time (~10%) was spent in one of the many Cyrus processes, in a function called
crc32()
. This function computes a checksum (using the common CRC32 algorithm) of some arbitrary chunk of data. The idea is to store the data and the checksum separately and then later, when you read the data, you recompute the checksum and compare with the original. If they’re different, then you know that either the data or the checksum have been corrupted and you can take appropriate action. Over the years, we’ve added checksums all over Cyrus, particularly in its data storage engine (known as twoskip [PDF]), and they’ve saved us more than once.At that point it became obvious - we calculate billions of CRC32 checksums, and when you add it all up, that's a lot of CPU time. So we started looking into alternative implementations, because even a small gain will translate into a big win once you run it a few billion times.