Tuesday, June 26, 2018 [Tweets] [Favorites]

Fibonacci Hashing

Malte Skarupke (via Hacker News):

Everyone uses the algorithm that’s unnecessarily slow and leads to more problems, and nobody is using the algorithm that’s faster while at the same time being more robust to problematic patterns. Knuth talked about Integer Modulo and about Fibonacci Hashing, and everybody should have taken away from that that they should use Fibonacci Hashing, but they didn’t and everybody uses integer modulo.

[…]

So what’s happening here is that Knuth uses the term “hash function” differently than we use it today.  Today the steps in a hash table are something like this:

  1. Hash the key
  2. Map the hash value to a slot
  3. Compare the item in the slot
  4. If it’s not the right item, repeat step 3 with a different item until the right one is found or some end condition is met

We use the term “hash function” to refer to step 1. But Knuth uses the term “hash function” to refer to something that does both step 1 and step 2. So when he refers to a hash function, he means something that both hashes the incoming key, and assigns it to a slot in the table. So if the table is only 1024 items large, the hash function can only return a value from 0 to 1023. This explains why “integer modulo” is a hash function for Knuth: It doesn’t do anything in step 1, but it does work well for step 2. So if those two steps were just one step, then integer modulo does a good job at that one step since it does a good job at our step 2. But when we take it apart like that, we’ll see that Fibonacci Hashing is an improvement compared to integer modulo in both steps. And since we’re only using it for step 2, it allows us to use a faster implementation for step 1 because the hash function gets some help from the additional mixing that Fibonacci hashing does.

But this difference in terms, where Knuth uses “hash function” to mean something different than “hash function” means for std::unordered_map, explains to me why nobody is using Fibonacci hashing. When judged as a “hash function” in today’s terms, it’s not that great.

Comments

Stay up-to-date by subscribing to the Comments RSS Feed for this post.

Leave a Comment