# PyPy’s Hash Table Implementation

Maciej Fijalkowski (via David Smith):

One surprising part is that the new design, besides being more memory efficient, is ordered by design: it preserves the insertion order.

[…]

Here,

`compact_array`

stores all the items in order of insertion, while`sparse_array`

is a 1/2 to 2/3 full array of integers. The integers themselves are of the smallest size necessary for indexing the`compact_array`

. So if`compact_array`

has less than 256 items, then`sparse_array`

will be made of bytes; if less than 2^16, it’ll be two-byte integers; and so on.This design saves quite a bit of memory.

[…]

The obvious benefit of having more compact dictionaries is an increased cache friendliness. In modern CPUs cache misses are much more costly than doing additional simple work, like having an additional level of (in-cache) indirection.

[…]

To preserve order, when we delete an entry, we mark the entry as removed but don’t otherwise shuffle the remaining entries. If we repeat this operation often enough, there will be a lot of removed entries in the (originally compact) array. At this point, we need to do a “packing” operation, which moves all live entries to the start of the array (and then reindexes the sparse array, as the positions changed).

Previously: Accidentally Quadratic Rust Hash Tables, Exploring Swift Dictionary’s Implementation, Exposing NSDictionary.

## 1 Comment

Stay up-to-date by subscribing to the Comments RSS Feed for this post.

[…] Previously: PyPy’s Hash Table Implementation. […]