Thursday, February 5, 2009

Ordered Hashes in Ruby 1.9

In Ruby 1.9, hash tables preserve insertion order by maintaining a doubly-linked list. This seems like a dubious feature. It only accords with the principle of least surprise if you don’t know how a hash table works. I don’t think I’ve ever needed a hash to do this, and I worry that making the order mostly predictable will lead to fragile code that accidentally relies on the ordering where it shouldn’t. All of this comes at the cost of increased memory use. Overall, it’s faster than Ruby’s previous hash implementation, but presumably it would be even faster without the ordering feature.

6 Comments RSS · Twitter

I don't know; pushing a new item onto the end of a linked list happens in constant time, so while it is more work, I can't see that it would kill scalability.

Memory usage increases too, but really, an extra 16 bytes per item? At the point where it starts to become an issue, you're going to have bigger problems than this.

Behavior-wise, I see no real reason why this should violate any expectations. If you're expecting a mostly-ordered result, and you get a perfectly-ordered result, you probably won't even care.

I agree that there should be a way to turn it off, if you profile your code and determine that it *is* slowing things down.

Joel: Percentagewise, it's a lot of memory overhead. What I meant about the behavior is that you may accidentally get what you were expecting with test data, due to the way your test data was generated. So you get results that appear to be in sorted order, or identical lists of keys from separate hashes. But, to actually be correct, maybe your code should be doing a sort or should be comparing sets rather than lists.

I agree with you completely.

I'm a little surprised that you've never needed this, though. This sort of 'feature' seems extremely useful. There have been many times when I've needed an ordered set of items and then wanted constant access to the items in the list (via a key) as well.

However, it probably makes more sense to to implement it as a different data structure and maybe not part of the core language.

Whitney Young: I’ve sometimes needed ordered mappings, but not ordered by insertion order.

I've needed this a few times, and not having to keep a separate list for order saves programmer time and avoids bugs. I would also guess that this is how programmers who don't know how hash tables work actually expect them to work.

So I think it's a fair trade-off.

Leave a Comment