Thursday, June 17, 2010

Interpolation Search

Jay from LinkedIn’s SNA team describes interpolation search, which is like binary search except that it looks at the values of the elements so that it can do better than guessing the middle each time (via Alan Odgaard):

The data structure for these uses a large sorted index file to do lookups, what is stored in this file is an MD5 of the key. Since the MD5s are used for the sort, the file is guaranteed to be uniformly distributed over the key space, and can often be many GBs in size. These files are memory mapped to help reduce the cost of a read, but the improved search algorithm can help to greatly reduce the number of seeks when the index is not fully memory resident. A disk seek comes at a price of around 10ms, so saving even one or two is a huge performance win.

Comments RSS · Twitter

Leave a Comment