Friday, January 10, 2014

Why GNU grep Is Fast

Mike Haertel:

Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES. Looking for newlines would slow grep down by a factor of several times, because to find the newlines it would have to look at every byte!

So instead of using line-oriented input, GNU grep reads raw data into a large buffer, searches the buffer using Boyer-Moore, and only when it finds a match does it go and look for the bounding newlines. (Certain command line options like -n disable this optimization.)

1 Comment

Nice post. I knew the basics, but this post actually made me look up and read wikipedia on Boyer-Moore, very interesting.

Stay up-to-date by subscribing to the Comments RSS Feed for this post.

Leave a Comment