Why Grep Is Fast

Great breakdown of why GNU grep is fast – straight from the original author – Link

Summary:

  1. Use Boyer-Moore (and unroll its inner loop a few times).
  2. Roll your own unbuffered input using raw system calls. Avoid copying
    the input bytes before searching them. (Do, however, use buffered
    *output*. The normal grep scenario is that the amount of output is
    small compared to the amount of input, so the overhead of output
    buffer copying is small, while savings due to avoiding many small
    unbuffered writes can be large.)
  3. Don’t look for newlines in the input until after you’ve found a match.
  4. Try to set things up (page-aligned buffers, page-sized read chunks,
    optionally use mmap) so the kernel can ALSO avoid copying the bytes.
Advertisements

~ by slyghost on April 17, 2011.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: