25 September 2013

git-grep: Faster at Finding Socks Than Your Mom (and better looking)

This one is a meticulous 30-minute investment that pays off in spades within the first couple of uses.

Imagine you're in one of these situations:
  • you are new to a codebase and aren't yet familiar enough with it to navigate around with ease. This is more of a challenge in dynamically typed languages like Ruby or JavaScript, where an IDE is less able to parse for and locate the definitions of the classes, objects, and methods in use (think of, for example, how Ruby's open classes mean that a method definition can, technically, come from anywhere). In these cases, being able to quickly text-search for other uses and for definitions is a major time-saver.
  • you're trying to make heads or tails of an unhelpful error message: searching for parts of that message, or for related concepts, might give a clue as to what code is at play, or at least where the error is being reported.

find | xargs grep

For years, my tool of choice in these cases has been find | xargs grep (and for those who want that Unix feeling on Windows, check out Cygwin or the like).  The syntax is relatively straightforward:

$ find . -print0 | xargs -0 grep 'def abstract!'
Here,
  1. The "find" tool walks the current working directory (i.e. ".") and all of its sub-directories, spewing the pathname of each file and directory it finds to standard out.  The "-print0" tells find to terminate each path with a null character (ASCII zero) rather than a newline;
  2. That stream is piped to "xargs"; the "-0" (dash zero) parameter tells xargs that its input is null-separated (this and find's "-print0" work in concert here, so paths containing spaces survive intact);
  3. xargs repeatedly invokes "grep", passing as many filenames per invocation as the system's command-line length limit allows (but no more);
  4. "grep" opens each file and searches for the supplied text (here, 'def abstract!') [note: the single quotes are required so that the shell doesn't attempt to interpret any special characters in the string, the exclamation point being one: in an interactive bash session it triggers history expansion].
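If you stick with the pipeline, here's a slightly sturdier variant worth considering. It's a sketch under two assumptions: that you want to skip git's own .git directory (its object store is big, compressed, and never what you're looking for), and that only regular files should reach grep. The function name srcgrep is made up for this example.

```shell
#!/usr/bin/env bash
# srcgrep: a hypothetical helper wrapping a hardened find | xargs grep.
srcgrep() {
  # -name .git -prune  skips git's object store entirely
  # -type f            hands grep regular files only (no directories)
  # -print0 / -0       NUL-separated paths survive spaces and newlines
  # grep -H            always prefix the filename, even for a single file
  # grep -n            prefix the line number
  # --                 end of options, so a pattern like "-foo" still works
  find . -name .git -prune -o -type f -print0 |
    xargs -0 grep -H -n -- "$1"
}
```

Usage: srcgrep 'def abstract!' from the top of the project. The -H matters more than it looks: when xargs happens to hand grep a single file, grep otherwise leaves the filename out of its output.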
This is great.  Like I said, I've been using this little pipeline for a very long time.  But I have discovered for myself an even better tool for the job: git-grep.

git-grep

git-grep is an incredibly useful source code searching tool.  It's built into git, so if you've already taken the red pill, this is just more drunken boxing-fu for free.

... faster than find | xargs grep ...

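In essence, git-grep doesn't crawl the whole directory tree. It takes the list of files to search from git's index (the record of tracked files), which means it never wades through the compressed objects in the .git directory, and it searches those files using multiple threads. So, before even talking about feature sets, git-grep is much faster than find | xargs grep.  Here's a real-world example: a clone of the HEAD of Rails' master branch (5,720 files, 242,229 lines, 9,578,293 characters):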
$ time find . -print0 | xargs -0 grep 'def abstract!'
./actionpack/lib/abstract_controller/base.rb: def abstract!
real 0m2.072s
user 0m0.079s
sys 0m0.477s
vs.
$ time git grep 'def abstract!'
actionpack/lib/abstract_controller/base.rb:33: def abstract! 
real 0m0.070s
user 0m0.035s
sys 0m0.081s
That's 2 seconds vs. 70 milliseconds: roughly 30 times faster.
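Part of that gap comes down to how many files each tool opens. Here's a rough way to see it for yourself, assuming you run it from the top of a git work tree; search_space is a hypothetical helper name. It prints two numbers: every regular file that find | xargs grep would feed to grep (the .git directory included), and the tracked paths that git ls-files lists, which is what git-grep searches by default.

```shell
#!/usr/bin/env bash
# search_space: a hypothetical helper; run it from the top of a git work tree.
# Prints two numbers: files find would visit, then files git grep searches.
search_space() {
  local all tracked
  # Every regular file below ".", including everything under .git/
  all=$(find . -type f | wc -l)
  # Only the paths recorded in git's index (the tracked files)
  tracked=$(git ls-files | wc -l)
  # $(( )) strips the padding some wc implementations add
  echo "$((all)) $((tracked))"
}
```

The first number includes the pack files and loose objects under .git, none of which you ever want grep to read.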

... with appropriately configurable output ...

git-grep comes with a couple of features that grep simply doesn't have:
  • extensive color control: where grep typically gives you control over little more than the color of matches, git-grep lets you configure the colors of a number of elements of the output (filename, line number, separator, enclosing function, and so on), making it far easier to visually parse.
  • identify the enclosing "function": git-grep can be configured not just to show surrounding context (i.e. N lines before and after your match), but also to locate the "function" to which the matching line belongs. More details in a subsequent post, but there's built-in support for Ruby, Java, C#, Objective-C, PHP, Python, and a bunch of others (for more information, look into Git attributes; that's the rabbit hole).
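Here's a sketch of turning on the Ruby "function" detection, built in a throwaway repository so it can't disturb anything (the repo, lib/base.rb, and the colors chosen are all made up). The .gitattributes line maps *.rb files to git's built-in ruby diff driver, whose funcname pattern is what --show-function (-p) uses; the two color.grep.* settings illustrate the per-element color control.

```shell
#!/usr/bin/env bash
# Throwaway demo: the repo, lib/base.rb and the colors are all made up.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir lib
cat > lib/base.rb <<'EOF'
module Demo
  class Base
    def abstract!
      @abstract = true
    end
  end
end
EOF

# Map *.rb to git's built-in "ruby" diff driver; its funcname pattern
# (class/module/def lines) is what --show-function uses.
echo '*.rb diff=ruby' > .gitattributes
git add .gitattributes lib/base.rb

# Per-element colors (repo-local here; use --global to apply everywhere).
git config color.grep.filename 'bold magenta'
git config color.grep.function 'yellow'

# -p / --show-function: print the enclosing "function" line before the match.
git grep -n -p '@abstract'
```

With the attribute in place, the enclosing line reported for the match is "def abstract!" (marked with "=" separators instead of ":"). Without it, git falls back to its default rule, the nearest preceding line that begins with a letter, underscore, or dollar sign, which here would be "module Demo".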
Here's a side-by-side visual comparison:

[Screenshot: find | xargs grep output. This is fine and dandy.]

... which is certainly not bad: the matched line is distinguished from context and you have line numbers.  Compare that with this:

[Screenshot: git-grep output. I find this easier to read and like the DRY output with the "function" identified.]

... where the filename is displayed just once, at the top of the match (this is configurable), and we get the added bonus of the enclosing module being identified (underlined): that's what matched as the "function" in this case.

In the second example, I've got a bash alias defined that invokes a git alias:
~/.bashrc (excerpt)
alias gg='git gp'
and the git alias contains the set of parameters I typically want:
~/.gitconfig (excerpt)
[alias]
    gp = grep -I --heading --before-context 2 --after-context 2 --show-function --untracked --extended-regexp
(my full configuration is in the links at the end of the article).
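If you'd rather not hand-edit either file, the same two aliases can be created from the command line. A sketch, assuming bash and the standard ~/.gitconfig and ~/.bashrc locations; install_grep_aliases is a hypothetical helper name, and running it modifies both files.

```shell
#!/usr/bin/env bash
# install_grep_aliases: a hypothetical helper; it edits ~/.gitconfig and ~/.bashrc.
install_grep_aliases() {
  # git config writes the [alias] section and handles the quoting for us
  git config --global alias.gp \
    'grep -I --heading --before-context 2 --after-context 2 --show-function --untracked --extended-regexp'

  # Append the shell alias only if that exact line isn't already there
  local line="alias gg='git gp'"
  touch "$HOME/.bashrc"
  grep -qxF -- "$line" "$HOME/.bashrc" || printf '%s\n' "$line" >> "$HOME/.bashrc"
}
```

Run install_grep_aliases once, then open a new shell (or source ~/.bashrc) so gg is defined. Letting git config write the entry avoids quoting mistakes in the [alias] section, and the grep -qxF check keeps repeated runs from piling up duplicate alias lines.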

... one caveat ...

There's really only one "caveat" in using git-grep: you need to be within a git repo.  If you attempt to do a git-grep outside of a repository, you'll get a reasonable error message:
fatal: Not a git repository (or any of the parent directories): .git
All that's required to remedy this error message is to initialize a git repo:
$ git init
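And even though none of the files are in the index, let alone committed to the repository, git-grep (via the gg alias, whose --untracked flag tells it to search untracked files as well) is still about twice as fast: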
john@slick:development [649]$ time gg "some very unique string"
projects/jtigger/ruby/rails-master/my-file.txt
1:some very unique string
real 0m32.300s
user 0m2.073s
sys 0m6.628s
john@slick:development [650]$ time find . -print0 | xargs -0 grep "some very unique string"
./projects/jtigger/ruby/rails-master/my-file.txt:some very unique string
real 1m7.528s
user 0m4.291s
sys 0m10.395s
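If running git init in a directory you don't otherwise want under version control feels heavy-handed, git grep also has a --no-index mode that searches an ordinary directory, no repository required. Here's a sketch of a wrapper that picks the mode automatically; anygrep is a hypothetical name, and the flag choices are my assumptions rather than anything git prescribes.

```shell
#!/usr/bin/env bash
# anygrep: a hypothetical wrapper; the flag choices are assumptions.
anygrep() {
  if [ "$(git rev-parse --is-inside-work-tree 2>/dev/null)" = true ]; then
    # Inside a work tree: tracked files plus untracked (non-ignored) ones
    git grep -n -I --untracked -e "$1"
  else
    # Anywhere else: search the plain directory, no `git init` needed
    git grep -n -I --no-index -e "$1"
  fi
}
```

For example, anygrep 'some very unique string'. Keep in mind that with no index to consult, git-grep has to walk the directory much like find does, so the speed-up over find | xargs grep is likely to be closer to the untracked-files timing than to the 30x seen on tracked files.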

Other Source Searching Tools

Of course, git-grep isn't the only game in town.  Nor is it the fastest.
http://beyondgrep.com/more-tools/

References


