I'm not trying to start a theological war about grep/ack here, I'm just mentioning it in case someone hasn't heard about 'ack' before and they (like me) might find it extremely useful: http://betterthangrep.com
It's grep, just better. It highlights the selected text, it shows which files, and in what line the text was found (and uses vivid colors so you can distinguish them easily), ignores .git and .hg directories (among others, that shouldn't be searched) by default, you can tell it to search, for example for only `--cpp` or `--objc` or `--ruby` or `--text` files (with a flag, not a filename pattern), and many many other neat features that I'm sure grep has, but you have to remember and memorize them. ack has sensible defaults.
Do you know of any C ports of ack? Ack is beautiful and productive, but nowhere near as fast as grep (orders of magnitude slower, in fact).
gfind . -type f -exec grep -i mbr {} \; >| /dev/null
1.10s user 0.81s system 90% cpu 2.113 total
gfind . -type f -exec ack -i mbr {} \; >| /dev/null
24.34s user 4.17s system 96% cpu 29.678 total
(Yes, I know about the flag to search recursively. This is the most fair comparison.)
You can tweak a few git grep config settings to get ack-style output, and because it's git grep you also get most of the code-aware conveniences as well as the speed.
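As a rough sketch of what that tweaking might look like: the settings below are real git config keys, but which ones the parent had in mind is my assumption, and the helper name `setup_git_grep` is made up for illustration.

```shell
# Hypothetical helper: apply ack-like defaults to git grep for the
# repository in the current directory (add --global to each line to
# apply them to every repository instead).
setup_git_grep() {
    git config grep.lineNumber true      # show line numbers, like grep -n
    git config grep.extendedRegexp true  # extended regex syntax, like grep -E
    git config color.grep auto           # color matches when writing to a terminal
}
```

After that, `git grep --heading --break pattern` groups matches under each file name, which is close to ack's layout.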
> it shows which files, and in what line the text was found (and uses vivid colors so you can distinguish them easily),
grep -rn --color pattern ./files/
files/foo.sh:123: echo "Look at the floral pattern on this dress!"
> ignores .git and .hg directories (among others, that shouldn't be searched) by default,
grep --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn
> you can tell it to search, for example for only `--cpp` or `--objc` or `--ruby` or `--text` files (with a flag, not a filename pattern),
You would use `find` in conjunction with `grep`. "Art of Unix Programming", modularity, and all that jazz. Presumably you would just modify your own grep alias or define a function to avoid retyping. The end result pretty much looks like my grep alias:
alias grep='grep -Ein --color --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn'
I still fail to see a reason to use ack, especially when I can assume grep is always available for portability.
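A minimal sketch of the `find` plus `grep` approach, written as a shell function. The name `grep_by_ext` and its argument order are made up for illustration, and it assumes a `find` with `-print0` and an `xargs` with `-0` (both GNU and BSD versions have them).

```shell
# grep_by_ext EXT PATTERN [DIR]
# Hypothetical helper: search only files ending in .EXT under DIR
# (default: current directory), skipping version-control directories.
grep_by_ext() {
    ext=$1
    pattern=$2
    dir=${3:-.}
    # -prune stops find from descending into the VCS directories;
    # -print0 / -0 keep file names with spaces intact.
    find "$dir" \
        \( -name .git -o -name .hg -o -name .svn \) -prune \
        -o -type f -name "*.$ext" -print0 |
        # The extra /dev/null argument makes grep print file names
        # even when find yields a single file.
        xargs -0 grep -n -- "$pattern" /dev/null
}
```

Something like `grep_by_ext cpp mbr src/` would then stand in for ack's `--cpp` flag, at the cost of having to maintain the function yourself.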
I just replicated the test and I can confirm the FreeBSD grep compiled on Darwin is about 30x slower.
% /usr/local/bin/grep --version
/usr/local/bin/grep (GNU grep) 2.14
<snip>
% time find . -type f | xargs /usr/local/bin/grep 83ba
find . -type f 0.01s user 0.06s system 8% cpu 0.870 total
xargs /usr/local/bin/grep 83ba 0.66s user 0.31s system 95% cpu 1.017 total
% /usr/bin/grep --version
grep (BSD grep) 2.5.1-FreeBSD
% time find . -type f | xargs /usr/bin/grep 83ba
find . -type f 0.01s user 0.06s system 0% cpu 28.434 total
xargs /usr/bin/grep 83ba 31.65s user 0.40s system 99% cpu 32.113 total
There was also some discussion about this on one of the Apple mailing lists a few months ago, and it turns out there are major differences in how the two grep implementations on OS X interact with the buffer cache. In particular, empirical evidence suggests 10.6's GNU grep build caches its input, while 10.7+ BSD grep does not.
Incidentally, on OS X, you can commonly get another order of magnitude improvement over even GNU grep with Spotlight's index: use xargs to grep only through files that pass a looser mdfind "pre-screen".
I notice these Mac tools becoming a bit stale.
sort is derived from GNU sort, but from some ancient version.
I guess this might be due in part to these tools now being GPLv3?
Almost certainly. Apple stopped updating their tools past the GPLv2 versions, with the most noticeable example being gcc, which was frozen at 4.2 until they removed it.
This may also be because the default grep, i.e. BSD grep, actually pays attention to what you have set in your LANG environment variable. The default on OS X is en_US.UTF-8.
If the author were to set LANG to C, he would find that BSD grep suddenly speeds up tremendously.
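The locale can be overridden for a single command without touching the rest of the session. A sketch, assuming a POSIX shell; `LC_ALL` is used because it takes precedence over `LANG`, and the wrapper name `cgrep` is made up.

```shell
# Hypothetical wrapper: run grep in the C locale, so it works on plain
# bytes instead of decoding UTF-8 multibyte characters.
cgrep() {
    LC_ALL=C grep "$@"
}

# Example comparison (timings will vary by machine and grep build):
#   time grep  -r 83ba . > /dev/null
#   time cgrep -r 83ba . > /dev/null
```

Keep in mind that in the C locale, character classes and case-insensitive matching only cover ASCII, so results can differ on non-ASCII text.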
Is speed really that much of a concern with grep? I typically use :vimgrep inside of vim, not because it's faster (it's orders of magnitude slower due to being interpreted vimscript), but because I hate remembering the differences between pcre/vim/gnu/posix regex syntax.
I regularly search my whole Firefox clone for keywords. If this takes 2s, that's plenty fast; if it takes 20s, I'd have to come up with some other way of doing it.
I use grep in some pipelines to bulk-process data, because if you have a fast grep, using it to pre-filter input files to remove definitely-not-matching lines is one of the quickest ways to speed up some kinds of scripts without rewriting the whole thing. And in that case, sometimes processing gigabytes+ of data, it's nice if it's fast.
One common case: I have a Perl script processing a giant file, but it only processes certain lines that match a test. You can move that test to grep, to remove nonmatching lines before Perl even hits them, which will typically be much faster than making Perl loop through them.
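The pre-filter idea can be sketched in shell. Everything named here is hypothetical: `process_line` stands in for the expensive per-line work the Perl script would do, and the `ERROR` test is an invented example. The only point is that grep drops lines before the slower loop ever sees them.

```shell
# Stand-in for the expensive per-line processing (hypothetical).
process_line() {
    printf 'processed: %s\n' "$1"
}

# Before: the loop reads every line and applies the test itself.
process_all() {
    while IFS= read -r line; do
        case $line in
            *ERROR*) process_line "$line" ;;
        esac
    done < "$1"
}

# After: grep removes non-matching lines first, so the loop only
# sees the lines it would have processed anyway.
process_prefiltered() {
    grep -F 'ERROR' "$1" | while IFS= read -r line; do
        process_line "$line"
    done
}
```

The filter only has to be loose enough to keep every line the script cares about; the script can still apply its exact test to whatever remains.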
I once tried a sed script on a couple million text files (60 GB in total) - they were web pages downloaded in some format (WARC? I don't remember what it was called) and I needed to change the formatting slightly (to feed them to Nutch) - Mac's default sed was literally 50 times slower than gsed (on the same machine). If I remember correctly, gsed finished the task in under two hours.
Just tried on Snow Leopard: not quite 10x, but certainly nearly 2x faster. (Admittedly, my Firefox checkout is Mercurial, and hg locate seems to pass something invalid to xargs halfway through, but I guess the first chunk of files is the same.)
Someone commented on the article that this might be caused by missing off the -F flag; I tried this, and -F makes both versions slightly faster again.
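For context, `-F` tells grep to treat the pattern as a fixed string rather than a regular expression. A small sketch of the difference, using made-up sample input:

```shell
# Made-up sample input: one line with a literal dot, one without.
sample() {
    printf 'a.c\nabc\n'
}

# As a regular expression, "." matches any character, so both lines match.
match_regex() {
    sample | grep 'a.c'
}

# With -F the pattern is a literal string, so only "a.c" matches.
match_fixed() {
    sample | grep -F 'a.c'
}
```

A search string like `83ba` has no regex metacharacters, so both modes find the same lines; `-F` just lets grep skip the regex machinery.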
[+] [-] pooriaazimi|13 years ago|reply
Why ack? http://betterthangrep.com/why-ack/
manpage: http://betterthangrep.com/documentation/
Oh, and ack is written in perl and doesn't require admin privileges to install.
[+] [-] ZeroGravitas|13 years ago|reply
http://travisjeffery.com/b/2012/02/search-a-git-repo-like-a-...
[+] [-] pixelbeat|13 years ago|reply
http://www.pixelbeat.org/scripts/findrepo
[+] [-] chimeracoder|13 years ago|reply
To be fair, neither does GNU Grep - just do `make' (without `make install') and you're good to go.
[+] [-] _delirium|13 years ago|reply
Say your script.pl is doing something like:
You can replace that with:
[+] [-] buster|13 years ago|reply
Seriously though, it's really amazing what performance they squeezed out of that tool. It's always amazing to grep through gigabytes of files in a few seconds.