
Introduction to sed

125 points | pkrumins | 14 years ago | catonmat.net

31 comments

[+] gjm11|14 years ago|reply
Flagged for excessive self-promotion. Take out "World's best" or make the article provide strong evidence that it's the world's best beyond the fact that its author thinks it is, and I'll unflag it.

If this didn't have "World's best" in the blog title and someone had added that to the HN title, a moderator would be deleting those words. The fact that you've inserted the same peacock words in your own blog title as well as your HN submission does not make this any better.

(I don't like being so blunt with a highly regarded veteran HN user, but abuse is abuse whoever does it. Sorry, Peteris.)

[EDITED to add: The HN guidelines say not to comment saying "I flagged this". I think this is an unusual case. I appreciate that others may disagree.]

[+] pkrumins|14 years ago|reply
Thanks for letting me know. I just removed the "World's best." I just thought I'd name it "World's best" cause I was feeling fantastic today. :)
[+] nabb|14 years ago|reply
The introduction was alright, but I'm not sure that the hold buffer needs to be introduced straight away. Using the hold buffer can get complicated pretty quickly, and it isn't something that you'll frequently have to do. The example the author gives of getting lines matching a regex with the previous line is more easily done using grep -B1, which doesn't require the mental overhead of sed scripting.
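
For comparison, here is a rough sketch of both approaches on a made-up three-line input. The sample text and the pattern `error` are only illustrative, and the hold-buffer script is one possible way to write it, not necessarily the one from the article.

```shell
# Hypothetical sample input; the pattern 'error' is just for illustration.
input='first
second
error here'

# grep -B1 prints each matching line plus the one line before it.
with_grep=$(printf '%s\n' "$input" | grep -B1 'error')

# Hold-buffer version: every line is saved to the hold space with h.
# On a match, x swaps in the previous line, p prints it, x swaps back,
# and the second p prints the matching line itself.
with_sed=$(printf '%s\n' "$input" | sed -n '/error/{x;p;x;p;};h')

printf '%s\n' "$with_grep"
printf '%s\n' "$with_sed"
```

The two only agree in simple cases like this one: the sed script prints an empty line if the very first line matches, and grep inserts `--` separators between non-adjacent groups of matches.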

Perhaps more useful for an introduction would be retaining portions of the input with back-references (\1 to \9) and `&'. Also useful is the `g' flag for the `s' command, for global replacement, and the `i' flag (a GNU extension) for case insensitivity. Another note is that you can use any character you want to delimit `s' commands, which is useful if you have /'s in your pattern or replacement strings. The last main part of sed which I use regularly is the -r option, for extended regular expressions (basically the same as grep -E), which lets you use regular expression tokens like + and | without escaping. On that note, the author misses this point in his introduction, and the example of "sed -n '/a+b+/p'" should actually be either "sed -rn '/a+b+/p'" or "sed -n '/a\+b\+/p'".
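
A few small examples of those features, as a sketch assuming GNU sed (the `i' flag, -r, and \+ in basic regular expressions are GNU extensions). The input strings and variable names are made up for illustration.

```shell
# Back-references: \1 and \2 refer to the groups captured with \( \).
swapped=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/')

# & stands for the whole matched text.
wrapped=$(echo 'foo' | sed 's/foo/[&]/')

# g replaces every occurrence on the line instead of only the first.
global=$(echo 'a-a-a' | sed 's/a/b/g')

# i makes the match case-insensitive (GNU extension).
nocase=$(echo 'Hello' | sed 's/hello/bye/i')

# Any delimiter works, which avoids escaping slashes in paths.
path=$(echo '/usr/bin' | sed 's|/usr|/opt|')

# Extended regexes with -r, or escaped + in a basic regex.
ere=$(printf 'aab\nxyz\n' | sed -rn '/a+b+/p')
bre=$(printf 'aab\nxyz\n' | sed -n '/a\+b\+/p')

printf '%s\n' "$swapped" "$wrapped" "$global" "$nocase" "$path" "$ere" "$bre"
```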

For a serious introduction to all the extra functionality in sed, there's a great reference at the top Google result for sed[1]. In addition, GNU sed adds functionality that you might find useful at some point (e.g. s///e), all of which is described in the GNU sed user's manual[2].

[1] http://www.grymoire.com/Unix/Sed.html
[2] http://www.gnu.org/software/sed/manual/index.html

[+] pkrumins|14 years ago|reply
Thank you, this is a really great comment! I'll improve my introduction based on your feedback to make it better.
[+] hsmyers|14 years ago|reply
Are there any HN reviews of this guy's eBooks? I think he has two on sed, and I'm a little leery of self-promotion, although the preview was pretty good.
[+] bradleyland|14 years ago|reply
I've got both one-liners books (Sed and Awk). Peteris takes a very practical approach in the book. I almost think of it like a field guide. Think of the times you've google searched for something like 'number lines in file sed'. These books not only provide solutions, but give detailed explanations of what's happening. Pick any given example and there's a good chance you'll gain some insight about how sed/awk work. They're very well written, and very accessible.

Having said that, if you're a long beard who has long been acquainted with these tools, you could maybe skip them.

[+] pkrumins|14 years ago|reply
[+] dbbo|14 years ago|reply
To quote "The Awful Truth about sed" section of the Grymoire, "It is not your fault you don't understand sed."

I do, however, understand Perl, so that's what I use. E.g.: `echo 'fubar' | perl -lpe 's/fu/foo/'`

It might be a few milliseconds slower-- for example, commenting every line of my 352-line .zshrc (via s/^/#/) takes 0.007s total with Perl (so does `s/^/#/ unless /^\s*#/`) and 0.005s with sed. Commenting out every line of /usr/share/dict/american-english (98569 lines) takes 1.124s with Perl and 0.813s with sed.
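
For reference, a sed counterpart of the `s/^/#/ unless /^\s*#/` variant might look like the sketch below. The sample lines are invented, and this is only one way to express the condition in sed.

```shell
# Hypothetical sample: one plain line and two lines that are already comments.
sample='export A=1
# already a comment
  # indented comment'

# The address /^[[:space:]]*#/ selects lines that already start with #,
# and ! negates it, so s/^/#/ only runs on the remaining lines.
commented=$(printf '%s\n' "$sample" | sed '/^[[:space:]]*#/!s/^/#/')

printf '%s\n' "$commented"
```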

Since I already know how to do more complicated things with Perl (like conditionals, named backreferences, etc.) it doesn't seem worth it to take the time to learn how to use sed effectively. I can wait the extra second since I'm not on any kind of deadline or under any efficiency constraints.

I am not trying to say that Perl is better than sed or any other text processing tool. I also don't mean to imply that speed is sed's only advantage-- it's just one example. I think that for someone who already knows some Perl, learning another similar tool doesn't make sense. I'm sure there are exceptions. This is only my humble, personal opinion.

For people who do need/want to learn sed, the article did a pretty good job of showing you how to get a lot done without a whole lot of reading.

[+] _delirium|14 years ago|reply
On the speed question, I actually find Perl considerably faster than sed in a lot of use-cases, if there's enough processing to dominate the slightly higher startup costs of Perl.

For example, at one point I had reason to take a gigantic single-line textfile, and break it into lines based on a specific 3-letter pattern that didn't occur anywhere else:

   s/ABC/A\nC/g
In whatever sed comes with Debian, this took about 10 minutes, CPU-bound, for a 2-gigabyte file. With Perl: 1.5 minutes, IO-bound. Not too sure why. Maybe sed runs everything through the regex engine, while Perl special-cases constant strings? Perhaps Perl has better buffer management for processing gigabytes of text? I haven't done any real testing.
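
A tiny illustration of that substitution, assuming GNU sed (which accepts \n in the replacement text); the same expression also works with `perl -pe`. The input string is made up and says nothing about the timings above.

```shell
# Split a single line at every ABC, keeping A at the end of one line
# and C at the start of the next.
split=$(printf 'xABCyABCz\n' | sed 's/ABC/A\nC/g')

printf '%s\n' "$split"
```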
[+] there|14 years ago|reply
however, perl has -i, which not all seds do, so those 0.002 seconds you lost to perl will be more than gained by being able to type

      perl -pi -e 's/foo/bar/' somefile
instead of

      sed 's/foo/bar/' somefile > somefile.tmp && mv somefile.tmp somefile
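
Where the local sed does support in-place editing, a sketch like the following should work. It assumes GNU or BSD sed (POSIX does not require -i) and uses a throwaway temp file; attaching the backup suffix directly to -i is the form both of those accept.

```shell
# Create a scratch file so nothing real is modified.
f=$(mktemp)
echo 'foo' > "$f"

# Edit in place, keeping the original as "$f.bak".
sed -i.bak 's/foo/bar/' "$f"

result=$(cat "$f")
backup=$(cat "$f.bak")
printf '%s\n' "$result"

# Clean up the scratch file and its backup.
rm -f "$f" "$f.bak"
```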
[+] hugh3|14 years ago|reply
I know how to do exactly one thing in sed, and that's sed 's/blah/blahprime/' somefile.

If sed has capabilities other than that, I don't particularly care... but that's one thing that I need to do frequently which is more painful in awk or python.

[+] dbbo|14 years ago|reply
Apparently the Debian kernel team agrees. I just noticed this while building a kernel package:

test -n "$k" || perl -pli~ -e 's/\$\{shlibs:Depends\}\,?//g' debian/control

[+] freemarketteddy|14 years ago|reply
I don't know if this is off topic, but is there a utility that can parse C files and print, for example, all the function names or local variable names?

I know it can be done with sed, but a perfect regex that matches functions and nothing else can get complex, and I am wondering if someone has already done it.

[+] grok2|14 years ago|reply
Look at what the perl script at http://www.gson.org/egypt/ does -- if you are using gcc, you can compile with certain options to generate info from the source code into an intermediate file and then parse this intermediate file with a script to get the info you need. A bit complex, but useful if you are trying to read through code you are maintaining. A compiler is better than a regex for this, I think, for C.

There's also ctags/etags, as another comment mentioned, as well as the cscope utility.

[+] cenuij|14 years ago|reply
I'll just leave this here...

    s/&/&/g
Sorry! :D