top | item 45931107

(no title)

Awk is still one of my favorite tools because its power is underestimated by nearly everyone I see using it.

    ls -l | awk '{print $3}'

That’s typical usage of Awk, where you use it in place of cut because you can’t be bothered to remember the right flags for cut.

But… Awk, by itself, can often replace entire pipelines. Reduce your pipeline to a single Awk invocation! The only drawback is that very few people know Awk well enough to do this, and this means that if you write non-trivial Awk code, nobody on your team will be able to read it.

Every once in a while, I write some tool in Awk or figure out how to rewrite some pipeline as Awk. It’s an enrichment activity for me, like those toys they put in animal habitats at the zoo.

discuss

PopAlongKid|3 months ago

>To Perl connoisseurs, this feature may be known as Autovivification. In general, AWK is quite unequivocally a prototype of Perl. You can even say that Perl is a kind of AWK overgrowth on steroids…

Before I learned Perl, I used to write non-trivial awk programs. Associative arrays, and other features are indeed very powerful. I'm no longer fluent, but I think I could still read a sophisticated awk script.

Even sed can be used for some fancy processing (i.e scripts), if one knows regex well.

nerdponx|3 months ago

> this means that if you write non-trivial Awk code, nobody on your team will be able to read it.

Sort of! A lot of AWK is easy to read even if you don't remember how to write it. There are a few quirks like how gsub modifies its target in-place (and how its default target is $0), and of course understanding the overall pattern-action layout. But I think most reasonable (not too clever, not too complicated) AWK scripts would also be readable to a typical programmer even if they don't know AWK specifically.

Brian_K_White|3 months ago

I wrote a BASIC renumberer and compactor in bash, using every bashism I could so that it called no externals and didn't even use backticks to call child bashes, just pure bash itself (but late version and use every available feature for convenience and compactness).

I then re-wrote it in awk out of curiosity and it looked almost the same.

Crazy bash expansion syntax and commandline parser abuse was replaced by actual proper functions, but the whole thing when done was almost a line by line in-place replacement, so almost the same loc and structure.

Both versions share most of the same advantages over something like python. Both single binary interpreters always already installed. Both versions will run on basically any system any platform any version (going forward at least) without needing to install anything let alone anything as gobsmacking ridiculous as pip or venv.(1)

But the awk version is actually readable.

And unlike bash, awk already pretty much stopped changing very much decades ago, so not only is it forward compatible, it's pretty backwards compatible too.

Not that that is generally a thing you have to worry about. We don't make new machines that are older than some code we wrote 5 years ago. Old bash or awk code always works on the next new machine, and that's all you ever need(2).

There is gnu vs bsd vs posix vs mawk/nawk but that's not much of a problem and it's not a constantly breaking new-version problem but the same gnu vs posix differences for the last 30 years. You have to knowingly go out of your way to use mawk etc.

(1) bash you still have for example how everything is on bash 5 or at worst 4, except a brand new Mac today still ships with bash3, and so you can actually run into backwards compatibility in bash.

(2) and bash does actually have plugins & extensions and they do vary from system to system so you do have things you either need to avoid using or run into exactly the same breakage as python or ruby or whatever.

For writing a program vs gluing other programs together, really awk should be the goat.

benjaminogles|3 months ago

I feel the same about using Awk, it is just fun to use. I like that variables have defined initial values so they don't need to be declared. And the most common bits of control flow needed to process an input file are implicit. Some fun things I've written with awk

Plain text accounting program in awk https://github.com/benjaminogles/ledger.bash

Literate programming/static site generator in awk https://github.com/benjaminogles/lit

Although the latter just uses awk as a weird shell and maintains a couple child processes for converting md to html and executing code blocks with output piped into the document

packetlost|3 months ago

AWK, rc, and mk are the 3 big tools in my shell toolkit. It's great

nmz|3 months ago

Why mk instead of any of the other builders?

sudahtigabulan|3 months ago

> That’s typical usage of Awk, where you use it in place of cut because you can’t be bothered to remember the right flags for cut.

Even you remember the flags, cut(1) will not be able to handle ls -l. And any command that uses spaces for aligning the text into fixed-width columns.

Unlike awk(1), cut(1) only works with delimiters that are a single character. Meaning, a run of spaces will be treated like several empty fields. And, depending on factors you don't control, every line will have different number of fields in it, and the data you need to extract will be in a different field.

You can either switch to awk(1), because its default field separator treats runs of spaces as one, or squeeze them with tr(1) first:

  ls -l | tr -s' ' | cut -d' ' -f3

lelanthran|3 months ago

Cut has flags to extract byte or character ranges.

You don't have to use fields.

abhgh|3 months ago

Love awk. In the early days of my career, I used to write ETL pipelines and awk helped me condense a lot of stuff into a small number of LOC. I particularly prided myself in writing terse one-liners (some probably undecipherable, ha!); but did occasionally write scripts. Now I mostly reach for Python.

tetris11|3 months ago

one of the best word-wrapping implementations I've seen (handles color codes and emojis just fine!) is written in pure mawk

very fast, highly underrated language

I'm not sure how good it would be for pipelines, if a step should fail, or if a step should need to resume, etc.

meken|3 months ago

This sounds interesting. Could you give an example where you rewrote a pipeline in awk?

ketanmaheshwari|3 months ago

This pipeline may be significantly reduced by replacing cut's with awk, accommodating grep within awk and using awk's gsub in place of tr.

dietrichepp|3 months ago

Somebody wanted to set breakpoints in their C code by marking them with a comment (note “d” for “debugger”):

//d

You can get a list of them with a single Awk line.

  awk -F'//d[[:space:]]*' 'NF > 1 {print FILENAME ":" FNR " " $2}' source/*.c

You can even create a GDB script, pretty easily.

(IMO, easier still to configure your editor to support breakpoints, but I’m not the one who chose to do it this way.)

nmz|3 months ago

awk is so much better than sed to learn given its ability, the only unix tool it doesn't replace is tr and tail, but other than that, you can use it instead of grep, cut, sed, head.

stevekemp|3 months ago

I think you could replace tail with awk, if you absolute needed to. This is a naive attempt:

   cat /etc/passwd | \
   awk -v n=10 '{ lines[NR] = $0 }
            END{
                for (i = NR - n + 1; i <= NR; i++)
                    if (i > 0) print lines[i]
            }'

unknown|3 months ago

[deleted]

RGBCube|3 months ago

Stop using awk, use a real programming language+shell instead, with structured data instead of bytestream wrangling:

  > ls -l | get user

  ┌────┬──────┐
  │  0 │ cube │
  │  1 │ cube │
  │  2 │ cube │
  │  3 │ cube │
  │  4 │ cube │
  │  5 │ cube │
  │  6 │ cube │
  │  7 │ cube │
  │  8 │ cube │
  │  9 │ cube │
  │ 10 │ cube │
  │ 11 │ cube │
  │ 12 │ cube │
  │ 13 │ cube │
  │ 14 │ cube │
  │ 15 │ cube │
  └────┴──────┘

You don't need to memorize bad tools' quirks. You can just use good tools.

https://nushell.sh - try Nushell now! It's like PowerShell, if it was good.

electricEmu|3 months ago

PowerShell is open source and available on Linux today for those who enjoy an OO terminal.

MIT licensed.

https://learn.microsoft.com/en-us/powershell/scripting/insta...

simoncion|3 months ago

> try Nushell now!

So, I'm curious. What's the Nushell reimplementation of the 'crash-dump.awk' script at the end of the "Awk in 20 Minutes" article on ferd.ca ? Do note that "I simply won't deal with weirdly-structured data." isn't an option.

anthk|3 months ago

Once you get TSV and CSV related tools, nushell and psh are like toys.

ryapric|3 months ago

While your recommendation is sound: this is not only a rudely-worded take, but also missing the point of the parent comment.

esafak|3 months ago

Also, the nushell code is self-explanatory. Who knows what $3 refers to?