> Another option is to use Python, which is ubiquitous enough that it can be expected to be installed on virtually every machine
latexr|8 months ago
Not on macOS. You can do it easily by just invoking /usr/bin/python3, but you’ll get a (short and non-threatening) dialog to install the Xcode Command-Line Developer Tools. For macOS environments, JavaScript for Automation (i.e. JXA, i.e. /usr/bin/osascript -l JavaScript) is a better choice.
However, since Sequoia, jq is now installed by default on macOS.
ameliaquining|8 months ago
I feel like the takeaway here is that jq should probably be considered an indispensable part of the modern shell environment. Because without it, you can't sensibly deal with JSON in a shell script, and JSON is everywhere now. It's cool that you can write a parser in awk, but the point of this kind of scripting language is to not have to do things like that.
HappMacDonald|8 months ago
One day I wanted to use a TAP parser for the Test Anything Protocol.
But I didn't want to be bogged down by dependencies.. so I didn't want to go anywhere near Python (and pyenv.. and anaconda.. and then probably having to dockerize that for some reason too..) nor Node.js nor any of that.
Found a bash shell script to parse TAP written by ESR of all people. That sounds fine, I thought. Most everywhere has bash, and there are no other dependencies.
But it was slow. I mean.. painfully, ridiculously slow. Parsing like 400 lines of TAP took almost a minute.
That's when I did some digging and learned what awk really is: a scripting language that's baked into the POSIX suite. I had heard vaguely about it beforehand, but never realized it had more power than its sed/ed/cut/etc. brethren in the suite.
So I coded up a TAP parser in awk and it went swimmingly, matched the ESR parser's feature set, and ran millions of lines in less than a second. Score! :D
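The commenter's parser isn't shown, but the shape of the approach is easy to sketch. This is a minimal, hypothetical TAP parser in awk (nowhere near the ESR parser's feature set): match the plan line and the ok/not-ok lines, count results, and report at the end.

```shell
# Hypothetical minimal TAP parser in awk -- a sketch, not the
# commenter's actual implementation.
parse_tap() {
  awk '
    /^1\.\.[0-9]+$/ { planned = substr($0, 4) + 0; next }  # plan line, e.g. "1..3"
    /^not ok /      { fail++; next }                       # failed test point
    /^ok /          { pass++; next }                       # passed test point
    # diagnostics ("#" lines) and anything else are ignored
    END { printf "pass=%d fail=%d planned=%d\n", pass, fail, planned }
  '
}

printf '1..3\nok 1 - alpha\nnot ok 2 - beta\nok 3 - gamma\n' | parse_tap
# prints: pass=2 fail=1 planned=3
```

Because awk compiles these patterns once and streams line by line, this kind of parser easily chews through millions of lines, which is consistent with the speedup described above.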
theamk|8 months ago
For the record, "python-without-extra-dependencies" is a thing and a very nice one too. I always prefer it over awk.
Highly recommend it to everyone - plenty of "batteries" included, like a JSON parser, a basic HTTP client, and even an XML parser, and no venv/conda required. Very good forward compatibility. Fast (compared to bash).
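To illustrate the "batteries included" point: the stock python3 interpreter can parse JSON from a pipe with only standard-library modules, no pip, venv, or conda involved. A small sketch:

```shell
# Stdlib-only JSON handling from the stock python3 -- nothing to install.
printf '{"name": "awk", "year": 1977}\n' | python3 -c '
import json, sys

doc = json.load(sys.stdin)   # stdlib JSON parser
print(doc["name"], doc["year"])
'
# prints: awk 1977
```

The basic HTTP client (urllib.request) and XML parser (xml.etree.ElementTree) mentioned above are likewise in the standard library.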
twoodfin|8 months ago
Contemplating this, it’s too bad the Unix scripting ecosystem never evolved a tripartite symbiosis of ‘file’, ‘lex’, and ‘yacc’, or similar tools.
That is, one tool to magically identify a file type, one to tokenize it based on that identification, one to correspondingly parse it. All in a streaming/pipe-friendly mode.
Would fit right in, other than the Unix prejudice (nonsensical from Day 0) for LF-separated text records as the “one true format”.
kristopolous|8 months ago
I guess ... I don't think it really gets you much unless it's 1-pass streaming; otherwise we're dealing with entire input buffers, and then we're just back to files.
You could argue using the vertical separator | is more syntactically graceful, but then it's just a shell argument. There are quite a few radically different shells out there these days, like xonsh, murex, and nushell, so if simply arranging logic on the screen in a different syntax is what you're looking for, then that's probably the way.
chaps|8 months ago
Awk is great and this is a great post. But dang, awk really shoots itself so much with its lack of features that it so desperately needs!
Like: printing all but one column somewhere in the middle. It turns into long, long commands that really pull away from the spirit of fast-fabrication Unix experimentation.
jq and sql both have the same problem :)
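The column complaint is concrete: the naive `$3=""` trick leaves a doubled separator behind, so one common (and verbose) workaround is to rebuild the line in a loop, which is exactly the kind of long command being complained about. A sketch:

```shell
# Print every column except the 3rd by rebuilding the record --
# verbose, but it avoids the doubled separator left by `$3=""`.
echo 'a b c d e' | awk '{
  out = ""
  for (i = 1; i <= NF; i++)
    if (i != 3) out = out (out == "" ? "" : OFS) $i
  print out
}'
# prints: a b d e
```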
SoftTalker|8 months ago
Whence perl.
jcynix|8 months ago
>awk really shoots itself so much with its lack of features that it so desperately needs!
That's why I use Perl instead (besides some short one liners in awk, which in some cases are even shorter than the Perl version) and do my JSON parsing in Perl.
This
  diff -rs a/ b/ | awk '/identical/ {print $4}' | xargs rm
is one of my often-used awk one-liners. Unless some filenames contain e.g. whitespace, then it's Perl again.
mauvehaus|8 months ago
JSON is not a friendly format to the Unix shell — it’s hierarchical, and cannot be reasonably split on any character
chubot|8 months ago
Yes, shell is definitely too weak to parse JSON!
(One reason I started https://oils.pub is because I saw that bash completion scripts try to parse bash in bash, which is an even worse idea than trying to parse JSON in bash)
I'd argue that Awk is ALSO too weak to parse JSON
> The following code assumes that it will be fed valid JSON. It has some basic validation as a function of the parsing and will most likely throw an error if it encounters something strange, but there are no guarantees beyond that.
Yeah I don't like that! If you don't reject invalid input, you're not really parsing
---
OSH and YSH both have JSON built-in, and they have the hierarchical/recursive data structures you need for the common Python/JS-like API:
osh-0.33$ var d = { date: $(date --iso-8601) }
osh-0.33$ json write (d) | tee tmp.txt
{
"date": "2025-06-28"
}
Parse, then pretty print the data structure you got:
Create a JSON syntax error on purpose: (now I see the error message could be better)
Another example from wezm yesterday: https://mastodon.decentralised.social/@wezm/1147586026608361...
YSH has JSON natively, but for anyone interested, it would be fun to test out the language by writing a JSON parser in YSH
It's fundamentally more powerful than shell and awk because it has garbage-collected data structures - https://www.oilshell.org/blog/2024/09/gc.html
Also, OSH is now FASTER than bash, in both computation and I/O.
This is despite garbage collection, and despite being written in typed Python! I hope to publish a post about these recent improvements
alganet|8 months ago
Parsing is trivial, rejecting invalid input is trivial; the problem is representing the parsed content in a meaningful way.
> bash completion scripts try to parse bash in bash
You're talking about ble.sh, right? I investigated it as well.
I think they made some choices that eventually led to the parser being too complex, largely due to the problem of representing what was parsed.
> Also, OSH is now FASTER than bash, in both computation and I/O.
According to my tests, this is true. Congratulations!
packetlost|8 months ago
I don't really buy that shell / awk is "too weak" to deal with JSON; the ecosystem of tools is just fairly immature, as most of the common shell tools predate JSON by at least a decade. `jq` is a pretty reasonable addition to the standard set of tools included in environments by default.
IMO the real problem is that JSON doesn't work very well as an interchange format because its core abstraction is objects. It's a pain to deal with in pretty much every statically typed, non-object-oriented language unless you parse it into native, predefined data structures (think annotated Go structs, Rust, etc.).
The same author had already made the more thorough jawk. They explicitly said they wanted a cut-down version. It's not illegal to want a cut-down version of something.
1vuio0pswjnm7|8 months ago
I use flex. Faster than awk or python. It produces relatively fast C programs, and flex seems to be available "everywhere" because it is a build requirement for so many software programs. For example, the NetBSD toolchain includes it. It is a build requirement for the Linux kernel.^1 I have even used it with Interix SFU on Windows before WSL existed.
I do not use jq. Too complicated for me. Overkill. I created a statically-linked program less than half the size of the official statically-linked jq that is adequate for my own needs. flex is a build requirement for jq.
1. https://www.kernel.org/doc/Documentation/admin-guide/quickly...
1vuio0pswjnm7|8 months ago
Using a C program generated with flex is faster than AWK or Python. Flex is required to compile the awk interpreter, as well as the jq interpreter. I use a custom scanner generated by flex to process JSON.