top | item 38569240

JC converts the output of popular command-line tools to JSON

324 points | tosh | 2 years ago | github.com | reply

159 comments

[+] Mister_Snuggles|2 years ago|reply
In FreeBSD, this problem was solved with libxo[0]:

    $ ps --libxo=json | jq
    {
      "process-information": {
        "process": [
          {
            "pid": "41389",
            "terminal-name": "0 ",
            "state": "Is",
            "cpu-time": "0:00.01",
            "command": "-bash (bash)"
          },
    [...]

It's not perfect though. ls had support, but it was removed for reasons[1]. It's not supported by all of the utilities, etc.

This seems to be a great stop-gap with parsers for a LOT of different commands, but it relies on parsing text output that's not necessarily designed to be parsed. It would be nice if utilities coalesced around a common flag to emit structured output.

In PowerShell, structured output is the default and it seems to work very well. This is probably too far for Unix/Linux, but a standard "--json" flag would go a long way to getting the same benefits.

[0] https://wiki.freebsd.org/LibXo

[1] https://reviews.freebsd.org/D13959
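Once the output is JSON like that, consuming it from a real language is trivial. A minimal Python sketch against the structure in the `ps --libxo=json` example above (field names taken from that output; in a real script the JSON would come from a `subprocess` call rather than a literal):

```python
import json

# Shape copied from the `ps --libxo=json` example above; a real script
# would read this from subprocess.run(["ps", "--libxo=json"], ...).stdout
raw = '''
{
  "process-information": {
    "process": [
      {"pid": "41389", "terminal-name": "0 ", "state": "Is",
       "cpu-time": "0:00.01", "command": "-bash (bash)"}
    ]
  }
}
'''

procs = json.loads(raw)["process-information"]["process"]
# Note libxo emits every field as a string, so numbers still need casting
pids = [int(p["pid"]) for p in procs]
print(pids)
```

No regexes, no awk, and whitespace in the `command` field can't break anything.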

[+] ekidd|2 years ago|reply
> In PowerShell, structured output is the default and it seems to work very well.

PowerShell goes a step beyond JSON, by supporting actual mutable objects. So instead of just passing through structured data, you effectively pass around opaque objects that allow you to go back to earlier pipeline stages, and invoke methods, if I understand correctly: https://learn.microsoft.com/en-us/powershell/module/microsof....

I'm rather fond of wrappers like jc and libxo, and experimental shells like https://www.nushell.sh/. These still focus on passing data, not objects with executable methods. On some level, I find this comfortable: Structured data still feels pretty Unix-like, if that makes sense? If I want actual objects, then it's probably time to fire up Python or Ruby.

Knowing when to switch from a shell script to a full-fledged programming language is important, even if your shell is basically awesome and has good programming features.

[+] evnp|2 years ago|reply
> In PowerShell, structured output is the default and it seems to work very well. This is probably too far for Unix/Linux, but a standard "--json" flag would go a long way to getting the same benefits.

OP has a blog post[0] which describes exactly this. `jc` is described as a tool to fill this role "in the meantime" -- my reading is that it's intended to serve as a stepping stone towards widespread `-j`/`--json` support across unix tools.

[0] https://blog.kellybrazil.com/2019/11/26/bringing-the-unix-ph...

[+] supriyo-biswas|2 years ago|reply
Similarly, in SerenityOS, the files under /proc return JSON data rather than unstructured text.

A better, more structural fix would be to allow data structures to be exported from ELF binaries and have them serialized into terminal output, which could then be rendered in the user's preferred format, such as JSON or YAML, or processed accordingly.

[+] nijave|2 years ago|reply
Libxo is neat, in theory, but it seems like applications are left to implement their own logic for a given output format rather than being able to pass a structure to libxo and let it do the formatting.

I can't remember the exact utility--I think it was iostat--but it would use string interpolation to format output lines as JSON, and combined with certain flags it produced completely mangled output. Not sure if things have improved, but I would have expected something like JSON Lines when an interval is provided.

Powershell and kubectl are miles ahead of libxo in usability imo

[+] nerdponx|2 years ago|reply
What I find weird about Powershell is that there's no parser for fixed-width columns, which is a widely used output format for Unix-style CLI tools.

I don't know if NuShell has it, I haven't tried.

In any case, it's much better for tools to output more-parseable data in the first place. Whitespace-delimited columns are fine of course, but not so much when the data can contain whitespace, as in the output from `ps`.

I don't see much reason why JSONLines (https://jsonlines.org/) / NDJSON (https://ndjson.org/) can't be a standard output format from most tools, in addition to tables.

As for the reason of removal:

  any language that can interact with json output can use readdir(3) and stat(2).
Ugh. Any language of course can do it. But that's basically telling users that they need to reimplement ls(1) themselves if they want to use any of its output and features in scripts.

I understand if the maintenance burden is too high to put it in ls(1) itself, but it's a shame that no tool currently does this. The closest we have is a feature request in Eza: https://github.com/eza-community/eza/issues/472
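To be fair to the committer, the readdir/stat route really is short -- it just means every script re-derives its own ad-hoc ls. A sketch of the kind of JSON-emitting ls replacement being asked for, in Python (the field names here are my own invention, not any standard):

```python
import json
import os
import stat
import sys

def ls_json(path="."):
    """Yield one dict per directory entry, JSON Lines-style."""
    for entry in sorted(os.scandir(path), key=lambda e: e.name):
        st = entry.stat(follow_symlinks=False)
        yield {
            "name": entry.name,           # field names are hypothetical
            "size": st.st_size,
            "mode": stat.filemode(st.st_mode),
            "mtime": st.st_mtime,
        }

if __name__ == "__main__":
    for obj in ls_json(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(json.dumps(obj))
```

Ten lines of logic, but multiplied by every script that needs it -- which is exactly the duplication a `--json` flag in ls(1) itself would avoid.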

[+] rezonant|2 years ago|reply
> This seems to be a great stop-gap with parsers for a LOT of different commands, but it relies on parsing text output that's not necessarily designed to be parsed

True, and yet it's extremely common to parse output in bash scripts and other automations, so in a sense it's just centralizing that effort. That being said at least when you do it yourself you can fix problems directly.

[+] imtringued|2 years ago|reply
Now they only need to do the same thing for input and let the operating system or the shell handle the argument parsing so that it is consistent across the entire operating system.
[+] msla|2 years ago|reply
> It's not perfect though. ls had support, but it was removed for reasons

In specific:

https://svnweb.freebsd.org/base?view=revision&revision=32810...

> libxo imposes a large burden on system utilities. In the case of ls, that burden is difficult to justify -- any language that can interact with json output can use readdir(3) and stat(2).

Which rather misses the point of being able to use JSON in shell scripts.

[+] kbknapp|2 years ago|reply
Really cool idea, but this gives me anxiety just thinking about how it has to be maintained. Taking into account versions, command flags changing output, etc. all seems like a nightmare to maintain, to the point where I'm assuming actual usage will work great for a few cases but quickly lose its novelty beyond basic ones. Not to mention that using `--<CMD>` for the tool seems like a poor choice, as the help/manpage will end up thousands of lines long because each new parser requires a new flag.
[+] cproctor|2 years ago|reply
Would it be fair to think about this as a shim whose scope of responsibility will (hopefully) shrink over time, as command line utilities increasingly support JSON output? Once a utility commits to handling JSON export on its own, this tool can delegate to that functionality going forward.
[+] zimmund|2 years ago|reply
> Not to mention using `--<CMD>`

If you read further down in the documentation, you can just prefix your command with `jc` (e.g. `jc ls`). The `--cmd` param is actually a good idea, since it allows you to mangle the data before converting it (e.g. you want to grep a list before converting it).

Regarding maintenance, most of the basic unix commands' output shouldn't change too much (they'd be breaking not only this tool but a lot of scripts). I wouldn't expect it to break as often as you imagine, at least not because of other binaries being updated.

[+] eichin|2 years ago|reply
I'm sort of torn - yeah, one well-maintained "basket" beats having a bunch of ad-hoc output parsers all over the place, but I want direct json output because I'm doing something complicated and don't want parsing to add to the problem. (I suppose the right way to get comfortable with using this is to just make sure to submit PRs with additional test cases for everything I want to use it with, since I'd have to write those tests anyway...)
[+] majkinetor|2 years ago|reply
This requires collaboration. People submitting parsing info for the tool they need, and people that use it to easily keep it up to date. That is the only way.
[+] verdverm|2 years ago|reply
This is one of the better use cases for LLMs, which have shown good capability at turning unstructured text into structured objects
[+] abound|2 years ago|reply
Nushell [1] ends up at mostly the same place (structured data from shell commands) with a different approach, mostly just being a shell itself.

[1] http://www.nushell.sh/

[+] danyx23|2 years ago|reply
Nushell actually pairs really well with JC, given that nushell has a "from json" operation. I recorded a video some time ago that shows a few nice features of Nushell and I bring up combining it with jc at around minute 19: https://www.youtube.com/watch?v=KF5dtxVsn1E
[+] saghm|2 years ago|reply
I had glanced at nushell every now and then since it was initially announced, but it wasn't until a month or two ago that I finally really "got" the point of it. I was trying to write a script to look through all of the files in a directory matching a certain pattern and prune the ones with modified timestamps within 10 minutes of each other. I remembered that nushell was supposed to be good for things like this, and after playing around with it for a minute, it finally "clicked" and now I'm hooked. Even when dealing with unstructured data, there's a lot of power in being able to convert it into something as simple as a list of records (sort of like structs) and process it from there.
[+] calvinmorrison|2 years ago|reply
In a certain sense, files everywhere is great, that's the promise of unix, or plan9 to a further extent.

However, unstructured files, or files that all have their own formats, is also equally hampering. Trying to even parse an nginx log file can be annoying with just awk or some such.

One of the big disadvantages is that large system rewrites and design changes cannot be executed in the linux userland.

All to say, I'd love a smarter shell, I love files, I have my awk book sitting next to me, but I think it's high time to get some serious improvements on parsing data.

In the same way programs are smart enough to know to render colored output or not, I'd love it if it could dump structured output (or not)

[+] mistercow|2 years ago|reply
Part of the problem is that the output of commands is both a UI and an API, and because any text UI can be used as an API, the human readable text gets priority. Shell scripting is therefore kind of like building third party browser extensions. You look and you guess, and then you hack some parser up based on your guess, and hope for the best.

I actually wish there was just a third standard output for machine readable content, which your terminal doesn’t print by default. When you pipe, this output is what gets piped (unless you redirect), it’s expected to be jsonl, and the man page is expected to specify a contract. Then stdout can be for humans, and while you can parse it, you know what you’re doing is fragile.

Of course, that’s totally backwards incompatible, and as long as we’re unrealistically reinventing CLIs from the foundations to modernize them, I have a long list of changes I’d make.

[+] dale_glass|2 years ago|reply
Yup. It really grinds my gears that people came up with fairly decent ideas half a century ago, and a large amount of people decided to take that as gospel rather than as something to improve on.

And it's like pulling teeth to get any improvement, because the moment somebody like Lennart tries to get rid of decades of old cruft, drama erupts.

And even JSON is still not quite there. JSON is an okay-ish idea, but to do this properly what we need is a format that can expose things like datatypes. More like PowerShell. So that we can do amazing feats like treating a number like a number, and calculating differences between dates by doing $a - $b.

[+] hoherd|2 years ago|reply
> In the same way programs are smart enough to know to render colored output or not, I'd love it if it could dump structured output (or not)

The even lower hanging fruit is to implement json output as a command line argument in all cli tools. I would love to see this done for the gnu core utils.

[+] da_chicken|2 years ago|reply
There is always Powershell. The trouble there is that it's so rooted in .Net and objects that it's very difficult to integrate with existing native commands on any platform.
[+] mikepurvis|2 years ago|reply
When it comes to parsing server logs, it's too bad the functionality can't be extracted out of something like logstash, since that's already basically doing the same thing.

Though I guess the real endgame here is for upstream tools to eventually recognize the value and learn how to directly supply structured output.

[+] imtringued|2 years ago|reply
The problem is that nobody has built an actual ffi solution except maybe the GObject guys. C isn't an ffi, because you need a C compiler to make it work. By that I mean it is not an interface, but rather just C code whose calling part has been embedded into your application.
[+] numbsafari|2 years ago|reply
> Trying to even parse an nginx log file can be annoying with just awk or some such.

You probably already know this, but for those who do not, you can configure nginx to generate JSON log output.

Quite handy if you are aggregating structured logs across your stack.
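For reference, a minimal version of that nginx config (the variable selection is arbitrary; the `escape=json` parameter requires nginx 1.11.8+):

```nginx
log_format json_combined escape=json
  '{'
    '"time":"$time_iso8601",'
    '"remote_addr":"$remote_addr",'
    '"request":"$request",'
    '"status":$status,'
    '"body_bytes_sent":$body_bytes_sent'
  '}';

access_log /var/log/nginx/access.json json_combined;
```

Each request then lands as one JSON object per line, ready for jq or a log shipper.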

[+] PreInternet01|2 years ago|reply
Oh, this is cool. I'm a huge proponent of CLI tools supporting sensible JSON output, and things like https://github.com/WireGuard/wireguard-tools/blob/master/con... and PowerShell's |ConvertTo-Json are a huge part of my management/monitoring automation efforts.

But, unfortunately, sensible is doing some heavy lifting here and reality is... well, reality. While the output of things like the LSI/Broadcom StorCLI 'suffix the command with J' approach and some of PowerShell's COM-hiding wrappers (which are depressingly common) is technically JSON, the end result is so mindbogglingly complex-slash-useless, that you're quickly forced to revert to 'OK, just run some regexes on the plain-text output' kludges anyway.

Having said that, I'll definitely check this out. If the first example given, parsing dig output, is indeed representative of what this can reliably do, it should be interesting...

[+] sesm|2 years ago|reply
IMO `jc dig example.com` should be the primary syntax, because `dig example.com | jc --dig` has to retroactively guess the flags and parameters of the previous command to parse the output.
[+] nickster|2 years ago|reply
All output being an object is one of my favorite things about powershell. I miss it when I have to write a bash script.
[+] pushedx|2 years ago|reply
I salute whoever chooses to maintain this
[+] timetraveller26|2 years ago|reply
Does anybody know of a list of modern unix command-line tools accepting a --json option?

It may even be useful to add that information to this repo.

[+] chungy|2 years ago|reply
Basically everything on FreeBSD supports it via libxo.
[+] bravetraveler|2 years ago|reply
I don't have a list, but the modern replacement for "ifconfig" does JSON: "ip"

As does "lldpctl"

Ansible provides details about systems in JSON called 'facts'. The intention is to use these to inform automation

[+] pkkm|2 years ago|reply
lsblk accepts a --json flag and can give you a lot of information (try lsblk --json --output-all). Very useful if your script needs to check what disks and partitions there are in the system.
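The shape (from memory -- verify against your util-linux version) is a `blockdevices` array with nested `children`, so a recursive walk is usually needed. A Python sketch against a trimmed sample:

```python
import json

# Trimmed sample of `lsblk --json` output; structure is from memory, and
# newer util-linux releases vary in how sizes are typed, so check yours
raw = '''
{"blockdevices": [
  {"name": "sda", "type": "disk", "children": [
    {"name": "sda1", "type": "part"},
    {"name": "sda2", "type": "part"}
  ]},
  {"name": "sr0", "type": "rom"}
]}
'''

def walk(devs):
    """Flatten the nested device tree into (name, type) pairs."""
    for d in devs:
        yield d["name"], d["type"]
        yield from walk(d.get("children", []))

parts = [n for n, t in walk(json.loads(raw)["blockdevices"]) if t == "part"]
print(parts)
```

Compare that to scraping the default tree-drawing output of lsblk, which changes with terminal width and locale.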
[+] geraldcombs|2 years ago|reply
TShark (the CLI companion to Wireshark) does with the `-T json` flag.
[+] user3939382|2 years ago|reply
Probably not what you had in mind but, AWS CLI.
[+] pirates|2 years ago|reply
kubectl with “-o json”
[+] Cyph0n|2 years ago|reply
Interesting project! But I expected them to be using textfsm (or something similar) as a first step parser. textfsm is heavily used to parse CLI outputs in networking devices.

https://github.com/google/textfsm

[+] Animats|2 years ago|reply
I always felt that was a design flaw of UNIX. Programs accept command line parameters and environment variables as input, but all they output for their calling program is an integer exit code. It's not like exit(II) has an argv and an argc. GUI programs that call command line programs thus tend to be nearly blind to what the called program did. You can't treat command line programs as subroutines.

I know why it worked that way in Research Unix for the PDP-11. It's a property of the hokey trick used to make fork(II) work on tiny machines. It didn't have to stay that way for four decades.

[+] nailer|2 years ago|reply
Nice.

Too many "let's fix the command line" projects (nushell, pwsh) have noble goals, but also start with "first, let's boil the ocean".

We need to easily ingest old shitty text output for a little while to move to the new world of structured IO.

[+] freedomben|2 years ago|reply
Really glad to see this is already packaged for most Linux distributions. So many utilities nowadays seem to be written in Python, and Python apps are such a PITA to install without package manager packages. There are so many different ways to do it and each one seems to be a little different. Some require root and try to install on top of package-manager-owned locations, which is a nightmare.

Fedora Toolbox has been wonderful for this exact use case (installing Python tools), but for utilities like this that will be part of a bash pipe chain for me, toolbox won't cut it.

[+] Spivak|2 years ago|reply
Installing self-contained programs written in Python not packaged for your distro:

    PIPX_HOME=/usr/local/pipx PIPX_BIN_DIR=/usr/local/bin pipx install app==1.2.3
It sets up an isolated install for each app with only its deps and makes it transparent.

The distro's installation tree of Python is for the exclusive use of your distro, because core apps (cloud-init, dnf, firewalld) are built against those versions.

[+] mejutoco|2 years ago|reply
I wonder if a tool could parse any terminal output into json in a really dumb and deterministic way:

    {
      "lines": [
         "line1 bla bla",
         "line1 bla bla",
       ],
      "words": [
         "word1",
         "word2",
       ],
    }
With enough representations (maybe multiple of the same thing) to make it easy to process, without knowing sed or similar. It seems hacky but it would not require any maintenance for each command, and would only change if the actual output changes.
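The dumb version really is only a few lines. A sketch of that idea (the set of views offered -- lines, words, per-line columns -- is an arbitrary choice):

```python
import json

def dumb_parse(text):
    """Deterministic, command-agnostic 'parse': offer several dumb views
    of the same text and let the consumer pick whichever is easiest."""
    lines = text.splitlines()
    return {
        "lines": lines,
        "words": text.split(),
        "columns": [line.split() for line in lines],  # per-line word lists
    }

out = dumb_parse("total 2\n-rw-r--r-- 1 alice staff 5 ls.txt\n")
print(json.dumps(out, indent=2))
```

It fails on exactly the cases jc's per-command parsers exist for (values containing whitespace, multi-line records), but as a zero-maintenance fallback it covers a surprising amount.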
[+] Pxtl|2 years ago|reply
Honestly this is half the reason I use Powershell for everything. Bash-like experience but everything returns objects.

It's a messy, hairy, awful language. Consistently inconsistent, dynamically-typed in the worst ways, "two googles per line" complexity, etc.

But for the convenience of being able to combine shell-like access to various tools and platforms combined with the "everything is a stream of objects" model, it can't be beat in my experience.

And you can still do all the bash-like things for tools that don't have good Powershell wrappers that will convert their text-streams into objects. Which, sadly, is just about everything.

[+] da39a3ee|2 years ago|reply
Awesome, does it work for man pages? They're a huge elephant in the room -- people get really upset if you point out that man pages are an unsearchable abomination, locking away vast amounts of important information about unix systems in an unparseable mess. But, it's true.
[+] rplnt|2 years ago|reply
Wish it would have automatic parser selection by default. Even if just for a (possible) selected subset. Typing `foo | jc | jq ...` would be more convenient than `foo | jc --foo | jq ...`.
[+] zubairq|2 years ago|reply
Simple idea, really great to see this!