rwiggins | 7 months ago

Oh, fantastic. jq has become an integral part of work for me.

I'll use this opportunity to plug the one-liner I use all the time, which summarizes the "structure" of a doc in a jq-able way: https://github.com/stedolan/jq/issues/243#issuecomment-48470... (I didn't write it, I'm just a happy user)

For example:

    $ curl -s 'https://ip-ranges.amazonaws.com/ip-ranges.json' | jq -r '[path(..)|map(if type=="number" then "[]" else tostring end)|join(".")|split(".[]")|join("[]")]|unique|map("."+.)|.[]'
    .
    .createDate
    .ipv6_prefixes
    .ipv6_prefixes[]
    .ipv6_prefixes[].ipv6_prefix
    .ipv6_prefixes[].network_border_group
    .ipv6_prefixes[].region
    .ipv6_prefixes[].service
    .prefixes
    .prefixes[]
    .prefixes[].ip_prefix
    .prefixes[].network_border_group
    .prefixes[].region
    .prefixes[].service
    .syncToken
(except I have it aliased to "jq-structure" locally of course. also, if there's a new fancy way to do this, I'm all ears; I've been using this alias for like... almost a decade now :/)

In the spirit of trying out jqfmt, let's see how it formats that one-liner...

    ~  echo '[path(..)|map(if type=="number" then "[]" else tostring end)|join(".")|split(".[]")|join("[]")]|unique|map("."+.)|.[]' | ~/go/bin/jqfmt -ob -ar -op pipe
    [
        path(..) | 
        map(if type == "number" then "[]" else tostring end) | 
        join(".") | 
        split(".[]") | 
        join("[]")
    ] | 
        unique | 
        map("." + .) | 
        .[]%
    ~  
Not bad! Shame that jqfmt doesn't output a newline at the end, though. The errant `%` is zsh's partial line marker. Also, `-ob -ar -op pipe` seems like a pretty good set of defaults to me - I would prefer that over it (seemingly?) not doing anything with no flags. (At least for this sample snippet.)

naniwaduni|7 months ago

For small problem sizes, you can get a nontrivial improvement by moving the unique up ahead of all the string manipulation:

    jq -r '[path(..)|map(if type=="number" then "[]" end)]|unique[]|join(".")/".[]"|"."+join("[]")'
For larger problem sizes, you might enjoy this approach to avoid generating the array of all paths as an intermediate, instead producing a deduped shadow structure as you go along:

    jq -rn --stream 'reduce (inputs|select(.[1])[0]|map(if type=="number" then "[]" end)) as $_ (.; setpath($_; 1))|path(..)|join(".")/".[]"|"."+join("[]")'
(Note that in either case, you still run into a bit of trouble with fields named "[]", as well as field names with "." in them. I assume this is not a serious issue, since you're only ever looking at this interactively.)
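
For anyone who finds the streaming filter hard to follow, here is a rough Python sketch of the same "dedupe as you go" idea. It is not a translation of the jq above: `structure_paths` is a name made up for illustration, and it walks an already-parsed document instead of streaming it. It shares the same blind spot for keys containing "." or "[]".

```python
import json


def structure_paths(doc):
    """Return the sorted, de-duplicated structural paths of a parsed JSON value.

    Array indices are collapsed to "[]" while each path is being built, so the
    set holds one entry per distinct shape rather than one per element.
    """
    seen = set()

    def visit(node, prefix):
        # The root has an empty prefix and is reported as ".".
        seen.add(prefix or ".")
        if isinstance(node, dict):
            for key, value in node.items():
                visit(value, f"{prefix}.{key}")
        elif isinstance(node, list):
            for item in node:
                # A top-level array element becomes ".[]", not "[]".
                visit(item, f"{prefix or '.'}[]")

    visit(doc, "")
    return sorted(seen)


# Small hypothetical document shaped like the ip-ranges example.
sample = json.loads(
    '{"syncToken": "1", "prefixes": [{"region": "a"}, {"region": "b", "service": "X"}]}'
)
paths = structure_paths(sample)
```

Here `paths` lists each shape once (`.prefixes[].region` appears a single time even though two elements have that key), which is the point of collapsing indices before deduplicating.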

petercooper|7 months ago

Not anywhere near as sophisticated as yours, but I have something vaguely similar for simplifying JSON documents (while still showing what the data looks like) before feeding them to LLMs to help them code against the data:

    jq 'walk(if type == "array" then (if length > 0 then [.[0]] else . end) else . end)'
So that 70,000+ line Amazon example of yours would boil down to:

    {
      "syncToken": "1753114994",
      "createDate": "2025-07-21-16-23-14",
      "prefixes": [
        {
          "ip_prefix": "3.4.12.4/32",
          "region": "eu-west-1",
          "service": "AMAZON",
          "network_border_group": "eu-west-1"
        }
      ],
      "ipv6_prefixes": [
        {
          "ipv6_prefix": "2600:1f69:7400::/40",
          "region": "mx-central-1",
          "service": "AMAZON",
          "network_border_group": "mx-central-1"
        }
      ]
    }
... which is easier/cheaper to feed to an LLM when getting it to write code to process the data, etc., than the multi-megabyte original.

rwiggins|7 months ago

Oh wow, that's fantastic. I love that it includes real values while still summarizing the doc's structure. I'm going to steal that. I'll probably keep jq-structure around because it's so easy to copy/paste paths I'm looking for, but yours is definitely better for understanding what the JSON doc actually contains.

naniwaduni|7 months ago

Got a bit nerd-sniped here, but first of all, since jq 1.7 we can reduce `if A then B else . end` to `if A then B end`:

    jq 'walk(if type == "array" then (if length > 0 then [.[0]] end) end)'
Now we could contract those conditionals:

    jq 'walk(if type == "array" and length > 0 then [.[0]] end)'
but it turns out we can even more usefully express `if length > 0 then [.[0]] end` as `[limit(1; .[])]`, which is just `.[:1]`:

    jq 'walk(if type == "array" then .[:1] end)'
From here, we can golf it a little further (this is kind of a generic type-matching pattern):

    jq 'walk(arrays[:1] // .)'
although this does incur a bit more overhead than checking type directly.

Speaking of overhead, though, it turns out that the implementation of walk/1 (https://github.com/jqlang/jq/blob/master/src/builtin.jq#L212) will actually run the filter on every element of an array, even though we're about to throw most of them out, which we can eliminate by writing the recursion explicitly:

    jq 'def w: if type=="array" then [limit(1; .[]|w)] elif type=="object" then .[] |= w end; w'
which gets the operation down from ~200 ms on my machine (not long enough to really get distracted, but enough to feel the wait) to a perceptually instant ~40 ms (which is mostly just the cost of reading the input). Now we can golf it down a little more:

    jq 'def w: if type=="array" then [limit(1; .[]|w)] else objects[] |= w end; w'
    jq 'def w: (arrays[:1]|map(w)) // (objects[] |= w); w'
(the precedence actually allows us to eliminate the parens here...)

    jq 'def w: arrays |= .[:1]|iterables[] |= w; w'
And, inaccessibility of the syntax aside, I think this does an incredible job of expressing the essence of what we're trying to do: we trim any array down to its first element, and then recursively apply the same transformation throughout the structure. jq is a very expressive language, it just looks like line noise...
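
If the final filter is too dense to read, the same two steps can be sketched in Python. This is only an illustration of the idea (trim each array to its first element, then recurse into whatever is left); `trim_arrays` is a name invented here, not anything from jq.

```python
def trim_arrays(node):
    """Keep only the first element of every array, recursing into what is kept."""
    if isinstance(node, list):
        # Slice first, then recurse, so discarded elements are never visited.
        return [trim_arrays(item) for item in node[:1]]
    if isinstance(node, dict):
        return {key: trim_arrays(value) for key, value in node.items()}
    # Scalars (strings, numbers, booleans, null) pass through unchanged.
    return node
```

Slicing before recursing mirrors the point made above about `walk`: nothing is computed for elements that are about to be thrown away.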

jzelinskie|7 months ago

This is an incredibly useful one-liner. Thank you for sharing!

I'm a big fan of jq, having written my own jq wrapper that supports multiple formats (github.com/jzelinskie/faq), but these days I find myself more quickly reaching for Python when I get any amount of complexity. Being able to use uv scripts in Python has considerably lowered the bar for me to use it for scripting.

Where are you drawing the line?

rwiggins|7 months ago

Hmm. I stick to jq for basically any JSON -> JSON transformation or summarization (field extraction, renaming, etc.). Perhaps I should switch to scripts more. uv is... such a game changer for Python, I don't think I've internalized it yet!

But as an example of about where I'd stop using jq/shell scripting and switch to an actual program... we have a service that has task queues. The number of queues for an endpoint is variable, but enumerable via `GET /queues` (I'm simplifying here of course), which returns e.g. `[0, 1, 2]`. There was a bug where certain tasks would get stuck in a non-terminal state, blocking one of those queues. So, I wanted a simple little snippet to find, for each queue, (1) which task is currently executing and (2) how many tasks are enqueued. It ended up vaguely looking like:

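    # Summarize each queue, then slurp the per-queue objects into one array.
    # --argjson keeps the queue id a number, matching the sample output.
    for q in $(curl -s "$endpoint/queues" | jq -r '.[]'); do
        curl -s "$endpoint/queues/$q" \
        | jq --argjson q "$q" '
            {
                "queue": $q,
                "executing": .currently_executing_tasks,
                "num_enqueued": (.enqueued_tasks | length)
            }'
    done | jq -s '.'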

which ends up producing output like (assuming queue 0 was blocked)

    [
        {
            "queue": 0,
            "executing": [],
            "num_enqueued": 100
        },
        ...
    ]
I think this is roughly where I'd start to consider "hmm, maybe a proper script would do this better". I bet the equivalent Python is much easier to read and probably not much longer.
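
For comparison, a minimal sketch of what that Python might look like, assuming the same simplified API as the shell snippet (`GET /queues` returning a list of ids, and `GET /queues/<id>` returning an object with `currently_executing_tasks` and `enqueued_tasks`). The names `fetch_json` and `summarize_queues` are made up for illustration, and there is no error handling.

```python
import json
import urllib.request


def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def summarize_queues(endpoint, fetch=fetch_json):
    """Return one summary dict per queue id listed by GET {endpoint}/queues.

    `fetch` is injectable so the logic can be exercised without a live service.
    """
    summaries = []
    for queue_id in fetch(f"{endpoint}/queues"):
        queue = fetch(f"{endpoint}/queues/{queue_id}")
        summaries.append({
            "queue": queue_id,
            "executing": queue["currently_executing_tasks"],
            "num_enqueued": len(queue["enqueued_tasks"]),
        })
    return summaries
```

Calling `print(json.dumps(summarize_queues(endpoint), indent=4))` with a real endpoint would print the same kind of array as the shell version. It is about the same length, and whether it is easier to read is a matter of taste.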

Although, I think this example demonstrates how I typically use jq, which is like a little multitool. I don't usually write really complicated jq.

Elucalidavah|7 months ago

> wrapper that supports multiple formats

Is there a way to preserve key ordering, particularly for yaml output? And to customize the color output? Or, how feasible is it to add that?

dotancohen|7 months ago

I could Google it, but tell me a bit more about uv scripts. Isn't uv a package manager like pip?

Bluestein|7 months ago

May I also add this ain't a mere one liner. It's a masterclass!

jdc0589|7 months ago

this is a super useful oneliner, immediately saved to my bash profile as `jqstructure`