
Glom – Restructured Data for Python

313 points | mhashemi | 7 years ago | sedimental.org

97 comments

nerdponx | 7 years ago
It's a nice idea, but I never like writing what amounts to a DSL in strings in my code (yes, that applies to in-code SQL as well, although that's often unavoidable).

I prefer the `get_in()` function from Toolz: http://toolz.readthedocs.io/en/latest/api.html#toolz.dicttoo...
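For comparison, here is a minimal pure-Python sketch of what a `get_in`-style lookup does: walk a sequence of keys and return a default if any step fails. It is a stand-in written to mirror the documented toolz behavior, not the toolz source itself.

```python
from functools import reduce
import operator


def get_in(keys, coll, default=None):
    """Walk `coll` along `keys`; return `default` if any step fails.

    Illustrative stand-in for toolz.dicttoolz.get_in, not the real code.
    """
    try:
        return reduce(operator.getitem, keys, coll)
    except (KeyError, IndexError, TypeError):
        return default


target = {'system': {'planets': [{'name': 'earth', 'moons': 1},
                                 {'name': 'jupiter', 'moons': 69}]}}

print(get_in(['system', 'planets', 1, 'name'], target))   # jupiter
print(get_in(['system', 'stars', 0], target, 'missing'))  # missing
```

No strings are parsed here: the path is an ordinary list, which is the property the parent comment is after.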

jeremiahwv | 7 years ago
I agree; I don't like the magic string approach (even if it is mostly just dot-notation attribute lookup). However, there is some good stuff here, and nested data lookup when value existence is unknown is a pain point for me.

In addition to the string-based lookup, it looks like there is an attempt at a Pythonic approach:

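  from glom import glom, T
  target = {'system': {'planets': [{'name': 'earth', 'moons': 1},
                                   {'name': 'jupiter', 'moons': 69}]}}
  spec = T['system']['planets'][-1].values()
  glom(target, spec)
  # dict_values(['jupiter', 69]) on Python 3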
For me, though, while I can understand what is going on, it doesn't feel Pythonic.

Here's what I would love to see:

  from glom import nested
  nested(target)['system']['planets'][-1].values()
And I would love (perhaps debatably) for that to be effectively equivalent to:

  nested(target).system.planets[-1].values()
Possible?

--- edit: Ignore the above idea. I thought about this a bit more and the issue that your T object solves is that in my version:

  nested(target)['system']
the result is ambiguous. Is this the end of the path query, in which case it should return the original non-defaulting dict, or the middle of the path query, in which case it should return a defaulting dict? There's no way to tell.

The T object is a good solution for this.
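The reason `T` avoids that ambiguity can be sketched in a few lines: it records item accesses, attribute accesses, and calls instead of performing them, so nothing touches real data until the whole path is explicitly evaluated against a target. `PathRecorder` and `evaluate` are hypothetical names for this illustration; glom's real `T` is more complete.

```python
class PathRecorder:
    """Records accesses lazily; nothing touches real data until evaluate()."""

    def __init__(self, ops=()):
        self._ops = tuple(ops)

    def __getitem__(self, key):
        return PathRecorder(self._ops + (('item', key),))

    def __getattr__(self, name):
        # Only reached for names not found normally, e.g. `.values`.
        if name.startswith('__'):
            raise AttributeError(name)  # keep copy/pickle protocols sane
        return PathRecorder(self._ops + (('attr', name),))

    def __call__(self, *args, **kwargs):
        return PathRecorder(self._ops + (('call', (args, kwargs)),))


def evaluate(recorder, target):
    """Replay the recorded operations against `target`, in order."""
    cur = target
    for kind, arg in recorder._ops:
        if kind == 'item':
            cur = cur[arg]
        elif kind == 'attr':
            cur = getattr(cur, arg)
        else:
            args, kwargs = arg
            cur = cur(*args, **kwargs)
    return cur


R = PathRecorder()
target = {'system': {'planets': [{'name': 'earth', 'moons': 1},
                                 {'name': 'jupiter', 'moons': 69}]}}
spec = R['system']['planets'][-1].values()
print(list(evaluate(spec, target)))  # ['jupiter', 69]
```

Because `spec` is only a description of a path, `R['system']` on its own is never mistaken for a finished lookup.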

radmin | 7 years ago
My thoughts also turned to toolz.

Here's an example comparison:

glom

  from glom import glom
  glom(target, ('system.planets', ['name']))
  # ['earth', 'jupiter']
toolz

  from toolz import get_in, pluck
  list(pluck('name', get_in(('system', 'planets'), target)))
  # ['earth', 'jupiter']
abuckenheimer | 7 years ago
I feel the same way, although my instinct is generally to build a custom generator. It only costs a couple of lines, but it's plain old Python and quite explicit:

    from glom import glom

    target = {'system': {'planets': [{'name': 'earth', 'moons': 1},
                                     {'name': 'jupiter', 'moons': 69}]}}

    glom(target, {'moon_count': ('system.planets', ['moons'], sum)})
    # {'moon_count': 70}
    # vs
    def iter_moons(t):
        for planet in t['system']['planets']:
            yield planet['moons']

    sum(iter_moons(target))
    # 70
You would have to combine it with `defaultdict`s if your nested data is only sometimes there, though.
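As an alternative to `defaultdict`, `dict.get` with empty defaults handles partially missing data inside the generator itself. A sketch under that assumption (skipping planets that lack a `'moons'` key is a choice made here for illustration, not something the thread specifies):

```python
def iter_moons(t):
    """Yield moon counts, tolerating missing 'system', 'planets', or 'moons'."""
    for planet in t.get('system', {}).get('planets', []):
        if 'moons' in planet:
            yield planet['moons']


partial = {'system': {'planets': [{'name': 'earth', 'moons': 1},
                                  {'name': 'mars'}]}}
print(sum(iter_moons(partial)))  # 1
print(sum(iter_moons({})))       # 0
```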
doublereedkurt1 | 7 years ago
The only string parsing is 'a.b.c', which is mostly equivalent to T.a.b.c, so you can completely ignore that capability.

The very slight difference is that with T you must be explicit about attribute access vs. key access, whereas 'a.b.c' will try both.
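To make "try both" concrete, here is a hypothetical helper that resolves a dotted string by attempting item access and falling back to attribute access. `fuzzy_get` is not part of glom, and the key-first order is an assumption of this sketch, not a claim about glom's internals.

```python
from types import SimpleNamespace


def fuzzy_get(target, dotted):
    """Resolve 'a.b.c', trying target[part] first, then getattr(target, part)."""
    cur = target
    for part in dotted.split('.'):
        try:
            cur = cur[part]
        except (KeyError, IndexError, TypeError):
            cur = getattr(cur, part)  # raises AttributeError if both fail
    return cur


config = {'db': SimpleNamespace(host='localhost', port=5432)}
print(fuzzy_get(config, 'db.host'))  # localhost
```

With an explicit `T` path you would instead write the equivalent of `T['db'].host`, choosing the access kind at each step.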

typon | 7 years ago
Why is writing DSLs strictly worse than writing complicated transformations built on top of the limited constructs the language itself provides? (You said "never".)
hessammehr | 7 years ago
This might have been unintentional, but I suspect "Spectre of Structure" and "Python's missing piece" refer to Nathan Marz's Specter library for Clojure [1], which was similarly touted as Clojure's missing piece. I tend to agree in the case of Specter, given the mind-boggling types of transformations that are easily (and simply) expressed in it (and which often run faster than idiomatic Clojure as well). Highly recommended if you ever need to work with deeply nested data structures.

[1] https://github.com/nathanmarz/specter

faitswulff | 7 years ago
This is really cool. Did you ever consider an API to do the reverse - to insert a value at a particular point in the data?

My interest stems from this issue[0] on the Ruby issue tracker, which proposes Hash#bury, a method symmetrical to Hash#dig (which does something similar to, but more limited than, glom). The problem in the issue was that inserting a value at a given index in an array proved difficult and unnatural in Ruby, so I was wondering if there were other solutions out there.

Another question occurs to me - does glom only support string keys?

[0]: https://bugs.ruby-lang.org/issues/11747
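A Python take on the `bury` idea might look like the sketch below. `bury` is a hypothetical function, and its policy (create missing dict levels, but only assign list indices that already exist) is just one of several reasonable choices, which is exactly the design question the Ruby issue ran into.

```python
def bury(obj, keys, value):
    """Set `value` at the nested path `keys`, mutating `obj` in place.

    Missing dict levels are created as new dicts; list indices must
    already exist (IndexError otherwise).
    """
    *parents, last = keys
    cur = obj
    for key in parents:
        if isinstance(cur, dict) and key not in cur:
            cur[key] = {}
        cur = cur[key]
    cur[last] = value
    return obj


data = {'users': [{'name': 'ann'}]}
bury(data, ['users', 0, 'tags', 'admin'], True)
print(data)  # {'users': [{'name': 'ann', 'tags': {'admin': True}}]}
```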

mhashemi | 7 years ago
glom not only supports more than string keys, it also supports assigning to non-dictionary objects. That's a part of the API we're working on right now, actually.

As for the data insertion, mutation may be in the future, but for now glom only transforms and returns new objects. Definitely something to think about though, bookmarked! :)

doublereedkurt | 7 years ago
'string-key', T['string-key'], T[not-string-key] :-)
derefr | 7 years ago
My favourite approach to this so far, and one I would like other libraries to copy, is Elixir’s Access protocol, which gives you, e.g.:

    foo = %{key: [[1, 2], [3, 4], [5, 6]]}

    path = [:key, Access.all, Access.at(0)]

    get_in foo, path
    # => [1, 3, 5]

    update_in foo, path, &(&1 * 10)
    # => %{key: [[10, 2], [30, 4], [50, 6]]}

    foo
    |> put_in([:key, Access.all], "foo")
    |> put_in([:new_key], "bar")
    # => %{key: ["foo", "foo", "foo"], new_key: "bar"}
That third form is essentially the equivalent of building up a complex object through a series of mutations, but entirely functional.
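For Python readers, the same three operations can be imitated with a small, purely functional helper set. `ALL`, `get_in`, `update_in`, and `put_in` below are hypothetical stand-ins written for illustration, not an existing library; each returns new containers and leaves its input untouched, and a plain integer plays the role of `Access.at(0)`.

```python
ALL = object()  # sentinel standing in for Elixir's Access.all


def get_in(data, path):
    """Follow `path`; ALL maps the rest of the path over every element."""
    if not path:
        return data
    key, rest = path[0], path[1:]
    if key is ALL:
        return [get_in(item, rest) for item in data]
    return get_in(data[key], rest)


def _copy(data):
    # Shallow copy so callers' containers are never mutated.
    return list(data) if isinstance(data, list) else dict(data)


def update_in(data, path, fn):
    """Return a new structure with `fn` applied to each value at `path`."""
    if not path:
        return fn(data)
    key, rest = path[0], path[1:]
    if key is ALL:
        return [update_in(item, rest, fn) for item in data]
    new = _copy(data)
    new[key] = update_in(data[key], rest, fn)
    return new


def put_in(data, path, value):
    """Return a new structure with `value` stored at `path` (may add a key)."""
    key, rest = path[0], path[1:]
    if key is ALL:
        return [put_in(item, rest, value) if rest else value for item in data]
    new = _copy(data)
    new[key] = put_in(data[key], rest, value) if rest else value
    return new


foo = {'key': [[1, 2], [3, 4], [5, 6]]}
path = ['key', ALL, 0]  # 0 plays the role of Access.at(0)

print(get_in(foo, path))  # [1, 3, 5]
print(update_in(foo, path, lambda x: x * 10))
# {'key': [[10, 2], [30, 4], [50, 6]]}
print(put_in(put_in(foo, ['key', ALL], 'foo'), ['new_key'], 'bar'))
# {'key': ['foo', 'foo', 'foo'], 'new_key': 'bar'}
```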
tathougies | 7 years ago
This seems like lenses for Python... neat! I often use Python to mess around with things, and almost always miss Haskell's lenses when doing so. This seems like an interesting solution.
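For anyone who hasn't met lenses: at their simplest, a lens is a getter paired with a non-mutating setter, and lenses compose. A minimal Python sketch follows; `Lens`, `key`, and `compose` are made-up names for illustration, and real lens libraries (Haskell's included) are considerably richer.

```python
from typing import Any, Callable, NamedTuple


class Lens(NamedTuple):
    getter: Callable[[Any], Any]
    setter: Callable[[Any, Any], Any]  # (whole, new_part) -> new whole


def key(k):
    """Lens focusing on dict key `k`; the setter returns a new dict."""
    return Lens(lambda d: d[k], lambda d, v: {**d, k: v})


def compose(outer, inner):
    """Lens that focuses through `outer`, then `inner`."""
    return Lens(
        lambda whole: inner.getter(outer.getter(whole)),
        lambda whole, v: outer.setter(whole, inner.setter(outer.getter(whole), v)),
    )


config = {'db': {'port': 5432, 'host': 'localhost'}}
port = compose(key('db'), key('port'))
print(port.getter(config))        # 5432
print(port.setter(config, 6543))  # {'db': {'port': 6543, 'host': 'localhost'}}
print(config['db']['port'])       # 5432 (the original is untouched)
```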
amock | 7 years ago
I haven't played with this yet, but it looks really handy. I deal with much more JSON on the command line than I'd like, so I think having one tool that works both as a library and on the command line to reshape that data will make things much easier. I've used jq a few times, but when I want to move a little beyond what it does, I usually end up writing a Python script. Hopefully this will make that transition smoother.
mhashemi | 7 years ago
Haha, I'm all for console usage, but let me tell you, there's nothing quite like that feeling of moving a working spec into a dedicated application with exception handling, logging, etc. :)
spedru | 7 years ago
I'm not really versed in the idioms/social mores of Python, so please take the following with a grain of salt:

This seems like it usefully solves a problem, but the invocation pattern seems suspect to me. Instead of "glom" taking the target to be picked apart plus a magic little bit of DSL, what if "glom" took a single parameter, the aforementioned DSL, and returned a function that would perform the corresponding search when called on a target? Even if Python or this package optimises away repeated searches with the same spec, the convention the package prescribes struck me as odd right after the first few paragraphs of the intro.
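The curried style described above is easy to sketch in plain Python: parse the spec once and hand back a reusable function, the same parse-once idea `re.compile` offers for regular expressions. `compile_path` is a hypothetical helper, not glom's API, and it only handles dotted key paths.

```python
import operator
from functools import reduce


def compile_path(spec):
    """Parse a dotted path like 'system.planets' once; return a getter."""
    keys = tuple(spec.split('.'))

    def getter(target):
        return reduce(operator.getitem, keys, target)

    return getter


get_planets = compile_path('system.planets')
target = {'system': {'planets': [{'name': 'earth'}, {'name': 'jupiter'}]}}
print([p['name'] for p in get_planets(target)])  # ['earth', 'jupiter']
```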

mhashemi | 7 years ago
The big, classical school of Python definitely prefers top-level functions. Still, I definitely understand that aesthetic, and am on board with not using functools.partial to achieve it. So: https://github.com/mahmoud/glom/issues/14 :)
seanc | 7 years ago
Python's regex library does this, optionally (`re.compile`).
agf | 7 years ago
It seems to me like the advantage to focus on here is the improved error / `None` handling, which will speed debugging and make handling expected edge cases easier. I've seen a lot of inexperienced developers tripped up entirely by this kind of data access, and seen plenty of experienced developers waste time debugging it because of the exact error cases the announcement references.

The `T` object, which the article describes as its most powerful feature, can be a useful pattern in some situations, but it's worth pointing out that it isn't new or unique to this project.

The author says in another thread here that he first started working on the "stuff leading up to glom" in 2013. One older example, which is virtually identical though less complete, is this Stack Overflow answer I posted in 2012: https://stackoverflow.com/a/9920723/500584

I'd seen the general pattern even before that post, if not the Pythonic syntax. I don't think that it's much of an improvement over defining a `lambda`, so again I would say the thing to focus on is the improved debugability and the simpler, dot-notation-as-generic-attribute-or-item-accessor syntax. I think `T` is largely a distraction, or should be reserved for advanced users.

heavenlyblue | 7 years ago
I would like to see the author debug an application with 10 levels of object wrapping where one of the middle objects' names is misspelled.

Libraries like these shine only if they have brilliant tracing and debugging capabilities; otherwise they are too easy to reduce to literally a single function.
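As a rough illustration of the kind of tracing that helps with a misspelled middle segment, a path walker can report exactly which segment failed, how far it got, and what it found there. `PathError` and `walk` are hypothetical names for this sketch; glom has its own error types, which this does not reproduce.

```python
class PathError(LookupError):
    """Raised with the failing segment and how far the lookup got."""


def walk(target, path):
    cur = target
    for depth, part in enumerate(path):
        try:
            cur = cur[part]
        except (KeyError, IndexError, TypeError) as exc:
            done = '.'.join(map(str, path[:depth])) or '<root>'
            raise PathError(
                f'could not access {part!r} at depth {depth} '
                f'(after {done}, found {type(cur).__name__})'
            ) from exc
    return cur


try:
    walk({'a': {'b': 1}}, ['a', 'c'])
except PathError as err:
    print(err)  # could not access 'c' at depth 1 (after a, found dict)
```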

tincholio | 7 years ago
It looks quite similar in spirit to Clojure's Specter library (https://github.com/nathanmarz/specter), and even seems to have a nod to it (The Spectre of Structure).
lapnitnelav | 7 years ago
Looks really neat.

Striking a balance between ease of use / simplicity and powerful features is a tough exercise but you did well.

I can foresee the CLI being quite useful for doing away with the run-of-the-mill sed / awk / grep [...] mess, especially for less CLI-inclined people.

wbolster | 7 years ago
In a similar spirit, I wrote "sanest" (sane nested objects), tailored specifically for JSON formats: https://sanest.readthedocs.io/

It does not have exactly the same feature set, though. My focus was mostly on both reading and modifying nested structures in a type-safe way.

icebraining | 7 years ago
Can it be used bidirectionally, without having to repeat the work?

I have a need to transform between pairs of structures, in both directions, and ever since I found JsonGrammar (https://github.com/MedeaMelana/JsonGrammar2) I've been pining for a Python version.

mhashemi | 7 years ago
It depends on the complexity of the spec, but we've already done some programmatic building of glomspecs, so for many cases I think the answer is yes! Once we feel out the patterns I think glom will gain some utilities for this purpose.
neuronexmachina | 7 years ago
It seems like a subset of glom specs would be uniquely invertible. For example, the spec `{'c': 'a.b'}` could trivially invert to `{'a.b': 'c'}`. I'm not sure how you'd invert more complex specs that make function calls, e.g. sum or len.
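For that trivially invertible shape (a flat dict mapping dotted output paths to dotted input paths), both directions can be sketched in plain Python. `apply_spec` and `invert` are hypothetical helpers, not glom functions, and they only handle this simple case.

```python
from functools import reduce
import operator


def _get(target, dotted):
    return reduce(operator.getitem, dotted.split('.'), target)


def _set(result, dotted, value):
    *parents, last = dotted.split('.')
    cur = result
    for key in parents:
        cur = cur.setdefault(key, {})
    cur[last] = value


def apply_spec(target, spec):
    """`spec` maps dotted output paths to dotted input paths."""
    result = {}
    for out_path, in_path in spec.items():
        _set(result, out_path, _get(target, in_path))
    return result


def invert(spec):
    """Swap sides; only valid when every value is a plain path."""
    return {in_path: out_path for out_path, in_path in spec.items()}


spec = {'c': 'a.b'}
data = {'a': {'b': 1}}
flat = apply_spec(data, spec)          # {'c': 1}
print(apply_spec(flat, invert(spec)))  # {'a': {'b': 1}}
```

A step like `sum` or `len` throws information away, which is why specs containing them have no inverse in this scheme.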
sixdimensional | 7 years ago
I had a quick look, but I didn't see filtering expressions, only shaping expressions. It seems like glom is more of a result shaper/mapper. Can you filter with glom (maybe with lambdas or something)? I could see the two going together quite well if you were "glomming" a big Python object.
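Without assuming anything about glom's own filtering support, a filter step composes naturally with a shaping step in plain Python. Since glom specs accept plain callables (as `sum` does in the moon-count example upthread), a function like this could presumably be dropped into a spec as one step, though that is an untested assumption here. `pluck_where` is a hypothetical helper for illustration.

```python
def pluck_where(items, field, predicate):
    """Return `field` from each item for which `predicate(item)` is true."""
    return [item[field] for item in items if predicate(item)]


planets = [{'name': 'earth', 'moons': 1},
           {'name': 'mercury', 'moons': 0},
           {'name': 'jupiter', 'moons': 69}]
print(pluck_where(planets, 'name', lambda p: p['moons'] > 0))
# ['earth', 'jupiter']
```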
pdobsan | 7 years ago
There is already a well-established GNOME project with the same name: http://www.glom.org. It is a GTK+ front-end to PostgreSQL, similar to Microsoft Access.
jasonpeacock | 7 years ago
I'm probably being dense, but I don't see a good description of the input data types supported - the CLI says "json or python".

It would be great to have clarification: is this JSON only, does it support other data structures, or can parsers be plugged in?

mhashemi | 7 years ago
From within Python, all objects are supported by default. If you can parse it, you can glom it. You can even register additional behaviors for specific types to keep your specs tight: http://glom.readthedocs.io/en/latest/api.html#setup-and-regi... (example: http://glom.readthedocs.io/en/latest/snippets.html#automatic...)

The CLI is in a pretty preliminary state: usable, but not as robust as it will be in a few weeks. It only supports built-in parsers (JSON and Python literals). What formats are you thinking of? YAML?
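For the two input formats mentioned, the parsing side is easy to approximate with the standard library: try JSON first, then fall back to a Python literal. `load_target` is a hypothetical helper for illustration, not glom's CLI code.

```python
import ast
import json


def load_target(text):
    """Parse `text` as JSON, falling back to a Python literal."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # ast.literal_eval only accepts literals, so no code is executed.
        return ast.literal_eval(text)


print(load_target('{"moons": 69}'))               # {'moons': 69}
print(load_target("{'moons': 69, 'ok': True}"))   # {'moons': 69, 'ok': True}
```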

codezero | 7 years ago
This is pretty neat and is something I've been thinking about for a current project.

Does anyone know if something similar exists in Java/Scala land?

zaptheimpaler | 7 years ago
I think this is similar to lenses in FP languages - check out Monocle for Scala.
staticautomatic | 7 years ago
Aside: are there any libraries for JSON or dict-like formats with XPath-style querying that are as quick under the hood as lxml is for XML?