raymondh | 1 year ago

This is an impressive post showing some nice investigative work that isolates a pain point and produces a performant work-around.

However, the conclusion is debatable. Not everyone has this problem. Not everyone would benefit from the same solution.

Sure, if your data can be loaded, manipulated, and summarized outside of Python land, then lazy object creation is a good way to go. But then you're giving up all of the Python tooling that likely drove you to Python in the first place.

Most of the Python ecosystem from sets and dicts to the standard library is focused on manipulating native Python objects. While the syntax supports method calls to data encapsulated elsewhere, it can be costly to constantly "box and unbox" data to move back and forth between the two worlds.
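Here is a small pure-Python sketch of that trade-off. The data stays in one packed buffer, and Python objects are created only when a record is indexed. `PackedPoints`, `Point` and the two-doubles record layout are hypothetical illustrations, not anything from the original post. Reading a few records once is cheap. Every read re-decodes the bytes into fresh objects, though, so read-heavy code pays the boxing cost repeatedly.

```python
import struct
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float


class PackedPoints:
    """Hypothetical container: (x, y) pairs live in one bytes buffer; Points are built on demand."""

    _record = struct.Struct("<dd")  # one record = two little-endian doubles

    def __init__(self, pairs):
        buf = bytearray()
        for x, y in pairs:
            buf += self._record.pack(x, y)
        self._buf = bytes(buf)

    def __len__(self):
        return len(self._buf) // self._record.size

    def __getitem__(self, i):
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("PackedPoints index out of range")
        # "Boxing" happens here: raw bytes become new float and Point objects
        x, y = self._record.unpack_from(self._buf, i * self._record.size)
        return Point(x, y)


pts = PackedPoints([(1.0, 2.0), (3.5, -1.0), (0.0, 4.25)])
print(len(pts), pts[1])        # 3 Point(x=3.5, y=-1.0)
print(max(p.y for p in pts))   # 4.25
print(pts[0] is pts[0])        # False: each access builds a fresh object
```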

0x63_Problems|1 year ago

First off, thank you for all your contributions to Python!

I completely take your point that there are many places where this approach won't fit. It surprised me to trace the performance issue to allocations and GC, precisely because that is rare.

WRT boxing and unboxing, I'd imagine it depends on access patterns primarily - given I was extracting a small portion of data from the AST only once each, it was a good fit. But I can imagine that the boxing and unboxing could be a net loss for more read-heavy use cases.

jhylton|1 year ago

You could create a custom C type that wrapped an arbitrary AST node and dynamically created values for attributes when you accessed them. The values would also be wrappers around the next AST node, and they could generate new AST nodes on writes. Python objects would be created on traversal, but each one would be smaller. It wouldn’t use Python lists to handle repeated fields. It seems like a non-trivial implementation, but not fundamentally hard.
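The proposal is a C type. The sketch below is a pure-Python illustration of the same lazy-wrapping idea. It assumes the tree is stored as plain nested dicts and lists with a `"kind"` key. That layout, and the names `LazyNode`, `LazySeq` and `_wrap`, are hypothetical; this is not how CPython stores its AST. Child wrappers are created only on attribute access, each holds a single slot, and writes go straight through to the underlying data.

```python
class LazyNode:
    """Hypothetical wrapper around one raw node; children are wrapped only when accessed."""

    __slots__ = ("_raw",)

    def __init__(self, raw):
        object.__setattr__(self, "_raw", raw)  # bypass our own __setattr__

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for node fields
        if name == "_raw":
            raise AttributeError(name)
        try:
            value = self._raw[name]
        except KeyError:
            raise AttributeError(name) from None
        return _wrap(value)

    def __setattr__(self, name, value):
        # Writes go straight into the underlying representation
        self._raw[name] = value._raw if isinstance(value, LazyNode) else value


class LazySeq:
    """Read-only view over a repeated field; elements are wrapped one at a time."""

    __slots__ = ("_raw",)

    def __init__(self, raw):
        self._raw = raw

    def __len__(self):
        return len(self._raw)

    def __getitem__(self, i):
        return _wrap(self._raw[i])


def _wrap(value):
    if isinstance(value, dict):
        return LazyNode(value)
    if isinstance(value, list):
        return LazySeq(value)
    return value  # scalars (str, int, ...) are returned as-is


raw = {
    "kind": "Module",
    "body": [{"kind": "Expr", "value": {"kind": "Constant", "value": 1}}],
}
mod = LazyNode(raw)
print(mod.body[0].value.value)           # 1
mod.body[0].value.value = 42             # writes through to raw
print(raw["body"][0]["value"]["value"])  # 42
```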

The analogy with numpy doesn’t seem quite right, as Raymond observes, because numpy depends on lots of builtin operations that operate on the underlying data representation. We don’t have any such code for the AST. You’ll still want to write Python code to traverse, inspect, and modify the AST.
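For reference, this is the kind of traverse/inspect/modify code the standard `ast` module supports today, using `ast.NodeVisitor` and `ast.NodeTransformer`. `ast.unparse` needs Python 3.9 or later. The class names `NameCounter` and `Renamer` and the sample source are illustrative only.

```python
import ast

source = "total = price * qty\nprint(total)\n"
tree = ast.parse(source)


class NameCounter(ast.NodeVisitor):
    """Inspect: count how often each identifier appears (reads and writes)."""

    def __init__(self):
        self.counts = {}

    def visit_Name(self, node):
        self.counts[node.id] = self.counts.get(node.id, 0) + 1
        self.generic_visit(node)


class Renamer(ast.NodeTransformer):
    """Modify: rename one identifier everywhere it appears (mutates the tree in place)."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node


counter = NameCounter()
counter.visit(tree)
print(counter.counts)  # {'total': 2, 'price': 1, 'qty': 1, 'print': 1}

new_tree = Renamer("qty", "quantity").visit(tree)
print(ast.unparse(new_tree))
```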

coldtea|1 year ago

>However, the conclusion is debatable. Not everyone has this problem. Not everyone would benefit from the same solution.

Everyone would benefit from developers being more performance-minded and not doing unnecessary work, though! Especially in Python, which has long suffered from performance issues.

Love your work btw!

BiteCode_dev|1 year ago

No. Days are only 24 hours long. If you focus on performance, you neglect something else.

Python is python because people cared about other things for many years.