raymondh|1 year ago
However, the conclusion is debatable. Not everyone has this problem. Not everyone would benefit from the same solution.
Sure, if your data can be loaded, manipulated, and summarized outside of Python land, then lazy object creation is a good way to go. But then you're giving up all of the Python tooling that likely drove you to Python in the first place.
Most of the Python ecosystem from sets and dicts to the standard library is focused on manipulating native Python objects. While the syntax supports method calls to data encapsulated elsewhere, it can be costly to constantly "box and unbox" data to move back and forth between the two worlds.
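To make the "box and unbox" cost concrete, here is a rough sketch using only the standard library. It assumes CPython, and the variable names are purely illustrative. The same 1000 numbers are stored once as ordinary float objects in a list and once as raw C doubles in an `array`; reading an element out of the packed buffer has to build a Python float on demand.

```python
import sys
from array import array

# 1000 numbers as native Python objects: a list of boxed floats.
values = [float(i) for i in range(1000)]

# The same numbers as raw C doubles in one contiguous buffer.
packed = array("d", values)

# Approximate memory: the list plus every float object it references,
# versus the single packed buffer.
boxed_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
packed_bytes = sys.getsizeof(packed)

# Indexing the packed buffer "boxes" the raw double into a Python float.
first = packed[0]

# Plain Python code still works on the packed data, but every element
# it touches is boxed along the way.
total = sum(packed)
```

The packed form is smaller, but anything written in Python that iterates over it pays for an object per element touched, which is the trade-off being described.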
0x63_Problems|1 year ago
I completely take your point that there are many places where this approach won't fit. It was a surprise for me to trace the performance issue to allocations and GC, specifically because it is rare.
WRT boxing and unboxing, I'd imagine it depends on access patterns primarily - given I was extracting a small portion of data from the AST only once each, it was a good fit. But I can imagine that the boxing and unboxing could be a net loss for more read-heavy use cases.
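As a minimal sketch of the access pattern described here (not the actual implementation, which is not shown in this thread): `ROWS` and `LazyNode` below are hypothetical names. The node data lives in a compact table, and a Python wrapper object is only created for nodes that are actually visited.

```python
from functools import cached_property

# Hypothetical compact storage: one row per node as
# (kind, name, indices of child rows).
ROWS = [
    ("Module", "", (1, 2)),
    ("FunctionDef", "f", ()),
    ("FunctionDef", "g", ()),
]


class LazyNode:
    """Thin view over one row; children are wrapped only when requested."""

    created = 0  # counts how many wrapper objects have been built

    def __init__(self, rows, index):
        self._rows = rows
        self._index = index
        LazyNode.created += 1

    @property
    def kind(self):
        return self._rows[self._index][0]

    @property
    def name(self):
        return self._rows[self._index][1]

    @cached_property
    def children(self):
        # Built on first access, then cached on the instance.
        child_indices = self._rows[self._index][2]
        return [LazyNode(self._rows, i) for i in child_indices]


root = LazyNode(ROWS, 0)
count_before = LazyNode.created  # only the root wrapper exists so far

# Extract a small portion of the data, touching each node once.
names = [c.name for c in root.children if c.kind == "FunctionDef"]
```

With a single pass like this, each node is wrapped at most once. A read-heavy workload that keeps going through `kind` and `name` would re-read the underlying table on every access, which is where the approach could become a net loss.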
jhylton|1 year ago
The analogy with numpy doesn’t seem quite right, as Raymond observes, because numpy depends on lots of builtin operations that operate on the underlying data representation. We don’t have any such code for the AST. You’ll still want to write Python code to traverse, inspect, and modify the AST.
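For illustration, this is the kind of Python-level traversal meant here, using the standard library `ast` module as a stand-in (the thread does not say which AST library is involved, and the sample source is made up). Every node the loop touches has to exist as a Python object.

```python
import ast

# Made-up sample source, just for the sketch.
source = (
    "def f(x):\n"
    "    return x + 1\n"
    "\n"
    "def g():\n"
    "    return f(2)\n"
)

tree = ast.parse(source)

# Inspect the tree with ordinary Python code: collect function names
# and count call expressions.
function_names = [
    node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
]
call_count = sum(isinstance(node, ast.Call) for node in ast.walk(tree))
```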
coldtea|1 year ago
Everyone would benefit from developers being more performance-minded and not doing unnecessary work, though! Especially in Python, which has long suffered from performance issues.
Love your work btw!
BiteCode_dev|1 year ago
Python is Python because people cared about other things for many years.