falconroar | 1 month ago

Interesting, I wasn't aware; thanks for that. I will say, Polars' implementation is much more centered on out-of-core processing, and bypasses some of DuckDB's limitations ("DuckDB cannot yet offload some complex intermediate aggregate states to disk"). Both incredible pieces of software.
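
To picture what "offloading intermediate aggregate states to disk" means, here is a minimal pure-Python sketch of a spilling hash aggregation, a common textbook approach to out-of-core GROUP BY. It is not how Polars or DuckDB actually implement it. The function name, the `max_groups_in_memory` threshold, the partition count, and the pickle-based spill files are all hypothetical choices for illustration, and the sketch assumes each hash partition fits in memory during the merge phase.

```python
import os
import pickle
import tempfile
from collections import defaultdict
from typing import Iterable


def spilling_group_sum(
    rows: Iterable[tuple[str, float]],
    max_groups_in_memory: int = 1000,
    num_partitions: int = 8,
) -> dict[str, float]:
    """Sum values per key (hypothetical helper), spilling partial sums to disk
    whenever the in-memory hash table holds more than max_groups_in_memory keys."""
    spill_dir = tempfile.mkdtemp(prefix="agg_spill_")
    spill_paths = [os.path.join(spill_dir, f"part{i}.pkl") for i in range(num_partitions)]
    table: dict[str, float] = defaultdict(float)

    def spill() -> None:
        # Route each partial (key, sum) to a partition file by hash, so every
        # partial for a given key lands in the same file.
        buckets: list[list[tuple[str, float]]] = [[] for _ in range(num_partitions)]
        for key, total in table.items():
            buckets[hash(key) % num_partitions].append((key, total))
        for path, bucket in zip(spill_paths, buckets):
            if bucket:
                with open(path, "ab") as f:  # append one pickled batch per spill
                    pickle.dump(bucket, f)
        table.clear()

    # Build phase: aggregate in memory, spill when the table gets too big.
    for key, value in rows:
        table[key] += value
        if len(table) > max_groups_in_memory:
            spill()
    spill()  # flush the remainder

    # Merge phase: load and combine one partition at a time.
    result: dict[str, float] = {}
    for path in spill_paths:
        if not os.path.exists(path):
            continue
        merged: dict[str, float] = defaultdict(float)
        with open(path, "rb") as f:
            while True:
                try:
                    bucket = pickle.load(f)  # one spilled batch per load
                except EOFError:
                    break
                for key, total in bucket:
                    merged[key] += total
        result.update(merged)
        os.remove(path)
    os.rmdir(spill_dir)
    return result


rows = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0), ("b", 5.0)]
print(sorted(spilling_group_sum(rows, max_groups_in_memory=2).items()))
# [('a', 4.0), ('b', 7.0), ('c', 4.0)]
```

Hash-partitioning the spills is the key design choice: because all partials for one key share a partition, the merge never needs more than one partition's worth of groups in memory at once.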

To expand on this, Polars' `LazyFrame` implementation allows for simple addition of new backends like GPU, streaming, and now distributed computing (though it's currently locked to a vendor). The DuckDB codebase just doesn't have this flexibility, though there are ways to get it to run on GPU using external software.
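
The reason a lazy API makes new backends cheap to add is that the query is recorded as a plan first and only executed on `collect`, so an engine is just "something that runs a plan." The toy below shows that separation in pure Python. It is not Polars' API or internals; `ToyLazyFrame`, `in_memory_engine`, `streaming_engine`, and the batch size are hypothetical names and choices made up for this sketch.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Any, Callable, Iterable

Row = dict[str, Any]


@dataclass(frozen=True)
class ToyLazyFrame:
    """Records operations into a logical plan instead of executing them."""
    source: Callable[[], Iterable[Row]]   # re-readable input
    plan: tuple[tuple[str, Any], ...] = ()

    def filter(self, pred: Callable[[Row], bool]) -> "ToyLazyFrame":
        return ToyLazyFrame(self.source, self.plan + (("filter", pred),))

    def with_column(self, name: str, fn: Callable[[Row], Any]) -> "ToyLazyFrame":
        return ToyLazyFrame(self.source, self.plan + (("with_column", (name, fn)),))

    def collect(self, engine: Callable[[Iterable[Row], tuple], list[Row]]) -> list[Row]:
        # The plan is engine-agnostic; swapping the engine changes only how it runs.
        return engine(self.source(), self.plan)


def apply_op(rows: list[Row], op: tuple[str, Any]) -> list[Row]:
    kind, arg = op
    if kind == "filter":
        return [r for r in rows if arg(r)]
    if kind == "with_column":
        name, fn = arg
        return [{**r, name: fn(r)} for r in rows]
    raise ValueError(f"unknown op {kind!r}")


def in_memory_engine(source: Iterable[Row], plan: tuple) -> list[Row]:
    # Materialize the whole input, then run each operation over all rows.
    rows = list(source)
    for op in plan:
        rows = apply_op(rows, op)
    return rows


def streaming_engine(source: Iterable[Row], plan: tuple, batch_size: int = 2) -> list[Row]:
    # Pull fixed-size batches and push each one through the whole plan,
    # so only one input batch is held at a time (plus the output).
    out: list[Row] = []
    it = iter(source)
    while batch := list(islice(it, batch_size)):
        for op in plan:
            batch = apply_op(batch, op)
        out.extend(batch)
    return out


data = [{"x": i} for i in range(5)]
lf = (
    ToyLazyFrame(lambda: iter(data))
    .filter(lambda r: r["x"] % 2 == 0)
    .with_column("y", lambda r: r["x"] * 10)
)
print(lf.collect(in_memory_engine))  # [{'x': 0, 'y': 0}, {'x': 2, 'y': 20}, {'x': 4, 'y': 40}]
print(lf.collect(streaming_engine))  # same rows, computed batch by batch
```

The streaming engine only works this simply because `filter` and `with_column` are row-local; sorts, joins, and aggregations need extra machinery (for example, spilling state to disk) before they can run batch by batch.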

steve_adams_86 | 1 month ago

Thanks for that insight as well! My needs don't tend to be so demanding, so I've gotten away without knowing these details, but I suspect in the not-so-distant future this could be useful to know.

Being able to use distributed backends to process frames sounds kind of incredible, but I can't imagine my little projects ever making use of it. Still, very cool stuff.

noworriesnate | 1 month ago

Have you seen Ibis[1]? It's a dataframe API that translates its calls into queries for various backends, including Polars and DuckDB. I've messed around with it a little for cases where data engineering transforms had to use PySpark but I wanted to do exploratory analysis in an environment that didn't have PySpark.

[1] https://ibis-project.org/
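
The "one API, many backends" idea can be sketched with a tiny backend-neutral query object that is either compiled to SQL or evaluated directly in Python. This is not Ibis's API (Ibis uses column expressions and supports far more operations); `Expr`, `to_sql`, `run_sqlite`, and `run_python` are hypothetical names, and the standard-library `sqlite3` module stands in for a SQL engine like DuckDB only because it needs no extra install.

```python
import operator
import sqlite3
from dataclasses import dataclass
from typing import Any

OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


@dataclass(frozen=True)
class Expr:
    """A tiny backend-neutral query: table name, AND-ed filters, selected columns."""
    table: str
    filters: tuple[tuple[str, str, Any], ...] = ()
    columns: tuple[str, ...] = ("*",)

    def filter(self, col: str, op: str, value: Any) -> "Expr":
        return Expr(self.table, self.filters + ((col, op, value),), self.columns)

    def select(self, *cols: str) -> "Expr":
        return Expr(self.table, self.filters, cols)


def to_sql(expr: Expr) -> tuple[str, list[Any]]:
    # Compile to parameterized SQL. Values are bound as parameters, but table and
    # column names are interpolated unquoted, which is only acceptable in a toy.
    sql = f"SELECT {', '.join(expr.columns)} FROM {expr.table}"
    params: list[Any] = []
    if expr.filters:
        sql += " WHERE " + " AND ".join(f"{c} {op} ?" for c, op, _ in expr.filters)
        params = [v for _, _, v in expr.filters]
    return sql, params


def run_sqlite(expr: Expr, con: sqlite3.Connection) -> list[tuple]:
    # SQL backend: compile, then let the database execute it.
    sql, params = to_sql(expr)
    return con.execute(sql, params).fetchall()


def run_python(expr: Expr, tables: dict[str, list[dict[str, Any]]]) -> list[tuple]:
    # In-process backend: evaluate the same expression over lists of dicts.
    rows = [r for r in tables[expr.table]
            if all(OPS[op](r[c], v) for c, op, v in expr.filters)]
    cols = list(rows[0].keys()) if expr.columns == ("*",) and rows else list(expr.columns)
    return [tuple(r[c] for c in cols) for r in rows]


people = [{"name": "ana", "age": 34}, {"name": "bo", "age": 19}, {"name": "cy", "age": 52}]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT, age INTEGER)")
con.executemany("INSERT INTO people VALUES (:name, :age)", people)

q = Expr("people").filter("age", ">", 30).select("name")
print(to_sql(q))                          # ('SELECT name FROM people WHERE age > ?', [30])
print(run_sqlite(q, con))
print(run_python(q, {"people": people}))  # [('ana',), ('cy',)]
```

Because the query is built once and only translated at execution time, the same `q` can run against whichever backend is available, which is the property that makes this style handy when the production environment and the exploration environment differ.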