I am impressed that Polars is close to DuckDB near the top. It's surprising that a Python library would often outperform everything but DuckDB. DuckDB is very impressive, but DataFrames and Python are too useful to give up on.
DuckDB interoperates with polars dataframes easily. I see DuckDB as a SQL engine for dataframes.
Any DuckDB result is easily converted to Pandas (by appending .df()) or Polars (by appending .pl()).
The conversion to Polars is instantaneous because it's zero-copy: everything goes through the Arrow in-memory format.
So I usually write complex queries in DuckDB SQL, but if I need to manipulate the result in Polars I just convert it midstream in my workflow (it only takes milliseconds) and then continue working with that in DuckDB. It's seamless thanks to Apache Arrow.
Wow, what a cool workflow. It looks like the interop promise of Apache Arrow is real. It's a great thing when your computer works as fast as you think, as opposed to sitting around waiting for queries to finish.
I mean, Polars is great, but there's nothing fundamentally stopping Polars from providing similar performance to DuckDB: Polars is written in Rust, and a lazy dataframe really just provides an alternative frontend (SQL being another frontend).
There's nothing in the architecture that would make it so that performance in one OLAP engine is fundamentally impossible to achieve in another.
I didn't know that Polars was implemented in Rust. In fact, it's very neat that Rust can interop with Python so cleanly, but that shouldn't be surprising since Python is basically a wrapper around C libraries.
But I still think it's surprising how much mileage Python's model of wrapping C/C++/Rust libraries has. I would have assumed that if you have Python calling the libraries, you can't do lazy evaluation, and thus you hit a wall the way Pandas does.
But we've seen with compiled PyTorch and Polars that you can have your cake and eat it too: keep the ease of use of Python while getting performance, with enough engineering.
wenc|2 years ago
https://duckdb.org/docs/guides/python/polars.html
indeedmug|2 years ago
theLiminator|2 years ago
indeedmug|2 years ago
shpongled|2 years ago
unknown|2 years ago
[deleted]