(no title)
nchammas | 2 years ago
https://docs.pola.rs/user-guide/migration/spark/
Look at the examples on this page of the Spark vs. Polars DataFrame APIs. (Disclaimer: I contributed this documentation. [1])
Having used SQL and Spark DataFrames heavily, but not Polars (or Pandas, for that matter), my impression is that Spark's DataFrame is analogous to SQL tables, whereas Polars's DataFrame is something a bit different, perhaps something closer to a matrix.
I'm not sure how else to explain these kinds of operations you can perform in Polars that just seem really weird coming from relational databases. I assume they are useful for something, but I'm not sure what. Perhaps machine learning?
tomtom1337|2 years ago
nchammas|2 years ago
Here's one of them:
In SQL and Spark DataFrames, it doesn't make sense to sort columns of the same table independently like this and then just juxtapose them together. It's in fact very awkward to do something like this with either of those interfaces, which you can see in the equivalent Spark code on that page. SQL will be similarly awkward.But in Polars (and maybe in Pandas too) you can do this easily, and I'm not sure why. There is something qualitatively different about the Polars DataFrame that makes this possible.