top | item 43298732

(no title)

ritchie46 | 11 months ago

Disclosure, I am the author of Polars and this post. The difference with Ibis is that Polars cloud will also manage hardware. It is similar to Modal in that sense. You don't have to have a running cluster to fire a remote query.

The other is that we are only focussing on Polars and honor the Polars semantics and data model. Switching backends via Ibis doesn't honor this, as many architectures have different semantics regarding NaNs, missing data, order of them, decimal arithmetic behavior, regex engines, type upcasting, overflowing, etc.

And lastly, we will ensure it works seamlessly with the Polars landscape, that means that Polars Plugins and IO plugins will also be first class citizens.

discuss

order

TheTaytay|11 months ago

It’s funny you mention Modal. I use modal to do fan-out processing of large-ish datasets. Right now I store the transient data in duckdb on modal, using polars (and sometimes ibis) as my api of choice.

I did this, rather than use snowflake, because our custom python “user defined functions” that process the data are not deployable on snowflake out of the gate, and the ergonomics of shipping custom code to modal are great, so I’m willing to pay a bit more complexity to ship data to modal in exchange for these great dev ergonomics.

All of that is to say: what does it look like to have custom python code running on my polars cloud in a distributed fashion? Is that a solved problem?

ritchie46|11 months ago

Yes, you can run

`pc.remote(my_udf, schema)`

Where

`def my_udf() -> DataFrame`

We link the appropiate Python version at cluster startup.