top | item 29966238

mgradowski | 4 years ago

DuckDB and Polars are my bets in the Python data-wrangling space. I grew tired of Pandas' weird-ass API.

sweezyjeezy|4 years ago

I would love to switch to something else, but pandas feels like the lingua franca of data science now, so switching puts a burden on everyone else.

mgradowski|4 years ago

I like the interface-only packages in the Julia ecosystem. For example, Tables.jl enables the development of several packages for querying tabular data that work across many concrete implementations, and Plots.jl separates the high-level plotting interface from the plotting backend.
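Python has nothing quite like Tables.jl in the standard library, but the interface-only idea can be sketched with `typing.Protocol`. Generic code is written once against a small interface, and any concrete table type that provides those methods works with it. This is a minimal sketch under that assumption: the names `Table`, `column_names`, `column`, `RowTable`, `ColumnTable` and `column_mean` are hypothetical and are not the Tables.jl API.

```python
from typing import Any, Dict, List, Protocol, Sequence


class Table(Protocol):
    """Hypothetical minimal table interface (not the Tables.jl API)."""

    def column_names(self) -> List[str]: ...

    def column(self, name: str) -> Sequence[Any]: ...


class RowTable:
    """Concrete implementation that stores a list of row dicts."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows

    def column_names(self) -> List[str]:
        return list(self.rows[0].keys()) if self.rows else []

    def column(self, name: str) -> Sequence[Any]:
        return [row[name] for row in self.rows]


class ColumnTable:
    """Concrete implementation that stores a dict of column lists."""

    def __init__(self, columns: Dict[str, List[Any]]) -> None:
        self.columns = columns

    def column_names(self) -> List[str]:
        return list(self.columns.keys())

    def column(self, name: str) -> Sequence[Any]:
        return self.columns[name]


def column_mean(table: Table, name: str) -> float:
    """Generic query written once against the interface, not a backend."""
    values = table.column(name)
    return sum(values) / len(values)


rows = RowTable([{"x": 1, "y": 10}, {"x": 3, "y": 30}])
cols = ColumnTable({"x": [1, 3], "y": [10, 30]})
print(column_mean(rows, "x"), column_mean(cols, "x"))  # 2.0 2.0
```

Neither class inherits from `Table`; they satisfy it structurally, which is roughly what lets query code and storage evolve independently.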

mrtranscendence|4 years ago

It's true. I've spent a small but nontrivial amount of time learning and using Polars, but it's just a nonstarter for most work projects. Not only does no one else know it exists, let alone how to use it, but (to my knowledge) it doesn't integrate with any Python ETL or ML library. You have to convert to pandas or NumPy, which is costly and to some extent defeats the purpose.

anonymousDan|4 years ago

Yes, I used it recently for the first time in ages, and I have to say I found the whole thing a mess. There are about five ways to do everything.

elforce002|4 years ago

I don't know DuckDB, but Polars could dethrone pandas. We're planning to use it to build our pipeline. The Ibis project is another option if anyone wants to check it out.

mgradowski|4 years ago

Huh, even though I would prefer a universal SQL layer, ibis looks quite nice.

spaniard89277|4 years ago

I haven't touched pandas in months, but I also found it quite tiring to deal with.

Does your setup allow for an end-to-end solution? I mean, can I sink time into that setup and feel like I have everything I need for regular data wrangling?

I'm sure pandas is amazing, but as a newbie I found myself writing a lot of transformation logic with plain Python data structures because it's just so much easier.

Maybe I'm dumb, but navigating the docs was sometimes like :/
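For reference, this is the kind of plain-Python transformation logic the comment describes: a group-by-and-sum over a list of records using `collections.defaultdict`. The helper `total_by_key` and the fields `city` and `sales` are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def total_by_key(records: List[Dict[str, object]], key: str, value: str) -> Dict[object, float]:
    """Group records by `key` and sum `value`, using only built-in types."""
    totals: Dict[object, float] = defaultdict(float)
    for record in records:
        totals[record[key]] += record[value]
    return dict(totals)


records = [
    {"city": "Lisbon", "sales": 120.0},
    {"city": "Porto", "sales": 80.0},
    {"city": "Lisbon", "sales": 30.0},
]
print(total_by_key(records, "city", "sales"))  # {'Lisbon': 150.0, 'Porto': 80.0}
```

The rough pandas equivalent is `df.groupby("city")["sales"].sum()`, which is shorter but returns a Series indexed by city rather than a plain dict.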

closed|4 years ago

Author of the post and of siuba here. I'm pretty interested in exploring Polars as a backend and, if that works well, supporting versions of the SQL backends that translate to SQL based on the Polars method API :).

(I haven't really used it, but it looks promising)
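Translating a method-chaining API into SQL can be sketched with a small lazy query builder. Each method call records an operation, and nothing runs until the chain is rendered as a SQL string. This is only an illustration under stated assumptions: the `LazyQuery` class and its `filter`, `select` and `to_sql` methods are hypothetical, this is not how siuba or Polars implement it, and real translators build an expression tree instead of splicing raw SQL strings. The standard-library `sqlite3` module stands in for a real SQL backend.

```python
import sqlite3
from typing import List, Optional


class LazyQuery:
    """Hypothetical method-chaining builder that compiles to SQL lazily."""

    def __init__(self, table: str, columns: Optional[List[str]] = None,
                 conditions: Optional[List[str]] = None) -> None:
        self.table = table
        self.columns = columns or ["*"]
        self.conditions = conditions or []

    def filter(self, condition: str) -> "LazyQuery":
        # Return a new query instead of mutating, so chains stay composable.
        return LazyQuery(self.table, self.columns, self.conditions + [condition])

    def select(self, *columns: str) -> "LazyQuery":
        return LazyQuery(self.table, list(columns), self.conditions)

    def to_sql(self) -> str:
        # Only here is the recorded chain turned into a SQL string.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(f"({c})" for c in self.conditions)
        return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Lisbon", 120.0), ("Porto", 80.0), ("Lisbon", 30.0)])

query = LazyQuery("sales").filter("amount > 50").select("city", "amount")
print(query.to_sql())  # SELECT city, amount FROM sales WHERE (amount > 50)
print(sorted(conn.execute(query.to_sql()).fetchall()))  # [('Lisbon', 120.0), ('Porto', 80.0)]
```

Because the chain is only compiled in `to_sql`, the same method API could in principle target different SQL dialects or an in-memory engine.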

mrtranscendence|4 years ago

Hey, I love siuba. Haven't had a chance to use it much, but it scratches an itch for me. For years I've grumbled about how Python isn't flexible enough to accommodate tidyverse-style libraries, as it lacks pipes and lazy evaluation (or macros), but siuba has managed to be very nice to use.

Maybe someday Python'll get a macro system ...