I've been testing Polars and DuckDB recently. Polars is an excellent dataframe package: extremely memory efficient and fast. I've run into some issues with hive partitioning on a large S3 dataset that I don't see with DuckDB. I've scanned multi-TB S3 Parquet datasets with DuckDB on my M1 laptop, executing some really hairy SQL (stuff I didn't think it could handle): window functions, joins to other Parquet datasets just as large, etc. Very impressive software.
I haven't done the same types of things in Polars yet (only simple selects).
theLiminator|2 years ago
I've actually personally found that DuckDB is tremendously slow against the cloud, though perhaps I'm going through the wrong API?
I'm using https://duckdb.org/docs/guides/import/s3_import.
My data is hive partitioned. When I monitor my network throughput, I only get a few MB/s with DuckDB but can achieve 1-2 GB/s through Polars.
Very possible it's a case of PEBKAC though.
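For anyone wanting to reproduce the comparison, here is a minimal sketch of the same hive-partitioned scan in both tools. The bucket path, the `year` partition column, and the region are hypothetical placeholders, and this is only one plausible way to set it up, not necessarily how either commenter ran their queries. Credentials are assumed to come from the environment.

```python
def hive_scan_sql(s3_glob: str, partition_filter: str) -> str:
    """Build a DuckDB query that counts rows in hive-partitioned Parquet on S3."""
    return (
        "SELECT count(*) AS n "
        f"FROM read_parquet('{s3_glob}', hive_partitioning = true) "
        f"WHERE {partition_filter}"
    )


# Hypothetical dataset layout: s3://my-bucket/events/year=2023/part-0.parquet
S3_GLOB = "s3://my-bucket/events/*/*.parquet"
QUERY = hive_scan_sql(S3_GLOB, "year = 2023")


def run_duckdb(query: str):
    """Run the query through DuckDB's httpfs extension."""
    import duckdb  # third-party: pip install duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    con.execute("SET s3_region = 'us-east-1'")  # assumed region
    return con.execute(query).fetchall()


def run_polars(s3_glob: str):
    """Run the equivalent lazy scan in Polars."""
    import polars as pl  # third-party: pip install polars

    lf = pl.scan_parquet(s3_glob, hive_partitioning=True)
    # Filtering on the partition column lets Polars skip non-matching directories.
    return lf.filter(pl.col("year") == 2023).select(pl.len()).collect()


if __name__ == "__main__":
    print(QUERY)
```

Filtering on the partition column in both versions keeps the comparison fair, since each engine can then prune directories instead of downloading every file.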
cmollis|2 years ago