top | item 35017596

cweill | 3 years ago

If you ever need to join two large dataframes but are OOMing on the join, write them to disk as Parquet files, then use DuckDB to do the join. It's amazing what you can do on one machine thanks to DuckDB.

qolop|3 years ago

This isn't unique to DuckDB. Almost all databases support sorting and joining large tables that don't fit into memory.

cweill|3 years ago

Yes, but if you're in a Jupyter notebook, you may not be directly connected to a DB. If you're using pandas, this unlocks some scalability before you need Dask and a cluster.