If you ever need to join two large dataframes but are OOMing on the join, write them to disk as Parquet files, then use DuckDB to do the join. It's amazing what you can do on one machine thanks to DuckDB.
Yes, but if you're in a Jupyter notebook, you may not be directly connected to a DB. If you're using pandas, this unlocks some scalability before you need Dask and a cluster.
qolop|3 years ago
cweill|3 years ago