Most people don't directly query or otherwise operate on raw CSV, though. Large source datasets in CSV format still reign in many enterprises, but these are typically read into a dataframe, manipulated and stored as Parquet or the like, then operated upon by DuckDB, Polars, etc., or modeled (e.g. with dbt) and pushed to an OLAP target.
wenc|1 year ago
CSV is only good for append-only workloads.
But so is Parquet, and if you can write Parquet from the get-go, you save on storage as well as have a directly queryable column store from the start.
CSV still exists because of legacy data-generating processes and a dearth of Parquet familiarity among many software engineers. CSV is simple to generate and easy to troubleshoot without specialized tools (compared to Parquet, which requires tools like VisiData). But you pay for it elsewhere.
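To make the "simple to generate, append only" point concrete, here is a minimal sketch using only Python's standard library. The file name `events.csv`, the column names, and the helpers `append_rows` and `read_rows` are all made up for illustration; they are not from any real pipeline.

```python
import csv
import os
import tempfile


def append_rows(path, header, rows):
    """Append rows to a CSV file; write the header only if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(header)
        writer.writerows(rows)


def read_rows(path):
    """Read the whole file back as a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Hypothetical example data, written to a throwaway temp directory.
tmp_dir = tempfile.mkdtemp()
events_path = os.path.join(tmp_dir, "events.csv")

append_rows(events_path, ["id", "event"], [[1, "start"], [2, "stop"]])
append_rows(events_path, ["id", "event"], [[3, "restart"]])  # a later batch

rows = read_rows(events_path)
for row in rows:
    print(row)
```

Appending is just opening the file in `"a"` mode, and the result can be inspected with any text editor. One place you pay for it: every value comes back as a string (the `id` column reads back as `"1"`, not `1`), so types have to be re-inferred or re-declared by whatever reads the file next.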
fragmede|1 year ago
cmollis|1 year ago