top | item 43520488


pradeepchhetri | 11 months ago

I prefer to use clickhouse-local for all my CSV needs, as I don't need to learn a new language (or CLI flags) and can just leverage SQL.

    clickhouse local --file medias.csv --query "SELECT edito, count() AS count FROM table GROUP BY ALL ORDER BY count FORMAT PrettyCompact"

   ┌─edito──────┬─count─┐
   │ agence     │     1 │
   │ agrégateur │    10 │
   │ plateforme │    14 │
   │ individu   │    30 │
   │ media      │   423 │
   └────────────┴───────┘
With clickhouse-local I can do a lot more, since I can leverage the full power of ClickHouse.
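
For anyone without ClickHouse at hand, a rough pure-Python sketch of what that query computes may help. In ClickHouse, `GROUP BY ALL` groups by every non-aggregated expression in the SELECT list (here just `edito`), and `ORDER BY count` sorts ascending by default. The function name `count_by_column` and the sample rows are made up for illustration; the other columns of the real `medias.csv` are unknown.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_text: str, column: str) -> list:
    """Count rows per distinct value of `column`, ordered by count ascending.

    Rough equivalent of:
      SELECT edito, count() AS count FROM table GROUP BY ALL ORDER BY count
    """
    reader = csv.DictReader(io.StringIO(csv_text))  # first line is the header
    counts = Counter(row[column] for row in reader)
    # Sort by (count, value) so ties come out in a deterministic order.
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical sample data, not the real medias.csv.
sample = "name,edito\nA,media\nB,individu\nC,media\nD,agence\n"
for value, n in count_by_column(sample, "edito"):
    print(f"{value:<10} {n:>5}")
```

Ties are broken alphabetically in this sketch only to keep the output stable; the ClickHouse query makes no such promise for rows with equal counts.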

rixed|11 months ago

How does it compare with DuckDB, which I usually resort to? What I like about DuckDB is that it's a single binary with no server needed, and it's been happy so far with all the CSV files I've thrown at it.

pradeepchhetri|11 months ago

clickhouse-local is similar to DuckDB: you don't need a running clickhouse-server to use clickhouse-local. You just download the clickhouse binary and start using it.

  clickhouse local
  ClickHouse local version 25.4.1.1143 (official build).

  :)
There are a few benefits to using clickhouse-local, since ClickHouse can simply do a lot more than DuckDB. One example is handling compressed files: ClickHouse can read files compressed with zstd, lz4, snappy, gz, xz and bz2, as well as zip, tar and 7zip archives.

  clickhouse local --query "SELECT count() FROM file('top-1m-2018-01-10.csv.zip :: *.csv')"
  1000000
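
A loose stdlib-only Python analogue of that `file('archive.zip :: *.csv')` query is sketched below. It assumes a zip archive only (the other formats listed would need other modules, some of them third-party). The function name `count_rows_in_zip` is hypothetical, member names are matched with `fnmatch.fnmatchcase` as an approximation of ClickHouse's glob handling, and every CSV record is counted, including any header row.

```python
import csv
import io
import zipfile
from fnmatch import fnmatchcase
from typing import BinaryIO, Union


def count_rows_in_zip(zip_source: Union[str, BinaryIO], pattern: str = "*.csv") -> int:
    """Count CSV records across archive members whose names match `pattern`.

    Loosely mirrors: SELECT count() FROM file('archive.zip :: *.csv')
    Header rows, if any, are counted like any other record.
    """
    total = 0
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            # fnmatchcase is only a stand-in for ClickHouse's glob matching.
            if not fnmatchcase(name, pattern):
                continue
            # Stream and decode the member instead of extracting it to disk.
            with io.TextIOWrapper(zf.open(name), encoding="utf-8", newline="") as text:
                total += sum(1 for _ in csv.reader(text))
    return total


# Usage, with the file name from the example above:
# print(count_rows_in_zip("top-1m-2018-01-10.csv.zip"))
```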
Also, clickhouse-local is much more efficient at handling big CSV files [0].

[0]: https://www.vantage.sh/blog/clickhouse-local-vs-duckdb

sitkack|11 months ago

I use SQLite in a similar manner, but I'll have to check this out.
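
Using SQLite this way can be sketched with Python's built-in `sqlite3` and `csv` modules: load the CSV into an in-memory table, then query it with plain SQL. The helper `load_csv`, the table name `medias` and the sample rows are hypothetical. SQLite has no `GROUP BY ALL`, so the grouping column is spelled out explicitly.

```python
import csv
import io
import sqlite3


def load_csv(conn: sqlite3.Connection, table: str, csv_text: str) -> None:
    """Create `table` from CSV text, using the header row as column names.

    Columns are declared without types (SQLite allows this), so values are
    stored as the strings read from the file. Names are quoted but not
    validated, so only use this with trusted input.
    """
    rows = csv.reader(io.StringIO(csv_text))
    header = next(rows)
    cols = ", ".join(f'"{h}"' for h in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)


conn = sqlite3.connect(":memory:")
load_csv(conn, "medias", "name,edito\nA,media\nB,individu\nC,media\n")
result = conn.execute(
    "SELECT edito, count(*) AS n FROM medias GROUP BY edito ORDER BY n"
).fetchall()
print(result)
```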