top | item 43343526

(no title)

nickreese | 11 months ago

This is such a needed addition! Huge duckdb fan, congrats team!

discuss

I'm a bit out of the loop here, but what's the use case for DuckDB?

gnulinux|11 months ago

DuckDB is mind blowingly awesome. It is like SQLite, lightweight, embeddable, serverless, in-memory database, but it's optimized to be columnar (analytics optimized). It can work with files that are in filesystem, S3 etc without copying (it just looks at the necessary regions in the file) by just doing `select * from 's3://....something.parquet'`. It support automatic compression and automatic indexing. It can read json lines, parquet, CSV, its own db format, sqlite db, excel, Google Sheets... It has a very convenient SQL dialect with many QoL improvements and extensions (and is PostgreSQL compatible). Best of all: it's incredibly fast. Sometimes it's so fast that I find myself puzzled "how can it possibly analyze 10M rows in 0.1 seconds?" and I find it difficult to replicate the performance in pure Rust. It is an extremely useful tool. In the last year, it has become one of my use-everyday tools because the scope of problems you can just throw DuckDB at is gigantic. If you have a whole bunch of structured data that you want to learn something about, chances are DuckDB is the ideal tool.

PS: Not associated with DuckDB team at all, I just love DuckDB so much that I shill for them when I see them in HN.

simonw|11 months ago

On way to think about it is SQLite for columnar / analytical data.

It works great against local files, but my favorite DuckDB feature is that it can run queries against remote Parquet files, fetching just the ranges of bytes it needs to answer the query using HTTP range queries.

This means you can run eg a count(*) against a 15GB parquet file from your laptop and only fetch a few hundred KBs of data (if that).

alexpadula|11 months ago

Small intro, It's a relational database for analytical data primarily. It's an "in-process" database meaning you can import certain files at runtime and query them. That's how it differs primarily from regular relational systems.

csjh|11 months ago

for the average developer I think the killer feature is allowing you to query over whatever data you want (csv, json, parquet, even gsheets) as equals, directly from their file form - can even join across them