top | item 45696289

vysakh0 | 4 months ago

DuckDB is an excellent OLAP DB. I have had customers with an S3 data lake of Parquet files who used Databricks or other expensive tools when they could easily have used DuckDB. Given we have Cursor/Claude Code, it is not that hard for a lot of use cases. I think the lack of documentation on how DuckDB works, in terms of how it loads these files and so on, is one of the reasons companies are not even trying to adopt it. Blogs like this are a great testament to DuckDB's performance!

adammarples | 4 months ago

I have been playing today with DuckLake, and I have to confess I don't quite get what it does that DuckDB doesn't already do, if DuckDB can run on top of Parquet files quite happily without this extension.

RobinL | 4 months ago

Its main purpose is to solve the problem of upserts to a data lake: upsert operations on file-based data storage are a real pain.

mrtimo | 4 months ago

I have experience with DuckDB but not Databricks. From the perspective of a company, is a tool like Databricks more "secure" than DuckDB? If my company adopts DuckDB as a data lake, how do we secure it?

rapatel0 | 4 months ago

DuckDB can run as a local instance that points to Parquet files in an S3 bucket, so your "auth" can live at the layer that grants permissions to access that bucket.
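In practice that looks like the configuration sketch below: DuckDB picks up whatever AWS credentials the environment already has, and access control is just bucket policy. This is a non-runnable sketch; it needs the httpfs extension, network access, and real credentials, and the secret, bucket, and path names are invented.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")

# Pull credentials from the standard AWS chain (env vars, IAM role, etc.).
con.execute("""
    CREATE SECRET s3_creds (
        TYPE S3,
        PROVIDER CREDENTIAL_CHAIN
    )
""")

# Whoever can read the bucket can run the query; there is no separate
# database-level user to manage.
con.execute("SELECT COUNT(*) FROM read_parquet('s3://my-bucket/lake/*.parquet')")
```

So the security question largely reduces to IAM policy on the bucket rather than a database permission system.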

lopatin | 4 months ago

DuckDB is great, but it's barely OLAP, right? A key part of OLAP is "online". Since the writer process blocks any other processes from doing reads, calling it OLAP is a stretch, I think.

ansgri | 4 months ago

Isn't the "online" part here about getting results immediately after a query, as opposed to overnight batch reports? So if you don't completely overwhelm DuckDB with writes, it still qualifies. The quality you're describing is something more like "real-time analytics", which is a whole other category: ClickHouse doesn't qualify (batching updates, merging, etc., but it's clearly OLAP), while Druid does.