philbe77|3 months ago
GizmoSQL is definitely a good option. I work at GizmoData and maintain GizmoSQL. It is an Arrow Flight SQL server with DuckDB as a back-end SQL execution engine. It supports independent, thread-safe concurrent sessions and has robust security, logging, token-based authentication, and more.
It also has a growing list of adapters, including ODBC, JDBC, ADBC, dbt, SQLAlchemy, Metabase, Apache Superset, and more.
We also just introduced a PySpark drop-in adapter, letting you run your Python Spark DataFrame workloads on GizmoSQL for dramatic savings compared to Databricks on sub-5TB workloads.

mritchie712|3 months ago
For pure DuckDB, you can put an Arrow Flight server in front of DuckDB[0] or use the httpserver extension[1].
Where you store the .duckdb file will make a big difference in performance (e.g. S3 vs. Elastic File System).
But I'd take a good look at DuckLake as a better multiplayer option. If you store `.parquet` files in blob storage, it will be slower than `.duckdb` on EFS, but if you have largish data, EFS gets expensive.
We[2] use DuckLake in our product and we've found a few ways to mitigate the performance hit. For example, we write all data into DuckLake in blob storage, then create analytics tables and store them on faster storage (e.g. GCP Filestore). You can have multiple storage methods in the same DuckLake catalog, so this works nicely.
0 - https://www.definite.app/blog/duck-takes-flight
1 - https://github.com/Query-farm/httpserver
2 - https://www.definite.app/
anentropic|3 months ago
https://docs.aws.amazon.com/AmazonS3/latest/userguide/mountp...
glenjamin|3 months ago
philbe77|3 months ago
Check it out at: https://gizmodata.com/gizmosql
Repo: https://github.com/gizmodata/gizmosql
tempest_|3 months ago
derekhecksher|3 months ago