philbe77|3 months ago
GizmoSQL is definitely a good option. I work at GizmoData and maintain GizmoSQL. It is an Arrow Flight SQL server with DuckDB as a back-end SQL execution engine. It supports independent, thread-safe concurrent sessions and has robust security, logging, token-based authentication, and more.
It also has a growing list of adapters, including ODBC, JDBC, ADBC, dbt, SQLAlchemy, Metabase, Apache Superset, and more.
We also just introduced a PySpark drop-in adapter, letting you run your Python Spark DataFrame workloads on GizmoSQL for dramatic savings compared to Databricks on sub-5TB workloads.

mritchie712|3 months ago
For pure DuckDB, you can put an Arrow Flight server in front of DuckDB[0] or use the httpserver extension[1].
Where you store the .duckdb file will make a big difference in performance (e.g. S3 vs. Elastic File System).
But I'd take a good look at DuckLake as a better multiplayer option. If you store `.parquet` files in blob storage, it will be slower than `.duckdb` on EFS, but if you have largish data, EFS gets expensive.
We[2] use DuckLake in our product and we've found a few ways to mitigate the performance hit. For example, we write all data into DuckLake in blob storage, then create analytics tables and store them on faster storage (e.g. GCP Filestore). You can have multiple storage methods in the same DuckLake catalog, so this works nicely.
0 - https://www.definite.app/blog/duck-takes-flight
1 - https://github.com/Query-farm/httpserver
2 - https://www.definite.app/
anentropic|3 months ago
https://docs.aws.amazon.com/AmazonS3/latest/userguide/mountp...
glenjamin|3 months ago
philbe77|3 months ago
Check it out at: https://gizmodata.com/gizmosql
Repo: https://github.com/gizmodata/gizmosql
tempest_|3 months ago
derekhecksher|3 months ago