vegetable26 | 2 years ago

edit: author here

Our current FUSE implementation is specific to DuckDB and makes (and validates) some assumptions about DuckDB's access patterns to the underlying files. In this case, we know that DuckDB currently always truncates the WAL on a successful checkpoint, and that truncation triggers a Differential Storage snapshot under the hood.

We are working on future-proofing this setup by removing our reliance on some of these assumptions and having our FUSE implementation act much more like a generic FS moving forward.

That said, truncation of the WAL is still usually a good time to snapshot the database: once the WAL has been truncated, all of the "state" needed to reconstruct the current database is captured in the database file itself.
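
To make the trigger concrete, here is a minimal sketch of the idea in Python. It is not our actual implementation: `WalTruncateHook`, `on_snapshot`, and the in-memory size tracking are hypothetical stand-ins for a real FUSE truncate handler, and the only thing it assumes about DuckDB is that the WAL lives next to the database file with a `.wal` suffix and is truncated to zero bytes after a successful checkpoint.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WalTruncateHook:
    """Hypothetical in-memory stand-in for a FUSE layer's truncate handler."""

    # Callback invoked with the database path when a snapshot should be taken.
    on_snapshot: Callable[[str], None]
    # Tracks file sizes only; a real filesystem would also store the bytes.
    sizes: Dict[str, int] = field(default_factory=dict)

    def write(self, path: str, nbytes: int) -> None:
        # Simulate an append by growing the recorded size.
        self.sizes[path] = self.sizes.get(path, 0) + nbytes

    def truncate(self, path: str, length: int) -> None:
        had_data = self.sizes.get(path, 0) > 0
        self.sizes[path] = length
        # Assumption: a non-empty WAL shrinking to zero bytes marks a completed
        # checkpoint, so the database file alone now holds the full state.
        if path.endswith(".wal") and length == 0 and had_data:
            self.on_snapshot(path[: -len(".wal")])


snapshots: List[str] = []
fs = WalTruncateHook(on_snapshot=snapshots.append)

fs.write("db.duckdb.wal", 4096)   # pending changes accumulate in the WAL
fs.truncate("db.duckdb.wal", 0)   # checkpoint finished: snapshot fires
fs.truncate("db.duckdb.wal", 0)   # WAL already empty: no second snapshot
print(snapshots)                  # prints ['db.duckdb']
```

The `had_data` check is a design choice in this sketch: it avoids taking a redundant snapshot when an already-empty WAL is truncated again.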

chrisjc|1 year ago

I'm not too familiar with FUSE, but I would imagine that you are doing something similar to registering a custom filesystem on duckdb, then intercepting certain filesystem activities to trigger all the magic described in the blog?

Also, do you think any of this is going to make its way back into the duckdb core, or perhaps even influence the duckdb developers to make some of this native or easier (avoiding assumptions about what duckdb is doing)? Perhaps some kind of trigger on checkpoint/similar activities?

And btw, very interesting to read this announcement after reading through the S3 discussion yesterday.

https://news.ycombinator.com/item?id=39656657