(no title)
awitt | 1 year ago
I would argue that by definition an LSM-tree buffers committed writes in memory, and that means you need a WAL for recovery.
If you flush the memtable immediately on every write, then I/O is on the critical path. And if you have fine-grained updates, you'll end up with lots of small files, which seems like a bad thing. It could be reasonable if you only receive batch updates.
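To make the tradeoff concrete, here is a minimal sketch of the buffer-plus-WAL pattern described above. It is not any real engine's code: the class name `TinyLSM`, the JSON file formats, and the `flush_threshold` parameter are all made up for illustration. Each write is appended to the log and fsynced before it touches the in-memory table, and the table is only written out as a sorted file once enough entries have accumulated.

```python
import json
import os
import tempfile


class TinyLSM:
    """Hypothetical toy store: WAL for durability, memtable for buffering."""

    def __init__(self, directory, flush_threshold=4):
        self.dir = directory
        self.wal_path = os.path.join(directory, "wal.log")
        self.flush_threshold = flush_threshold
        self.memtable = {}
        # Continue numbering after any sorted files left by an earlier run.
        self.next_sst = len([n for n in os.listdir(directory) if n.startswith("sst-")])
        self._replay_wal()
        self.wal = open(self.wal_path, "a")

    def _replay_wal(self):
        # Recovery: rebuild the memtable from writes that were logged
        # but not yet flushed to a sorted file.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path) as f:
            for line in f:
                key, value = json.loads(line)
                self.memtable[key] = value

    def put(self, key, value):
        # Log first, then buffer. The fsync here is the only I/O on the
        # write path; the sorted-file write happens once per batch.
        self.wal.write(json.dumps([key, value]) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self._flush()

    def _flush(self):
        # Write the memtable as one sorted file, then truncate the WAL,
        # since its contents are now durable elsewhere.
        path = os.path.join(self.dir, f"sst-{self.next_sst:06d}.json")
        with open(path, "w") as f:
            json.dump(sorted(self.memtable.items()), f)
            f.flush()
            os.fsync(f.fileno())
        self.next_sst += 1
        self.memtable.clear()
        self.wal.close()
        self.wal = open(self.wal_path, "w")


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    db = TinyLSM(d, flush_threshold=3)
    db.put("a", 1)
    db.put("b", 2)
    # Simulate a crash by opening a fresh instance on the same directory.
    print(TinyLSM(d, flush_threshold=3).memtable)
```

Setting `flush_threshold=1` in this sketch gives the "flush immediately" design: no WAL replay is ever needed, but every single-key update produces its own sorted file.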
7e|1 year ago
awitt|1 year ago
hodgesrm|1 year ago
This is true, but note that the WAL does not need to be in the database. You can use an event stream like Kafka and replay blocks of events after a failure. ClickHouse has a feature to deduplicate blocks it has seen before, even if they land on a separate server in a cluster. You still need to store checksums of the previously seen blocks, which is what ClickHouse does. It does put the onus on users to regenerate blocks accurately, but the overhead is far lower.
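A rough sketch of that replay-and-deduplicate idea, for illustration only. This is not ClickHouse's implementation; `DedupingSink`, `split_into_blocks`, and the `window` parameter are hypothetical names, and a plain Python list stands in for the event stream. The point it shows is why the client has to regenerate blocks accurately: a replayed block is only recognized if it hashes to the same checksum as the first time.

```python
import hashlib


def block_checksum(block):
    # Length-prefix each row so that different row boundaries
    # cannot produce the same byte stream.
    h = hashlib.sha256()
    for row in block:
        h.update(len(row).to_bytes(8, "big"))
        h.update(row)
    return h.hexdigest()


class DedupingSink:
    """Hypothetical sink that remembers checksums of recently inserted blocks."""

    def __init__(self, window=100):
        self.window = window
        self.seen = {}  # dict preserves insertion order, used as an ordered set
        self.rows = []

    def insert_block(self, block):
        digest = block_checksum(block)
        if digest in self.seen:
            return False  # already applied, skip the replayed block
        self.seen[digest] = None
        if len(self.seen) > self.window:
            # Forget the oldest checksum once the window is full.
            self.seen.pop(next(iter(self.seen)))
        self.rows.extend(block)
        return True


def split_into_blocks(events, size):
    # Blocks must be cut at the same offsets on every replay,
    # otherwise the checksums will not match.
    return [events[i:i + size] for i in range(0, len(events), size)]


if __name__ == "__main__":
    events = [f"e{i}".encode() for i in range(10)]
    blocks = split_into_blocks(events, 3)
    sink = DedupingSink()
    sink.insert_block(blocks[0])
    sink.insert_block(blocks[1])
    # Failure before the offset was committed: replay the stream from the start.
    applied = [sink.insert_block(b) for b in blocks]
    print(applied, sink.rows == events)
```

If the replay cut the same events into blocks of a different size, none of the checksums would match and every row would be inserted twice.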
valyala|1 year ago
awitt|1 year ago
That doesn't make it wrong, or a bad architecture. It still takes ideas from LSM-trees, and it has similarities. But it can't be called an LSM-tree.