top | item 40850661

(no title)

awitt | 1 year ago

LSM-trees do need a WAL. The entire idea of LSM-trees is that writes are buffered in memory and written out all at once. But a particular write doesn't wait for the memtable to be flushed. For that reason you still need a WAL (there is committed state in memory).

See https://research.facebook.com/publications/optimizing-space-... and https://www.cs.umb.edu/~poneil/lsmtree.pdf.

discuss

7e|1 year ago

Those implementations use a WAL, but it seems to be only as a performance optimization to decrease the size of the in-memory index; is there a theoretical reason one is needed? It looks equivalent to a WAL-less write path combined with an almost immediate compaction. If you remove the compaction and don’t delete the WAL it seems like you can eliminate that write amplification (at least temporarily).

awitt|1 year ago

The original purpose of an LSM-tree is to take I/O off the critical path of a write (there are other reasons to use them though, for example reducing space amplification).

I would argue that by definition an LSM-tree buffers committed writes in memory, and that means you need a WAL for recovery.

If you are going to immediately flush the memtable then IO is on the critical path. And if you have fine grained updates you'll end up with lots of small files, which seems like a bad thing. It could be reasonable if you only receive batch updates.