top | item 38655870

(no title)

exacube | 2 years ago

Nice article!

There is a bug in the compaction method:

  def compact(sstables, output_sstable):
    # Ordered by ascending key. pop() results in the item of  the smallest key.
    heap = heapq.heapify([(sstable.next(), sstable) for sstable in sstables])

    while (item, sstable) := heap.pop()
        if not item.is_tombstone():
            output_sstable.write(item)
You should only skip tombstones when you are compacting the final (i.e., largest) level, instead of between every level. Otherwise, an entry in a lower level will unmask itself because the tombstone in the upper level was compacted away.

It's one of the properties of LSM-based DBs that deletions/tombstones records linger for a long time, though some databases (eg RocksDB) put in some optimizations to get around this.

discuss

order

tontinton|2 years ago

You're right this was purposefully left out for brevity, in dbeel it is handled.

exacube|2 years ago

Ah, makes sense :)

vlovich123|2 years ago

What kind of optimizations does RocksDB do? I know they have some stuff around range deletions but not sure I’ve read anything about point deletions.