top | item 47135433

(no title)

soletta | 7 days ago

The usual path an engineer takes is to take a complex and slow system and reengineer it into something simple, fast, and wrong. But as far as I can tell from the description in the blog though, it actually works at scale! This feels like a free lunch and I’m wondering what the tradeoff is.

discuss

jrjeksjd8d|7 days ago

It seems like this is an approach that trades off scale and performance for operational simplicity. They say they only have 1GB of records and they can use a single committer to handle all requests. Failover happens by missing a compare-and-set so there's probably a second of latency to become leader?

This is not to say it's a bad system, but it's very precisely tailored for their needs. If you look at the original Kafka implementation, for instance, it was also very simple and targeted. As you bolt on more use cases and features you lose the simplicity to try and become all things to all people.

danhhz|7 days ago

(author here)

> It seems like this is an approach that trades off scale and performance for operational simplicity.

Yes, this is exactly it. Given that turbopuffer itself is built on the idea of object storage + stateless cache, we're all very comfortable dealing with it operationally. This design is enough for our needs and is much easier to be oncall for than adding an entirely new dependency would have been.

loevborg|7 days ago

> Failover happens by missing a compare-and-set so there's probably a second of latency to become leader?

Conceptually that makes sense. How complicated is it to implement this failover logic in a safe way? If there are two processes, competing for CAS wins, is there not a risk that both will think they're non-leaders and terminate themselves?

formerly_proven|7 days ago

Write amplification >9000 mostly

snowhale|7 days ago

[deleted]