
mrtracy | 2 years ago

FoundationDB has, in my experience, always been well regarded in DB development circles; I think their test architecture - developed to easily reproduce rare concurrency failures - is its best legacy, as mentioned in a comment above and frequently before.

However, since these topics are always filled with effusive praise in the comments, let me give an example of a distributed scenario where FDB has shortcomings: OLTP SQL.

First, FDB is clearly designed for “read often, update rarely” workloads, in a relative sense. It produces multiple consistent replicas which are consistently queryable at a past timestamp, without a transaction - excellent for that profile. However, its transaction consistency method is both optimistic and centralized, which can make writes hard to commit under high contention and cause (brief) system-wide transaction downtime during a failover; while it will work, it’s not optimal for “write often, read once” workloads.
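To make the “optimistic” part concrete, here is a toy sketch in plain Python (not FDB’s actual resolver) of read-version validation: a transaction records the version of each key it reads, and commit fails if any of those keys changed in the meantime, forcing a retry - which is exactly what piles up under write contention:

```python
# Toy optimistic concurrency control: a commit succeeds only if no key
# read by the transaction was modified after it was read. Simplified
# sketch for illustration; not FoundationDB's actual implementation.

class Store:
    def __init__(self):
        self.data = {}      # key -> value
        self.versions = {}  # key -> version of last write
        self.clock = 0      # global commit version

    def begin(self):
        return Txn(self)

class Txn:
    def __init__(self, store):
        self.store = store
        self.read_set = {}   # key -> version observed at read time
        self.write_set = {}  # key -> pending new value

    def get(self, key):
        self.read_set[key] = self.store.versions.get(key, 0)
        return self.write_set.get(key, self.store.data.get(key))

    def set(self, key, value):
        self.write_set[key] = value

    def commit(self):
        # Conflict check: was any key we read written since we read it?
        for key, seen in self.read_set.items():
            if self.store.versions.get(key, 0) != seen:
                return False  # conflict -> caller must retry
        self.store.clock += 1
        for key, value in self.write_set.items():
            self.store.data[key] = value
            self.store.versions[key] = self.store.clock
        return True

store = Store()
t1 = store.begin()
t2 = store.begin()
t1.set("counter", (t1.get("counter") or 0) + 1)
t2.set("counter", (t2.get("counter") or 0) + 1)
assert t1.commit() is True    # first writer wins
assert t2.commit() is False   # second writer conflicts and must retry
```

With two writers on one hot key, one of every pair of overlapping transactions retries; a pessimistic system would instead serialize them behind a lock.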

Secondly, while it is an ordered key value store - facilitating building SQL on top of it - the popular thought of layering SQL on top of the distributed layer comes with many shortcomings.

My key example of this is schema changes. Optimistic application, and keeping schema information entirely “above” the transaction layer, can make it extremely slow to apply changes to large tables, and possibly require taking them partially offline during the update. There are ways to manage this, but online schema changes will be a competitive advantage for other systems.

Even for read-only queries, you lose opportunities to push many types of predicates down to the storage node, where they can be executed with fewer round trips. Depending on how distributed your system is, this could add up to significant additional latency.
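A toy illustration of what predicate pushdown buys you (pure Python, no real network; row counts transferred stand in for bandwidth and round trips - the names here are hypothetical):

```python
# Predicate pushdown sketch: filtering at the "storage node" ships only
# matching rows to the client; filtering client-side ships every row.

rows = [{"id": i, "active": i % 10 == 0} for i in range(1000)]

def scan_without_pushdown(storage):
    transferred = list(storage)              # every row crosses the wire
    result = [r for r in transferred if r["active"]]
    return result, len(transferred)

def scan_with_pushdown(storage):
    transferred = [r for r in storage if r["active"]]  # filter at storage
    return transferred, len(transferred)

naive, naive_cost = scan_without_pushdown(rows)
pushed, pushed_cost = scan_with_pushdown(rows)
assert naive == pushed                 # same answer either way
assert naive_cost == 1000 and pushed_cost == 100
```

Same result set, an order of magnitude less data shipped - and in a geographically distributed deployment that difference is paid in wide-area round trips.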

Afaik, all of the spanner-likes of the world push significant schema-specific information into their transaction layers - and utilize pessimistic locking - to facilitate these scenarios with competitive performance.

For reasons like these, I think FDB will find (and has found) the most success in warehousing scenarios, where individual datum are queried often once written, and updates come in at a slower pace than the reads.


Dave_Rosenthal|2 years ago

I totally agree with your high level point that there isn't a great SQL (OLTP, or otherwise) layer for FoundationDB. Building something like this would be very hard--but I don't think the FoundationDB storage engine itself would end up inflicting the limitations you mention if it was well executed. And FoundationDB was specifically designed for real-time workloads with mixed reads/writes (i.e. the OLTP case).

Whether or not concurrency is optimistic (or done with locks, or whatever) doesn't really have a bearing on things. Any database is going to suffer if it has a bunch of updates to specific hot keys that need to be isolated (in the ACID sense). As long as your reads and writes are sufficiently spread out, you'll avoid lock contention/optimistic transaction retries.
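"Spreading out" a hot key is often done with the classic sharded-counter pattern: split one logical counter across N subkeys so concurrent writers rarely touch the same key, and sum the shards on read. A toy sketch (dict standing in for the KV store; in real FDB the per-shard increment would typically be an atomic ADD mutation, which avoids read conflicts entirely):

```python
# Sharded counter: increments scatter across NUM_SHARDS subkeys so
# concurrent writers usually hit different keys; reads sum all shards.
# Toy in-memory sketch, not real FoundationDB API.
import random

NUM_SHARDS = 8

def increment(store, name, amount=1):
    # Each writer picks a random shard, so two concurrent increments
    # usually modify different keys and don't conflict with each other.
    shard = random.randrange(NUM_SHARDS)
    key = (name, shard)
    store[key] = store.get(key, 0) + amount

def read_total(store, name):
    # Reads pay for the write scalability: they touch every shard.
    return sum(store.get((name, s), 0) for s in range(NUM_SHARDS))

store = {}
for _ in range(1000):
    increment(store, "page_views")
assert read_total(store, "page_views") == 1000
```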

You speak to the real main limitation of FoundationDB when you talk about stuff like schema changes. There is a five-second transaction limit which in practice means that you cannot, for example, do a single giant transaction to change every row in a table. This was definitely a deliberate design choice, but not one without tradeoffs. The bad side is that if you want to be able to do something like this (lock out clients while you migrate a table) you need a different design that uses another strategy, like indirection. The good side is that screwed-up transactions that lock big chunks of your DB for a long time don't take down your system.
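The usual workaround for the giant-transaction case is to break the migration into many small batches, each comfortably under the time limit. A minimal sketch (toy in-memory dict standing in for the ordered KV store; the batch size and uppercase "migration" are illustrative only):

```python
# Batched migration sketch: transform every row of a large "table"
# in small batches, each batch its own short transaction, instead of
# one giant transaction that would blow past a five-second limit.

def migrate_in_batches(store, transform, batch_size=100):
    keys = sorted(store)          # ordered KV store: walk keys in order
    for i in range(0, len(keys), batch_size):
        batch = keys[i:i + batch_size]
        # In a real system this loop body would be one transaction,
        # small enough to commit well under the time limit. Concurrent
        # writers may interleave between batches, which is why online
        # schema changes need extra machinery (versioning, indirection).
        for key in batch:
            store[key] = transform(store[key])

store = {f"row/{i:05d}": f"value-{i}" for i in range(1000)}
migrate_in_batches(store, str.upper)
assert store["row/00000"] == "VALUE-0"
```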

I find that the people who are relatively new to databases tend to wish that the five second limit was gone because it makes things simpler to code. People that are running them in production tend to like it more because it avoids a slew of production issues.

That said, I think for many situations a timeout like 30 or 60 seconds (with a warning at 10) would be a better operating point rather than the default 5 second cliff.

mrtracy|2 years ago

I think that the SQL-on-top approach, and the optimistic model, are definitely things that can have a workload-dependent performance impact, and they are relevant here.

All databases do suffer under some red line of write contention; but optimistic databases will suffer more, and will start degrading at a lower level of contention. “Avoiding contention” is database optimization table stakes, and you should be structuring every schema you can to do so; but hot keys are almost inevitable when a certain class of real-time product scales, and they will show up in ways you do not expect. When it happens, you’d like your DBMS to give as much runway as possible before you have to make the tough changes to break through.

SQL-on-top becomes an issue for geographic distribution; without “pushing down” predicates, read-modify-write workloads, table joins, etc. on the client can incur significant round-trip time issuing queries. I think lacking this will always be a persistent disadvantage versus competitors that have it.

And again, given FDB's multiple-full-secondary model, it's only a problem when working in real time; slower queries can work off a local secondary. But latest-data latency is relevant for many applications.

aseipp|2 years ago

FWIW, I believe read transactions are unlimited in duration now that the Redwood engine is available. But I haven't tested Redwood myself. Write transactions are still definitely limited to 5 seconds, though.

gregwebs|2 years ago

TiDB uses TiKV as an equivalent to FoundationDB. It supports online migrations and pushing read queries down to the KV layer. It also defaults to optimistic locking, but supports pessimistic locking, and it doesn’t have a five-second transaction limit. A SQL layer on top of FoundationDB could probably solve all these problems, and it wouldn’t be novel.

mike_hearn|2 years ago

You can do online schema changes with FDB, it all depends on what you do with the FDB primitives.

A great example of how to best utilize FDB is Permazen [1], described well in its white paper [2].

Permazen is a Java library, so it can be utilized from any JVM language e.g. via Truffle you get Python, JavaScript, Ruby, WASM + any bytecode language. It supports any sorted K/V backend so you can build and test locally with a simple disk or in memory impl, or RocksDB, or even a regular SQL database. Then you can point it at FoundationDB later when you're ready for scaling.

Permazen is not a SQL implementation. Instead it's "language integrated," meaning you write queries using the Java collections library and some helpers, in particular NavigableSet and NavigableMap. In effect you write and hard-code your query plans. However, for this you get many of the same features an RDBMS would have, and then some: indexes, including compound-key indexes; strongly typed and enforced schemas with ONLINE updates; strong type safety during schema changes (which are allowed to be arbitrary); sophisticated transaction support; tight control over caching and transactional "copy out"; watching fields or objects for changes; constraints, plus the equivalent of foreign key constraints with better validation semantics than what JPA or SQL gives you; the ability to define any custom data derivation function as a new kind of "index"; a CLI for ad-hoc querying; and a GUI for exploring the data.

Oh yes, it also has a Raft implementation, so if you want multi-cluster FDB with Raft-driven failover you could do that too (iirc, FDB doesn't have this out of the box).

And because the K/V format is stable, it has some helpers to write in memory stores to byte arrays and streams, so you can use it as a serialization format too.

FDB has something a bit like this in its Record layer, but it's nowhere near as powerful or well thought out. Permazen is obscure and not widely used, but it's been deployed to production as part of a large US 911 dispatching system and is maintained.

Incremental schema evolution is possible because Permazen stores schema data in the K/V store, along with a version for each persisted object (row), and upgrades objects on the fly when they're first accessed.
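That upgrade-on-read pattern is easy to sketch. Here is a hypothetical Python rendering (Permazen itself is Java, and the version field and upgrade rule below are invented for illustration): each stored object carries its schema version, and a reader applies any pending upgrades the first time the object is touched, writing the upgraded form back.

```python
# Lazy, on-read schema migration sketch: objects carry a schema
# version ("_v"); readers upgrade stale objects on first access.

CURRENT_VERSION = 2

def upgrade_v1_to_v2(obj):
    # Illustrative rule: v2 split the single "name" field in two.
    first, _, last = obj.pop("name").partition(" ")
    obj.update({"first": first, "last": last, "_v": 2})
    return obj

UPGRADES = {1: upgrade_v1_to_v2}   # from-version -> upgrade function

def read(store, key):
    obj = store[key]
    while obj["_v"] < CURRENT_VERSION:   # apply upgrades in order
        obj = UPGRADES[obj["_v"]](obj)
        store[key] = obj                 # persist the upgraded form
    return obj

store = {"user/1": {"_v": 1, "name": "Ada Lovelace"},
         "user/2": {"_v": 2, "first": "Alan", "last": "Turing"}}
assert read(store, "user/1") == {"_v": 2, "first": "Ada", "last": "Lovelace"}
assert read(store, "user/2")["first"] == "Alan"
```

No bulk rewrite ever happens: cold objects stay at their old version on disk until someone reads them.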

[1] https://permazen.io/

[2] https://cdn.jsdelivr.net/gh/permazen/permazen@master/permaze...

SamReidHughes|2 years ago

100%. I don't have the time to read the paper but online schema changes, with the ability to fail and abort the entire operation if one row is invalid, are basically the same problem as background index building.

If instead of using some generic K/V backend, it made use of specific FDB features, it might be even better. Conflict ranges and snapshot reads have been useful for me for some background index building designs, and atomic ops have their uses.

> Oh yes, it also has a Raft implementation, so if you want multi-cluster FDB with Raft-driven failover you could do that too (iirc, FDB doesn't have this out of the box).

I don't know what you mean by this. Multiple FDB clusters?

rdtsc|2 years ago

> First, FDB is clearly designed for “read often, update rarely” workloads, in a relative sense. It produces multiple consistent replicas which are consistently queryable at a past time stamp

For reading, it has a five-second snapshot timeout that gets in the way. One can stitch multiple transactions together, but that could mean losing snapshot isolation without further tricks.
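A toy demonstration of what "losing snapshot isolation" means here (plain Python; dict copies stand in for per-transaction snapshots, and the account names are made up): a write commits between two stitched read transactions, so the combined view observes a state that never existed.

```python
# Stitching two read snapshots across a concurrent commit: each
# snapshot is internally consistent, but the stitched result is not.

store = {"acct/a": 100, "acct/b": 100}

snapshot1 = dict(store)          # first read transaction's snapshot
a = snapshot1["acct/a"]          # read the first half of the range

# A transfer commits between the two read transactions.
store["acct/a"] -= 50
store["acct/b"] += 50

snapshot2 = dict(store)          # second read transaction's snapshot
b = snapshot2["acct/b"]          # read the second half of the range

assert a + b == 250              # stitched view: 50 phantom units appear
assert sum(snapshot1.values()) == sum(snapshot2.values()) == 200
```

Every single-snapshot read sees a total of 200; only the stitched pair of reads sees the impossible 250. Avoiding this requires reading both chunks at the same version, which is exactly what the timeout forecloses for large scans.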

In other words, even just for read-mostly workloads it has a few warts.

preseinger|2 years ago

do you think the things you mention were deliberate design decisions?

mrtracy|2 years ago

They absolutely were, yes. There are very valuable application profiles where FoundationDB's design is excellent, and you can see that from its internal usage at large companies like Apple and Snowflake.

mike_hearn|2 years ago

Yes, one of the nice things about FDB is that it has extensive design docs. Optimizing for reading more often than writing is obviously a pretty normal design choice; outside of log ingestion you'll normally be reading more than writing. There are people using FDB for logs (Snowflake, iirc?) and it's been optimized for that sort of use case more in recent years, but it's not like it was an unreasonable choice.