mrtracy|2 years ago
However, since these topics are always filled with effusive praise in the comments, let me give an example of a distributed scenario where FDB has shortcomings: OLTP SQL.
First, FDB is clearly designed for “read often, update rarely” workloads, in a relative sense. It produces multiple consistent replicas which are queryable at a past timestamp without a transaction - excellent for that profile. However, its transaction concurrency control is both optimistic and centralized, which can make writes difficult under high contention and cause (brief) system-wide transaction downtime during a failover; it will work, but it’s not optimal for “write often, read once” workloads.
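To make the contention point concrete, here is a minimal sketch of optimistic commit validation - in Python, with invented names, not the FDB API: a commit fails if any key it read was written after its snapshot, so read-modify-write traffic on one hot key turns into retries.

```python
class OptimisticStore:
    """Toy model of optimistic (OCC) validation, loosely in the spirit
    of a centralized resolver: a commit fails if any key it read was
    written after its snapshot version. Illustrative only."""

    def __init__(self):
        self.data = {}        # key -> value
        self.last_write = {}  # key -> version of the last write
        self.version = 0      # global commit version

    def snapshot(self):
        return self.version

    def commit(self, read_version, reads, writes):
        # Conflict check: was any key we read written after our snapshot?
        for k in reads:
            if self.last_write.get(k, 0) > read_version:
                return False
        self.version += 1
        for k, v in writes.items():
            self.data[k] = v
            self.last_write[k] = self.version
        return True


def increment_with_retries(store, key):
    """Read-modify-write on a single key, retrying on conflict.
    Under a hot key, concurrent writers spend time in this loop."""
    retries = 0
    while True:
        rv = store.snapshot()
        val = store.data.get(key, 0)
        if store.commit(rv, {key}, {key: val + 1}):
            return retries
        retries += 1
```

A transaction holding a stale snapshot loses the race: if another writer commits to the same key first, validation fails and the whole read-modify-write must be redone.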
Secondly, while it is an ordered key-value store - which facilitates building SQL on top of it - the popular idea of layering SQL entirely above the distributed layer comes with many shortcomings.
My key example of this is schema changes. Optimistic application, and keeping schema information entirely “above” the transaction layer, can make it extremely slow to apply changes to large tables, and possibly require taking them partially offline during the update. There are ways to manage this, but online schema changes will be a competitive advantage for other systems.
Even for read-only queries, you lose opportunities to push many types of predicates down to the storage node, where they can be executed with fewer round trips. Depending on how distributed your system is, this could add up to significant additional latency.
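A toy round-trip model of the pushdown point (hypothetical function and made-up paging numbers, just to show the shape of the cost):

```python
import math

def scan_with_filter(rows, predicate, page_size=100, pushdown=False):
    """Toy model of filtering at the server vs. the client. With pushdown,
    the predicate runs next to the data and only matches come back over
    the wire; without it, the client pages through every row. Returns
    (matches, round_trips). Not a benchmark, just an illustration."""
    matches = [r for r in rows if predicate(r)]
    if pushdown:
        trips = max(1, math.ceil(len(matches) / page_size))
    else:
        trips = max(1, math.ceil(len(rows) / page_size))
    return matches, trips
```

With a selective predicate the gap widens: the client-side scan pays round trips proportional to table size, the pushed-down scan only to result size - and each round trip is expensive when the storage node is a continent away.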
Afaik, all of the spanner-likes of the world push significant schema-specific information into their transaction layers - and utilize pessimistic locking - to facilitate these scenarios with competitive performance.
For reasons like these, I think FDB will find (and has found) the most success in warehousing scenarios, where individual records, once written, are queried often, and updates arrive at a slower pace than reads.
Dave_Rosenthal|2 years ago
Whether concurrency is optimistic (or done with locks, or whatever) doesn't really have a bearing on things. Any database is going to suffer if it has a bunch of updates to a specific hot key that need to be isolated (in the ACID sense). As long as your reads and writes are sufficiently spread out, you'll avoid lock contention/optimistic transaction retries.
You speak to the real main limitation of FoundationDB when you talk about stuff like schema changes. There is a five-second transaction limit which in practice means that you cannot, for example, do a single giant transaction to change every row in a table. This was definitely a deliberate design choice, but not one without tradeoffs. The bad side is that if you want to be able to do something like this (lock out clients while you migrate a table) you need a different design that uses another strategy, like indirection. The good side is that screwed-up transactions that lock big chunks of your DB for a long time don't take down your system.
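The batching workaround can be sketched like this - a plain dict stands in for the ordered K/V store, and in real FDB each batch would be its own short transaction resuming from the last key seen (names and batch size are illustrative):

```python
def migrate_in_batches(kv, transform, batch_size=100):
    """Apply `transform` to every value, committing in bounded batches so
    no single transaction approaches the time limit. `kv` is a plain dict
    standing in for an ordered key-value store; in a real system each
    batch would be one transaction over a bounded key range."""
    keys = sorted(kv)
    cursor = 0
    while cursor < len(keys):
        batch = keys[cursor:cursor + batch_size]
        # --- in real FDB: begin transaction, rewrite this range, commit ---
        for k in batch:
            kv[k] = transform(kv[k])
        cursor += len(batch)
    return kv
```

The catch, and the reason indirection comes up, is that rows written between batches see a mix of old and new formats unless something else (a per-row version, a shadow table) keeps the two readable at once.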
I find that the people who are relatively new to databases tend to wish that the five second limit was gone because it makes things simpler to code. People that are running them in production tend to like it more because it avoids a slew of production issues.
That said, I think for many situations a timeout like 30 or 60 seconds (with a warning at 10) would be a better operating point rather than the default 5 second cliff.
mrtracy|2 years ago
All databases do suffer under some red line of write contention; but optimistic databases will suffer more, and will start degrading at a lower level of contention. “Avoiding contention” is database optimization table stakes, and you should be structuring every schema you can to do so; but hot keys are almost inevitable when a certain class of real-time product scales, and they will show up in ways you do not expect. When it happens, you’d like your DBMS to give as much runway as possible before you have to make the tough changes to break through.
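One standard way to buy runway on a hot key is to shard it across several physical keys; a sketch over a plain dict (shard count and key layout are assumptions, not a prescription):

```python
import random

SHARDS = 8  # assumed fixed shard count, chosen up front


def shard_key(base, i):
    # e.g. "counter/shard/3" -- layout is illustrative
    return f"{base}/shard/{i}"


def increment(kv, base):
    # Writers each touch one random shard, spreading write
    # contention across SHARDS independent keys.
    k = shard_key(base, random.randrange(SHARDS))
    kv[k] = kv.get(k, 0) + 1


def read_total(kv, base):
    # Readers sum all shards; on an ordered K/V store this
    # is a single small range read.
    return sum(kv.get(shard_key(base, i), 0) for i in range(SHARDS))
```

This is exactly the kind of "tough change" the comment is about: it works, but it complicates reads and is best avoided until contention forces your hand.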
SQL-on-top becomes an issue for geographic distribution: without “pushing down” predicates, read-modify-write workloads, table joins, etc. run on the client and can incur significant round-trip time issuing queries. I think the lack of pushdown will always be a persistent disadvantage versus competitors that have it.
And again, given FDB’s multiple-full-secondary model, this is only a problem when working in real time; slower queries can work off a local secondary. But latest-data latency is relevant for many applications.
mike_hearn|2 years ago
A great example of how to best utilize FDB is Permazen [1], described well in its white paper [2].
Permazen is a Java library, so it can be utilized from any JVM language e.g. via Truffle you get Python, JavaScript, Ruby, WASM + any bytecode language. It supports any sorted K/V backend so you can build and test locally with a simple disk or in memory impl, or RocksDB, or even a regular SQL database. Then you can point it at FoundationDB later when you're ready for scaling.
Permazen is not a SQL implementation. Instead it's "language integrated", meaning you write queries using the Java collections library and some helpers, in particular NavigableSet and NavigableMap. In effect you write and hard-code your query plans. In exchange you get many of the features an RDBMS would have, and then some:

- indexes, including compound-key indexes
- strongly typed, enforced schemas with ONLINE updates
- strong type safety during schema changes (which are allowed to be arbitrary)
- sophisticated transaction support, tight control over caching and transactional "copy out"
- watching fields or objects for changes
- constraints, and the equivalent of foreign key constraints with better validation semantics than what JPA or SQL gives you
- the ability to define any custom data derivation function for new kinds of "index"
- a CLI for ad-hoc querying and a GUI for exploring the data
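The NavigableMap style of hand-written query plan can be mimicked in Python with the stdlib bisect module - this is a stand-in for illustration, not Permazen's actual (Java) API:

```python
import bisect

class OrderedIndex:
    """Minimal stand-in for NavigableMap-style access over an ordered
    K/V store: keep keys sorted and answer range queries by binary
    search. Purely illustrative."""

    def __init__(self):
        self._keys = []  # sorted key list
        self._map = {}   # key -> value

    def put(self, key, value):
        if key not in self._map:
            bisect.insort(self._keys, key)
        self._map[key] = value

    def range(self, lo, hi):
        # Half-open [lo, hi) scan -- this call site *is* the query plan,
        # written and hard-coded by hand rather than by a SQL planner.
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_left(self._keys, hi)
        return [(k, self._map[k]) for k in self._keys[i:j]]
```

A "query" is then just a range scan over a key prefix, exactly as it would be laid out in the underlying ordered K/V store.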
Oh yes, it also has a Raft implementation, so if you want multi-cluster FDB with Raft-driven failover you could do that too (iirc, FDB doesn't have this out of the box).
And because the K/V format is stable, it has some helpers to write in memory stores to byte arrays and streams, so you can use it as a serialization format too.
FDB has something a bit like this in its Record layer, but it's nowhere near as powerful or well thought out. Permazen is obscure and not widely used, but it's been deployed to production as part of a large US 911 dispatching system and is maintained.
Incremental schema evolution is possible because Permazen stores schema data in the K/V store, along with a version for each persisted object (row), and upgrades objects on the fly when they're first accessed.
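That upgrade-on-read scheme can be sketched as follows - the upgrader shape and field names are hypothetical, inspired by the description above rather than taken from Permazen:

```python
# Each stored object carries its schema version; readers upgrade lazily
# by running the chain of per-version upgraders, then write the result
# back so the migration happens at most once per object.

def _v1_to_v2(obj):
    # Hypothetical change: rename "name" -> "full_name".
    new = dict(obj)
    new["full_name"] = new.pop("name")
    new["version"] = 2
    return new

UPGRADERS = {1: _v1_to_v2}  # from-version -> upgrader
CURRENT_VERSION = 2


def load(kv, key):
    """Read an object, upgrading it on the fly to the current schema."""
    obj = dict(kv[key])
    while obj["version"] < CURRENT_VERSION:
        obj = UPGRADERS[obj["version"]](obj)
    kv[key] = obj  # persist the upgraded form
    return obj
```

The appeal is that a schema change costs nothing up front: old and new objects coexist, and each row pays for its migration only when first touched.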
[1] https://permazen.io/
[2] https://cdn.jsdelivr.net/gh/permazen/permazen@master/permaze...
SamReidHughes|2 years ago
If instead of using some generic K/V backend, it made use of specific FDB features, it might be even better. Conflict ranges and snapshot reads have been useful for me for some background index building designs, and atomic ops have their uses.
> Oh yes, it also has a Raft implementation, so if you want multi-cluster FDB with Raft-driven failover you could do that too (iirc, FDB doesn't have this out of the box).
I don't know what you mean by this. Multiple FDB clusters?
rdtsc|2 years ago
For reads it has a 5-second snapshot timeout that gets in the way. One can stitch multiple transactions together, but that can mean losing snapshot isolation without further tricks.
In other words, even just for read-mostly workloads it has a few warts.
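One such trick can be sketched as a version check between stitched reads, retrying if anything moved in the meantime - a toy counter stands in for real read-version machinery:

```python
class Store:
    """Tiny stand-in store: every write bumps a global version."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = 0

    def write(self, k, v):
        self.data[k] = v
        self.version += 1


def read_all_chunked(store, keys, chunk=2):
    """Read a large key set across several 'transactions' (chunks),
    retrying from scratch if the store's version moved in between --
    a crude way to recover snapshot-like consistency when one
    transaction can't cover the whole read. Illustrative only."""
    while True:
        v0 = store.version
        out = {}
        consistent = True
        for i in range(0, len(keys), chunk):
            # each chunk would be its own short transaction
            for k in keys[i:i + chunk]:
                out[k] = store.data.get(k)
            if store.version != v0:
                consistent = False  # a write slipped in between chunks
                break
        if consistent:
            return out
```

The obvious cost is that a busy write stream can starve the reader into endless retries, which is why this stays a trick rather than a substitute for a long-lived snapshot.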