Sometimes a KV datastore is the right abstraction, though. Caching is an excellent example, but so are distributed session storage, configuration management, nonce enforcement, etc.
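Nonce enforcement shows why the plain KV interface can be exactly enough: the whole check is one atomic "set if absent, with an expiry". Below is a minimal in-memory Python sketch under that assumption. `NonceStore` and `claim` are hypothetical names, and a real deployment would lean on the datastore's own atomic primitive (Redis, for instance, offers `SET key value NX EX seconds`).

```python
import threading
import time


class NonceStore:
    """Hypothetical in-memory KV used only to reject replayed nonces."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # nonce -> absolute expiry time; expired entries are never purged
        # here, which is a simplification a real store would not make.
        self._expiry: dict[str, float] = {}
        self._lock = threading.Lock()

    def claim(self, nonce: str) -> bool:
        """Atomically record the nonce if absent or expired; True means first use."""
        now = self._clock()
        with self._lock:
            expires_at = self._expiry.get(nonce)
            if expires_at is not None and expires_at > now:
                return False  # seen within the TTL: treat as a replay
            self._expiry[nonce] = now + self._ttl
            return True


# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
store = NonceStore(ttl_seconds=60, clock=lambda: fake_now[0])
print(store.claim("abc"))  # True: first use
print(store.claim("abc"))  # False: replay within the TTL
fake_now[0] = 61.0
print(store.claim("abc"))  # True: the earlier claim has expired
```

The lock stands in for the atomicity the real datastore would provide; without it, two concurrent requests could both see the nonce as absent.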
I sort of disagree. KV databases are so fundamental that they are considered one of the foundational technologies of any advanced database management system.
Think of a KV database as a persistent associative map (a hash map, essentially) that has to store data safely and securely; then we can build advanced stuff on top of it. Take TiDB, for example: it is a distributed, MySQL-compatible database (its query language can be considered a subset of MySQL's), but most of the heavy lifting is actually done by TiKV, a distributed KV datastore that uses Raft for consensus.
And then SurrealDB also leveraged TiKV, as one of its storage backends, to build its own graph-document hybrid database. P.S.: I used to be a contributor to SurrealDB.
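To make "build advanced stuff on top of it" concrete, here is a toy Python sketch of storing a table and a secondary index in an ordered KV map. The key layout is only loosely modeled on the `t{tableID}_r{rowID}` / `t{tableID}_i{indexID}...` scheme TiDB documents; the separators, padding, JSON values and all function names are simplifying assumptions, not TiDB's actual encoding.

```python
import bisect
import json


class OrderedKV:
    """Toy ordered key-value map: sorted keys make prefix (range) scans cheap."""

    def __init__(self):
        self._keys: list[str] = []
        self._vals: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key: str):
        return self._vals.get(key)

    def scan_prefix(self, prefix: str):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._vals[key]
            i += 1


def insert_user(kv: OrderedKV, table_id: int, row_id: int, name: str, city: str) -> None:
    # Row record: the whole row, keyed by table id and zero-padded row id.
    kv.put(f"t{table_id}_r{row_id:010d}", json.dumps({"name": name, "city": city}))
    # Secondary index entry on `city` (index id 1): the key holds the indexed
    # value plus the row id so entries stay unique; the value is empty.
    # "\x00" separates parts, assuming it never appears inside a city name.
    kv.put(f"t{table_id}_i1\x00{city}\x00{row_id:010d}", "")


def users_in_city(kv: OrderedKV, table_id: int, city: str) -> list[str]:
    """Index lookup: prefix-scan the index entries, then fetch each row by key."""
    names = []
    for index_key, _ in kv.scan_prefix(f"t{table_id}_i1\x00{city}\x00"):
        row_id = index_key.rsplit("\x00", 1)[1]
        row = json.loads(kv.get(f"t{table_id}_r{row_id}"))
        names.append(row["name"])
    return names


kv = OrderedKV()
insert_user(kv, 1, 1, "ada", "London")
insert_user(kv, 1, 2, "grace", "Paris")
insert_user(kv, 1, 3, "alan", "London")
print(users_in_city(kv, 1, "London"))  # ['ada', 'alan']
```

Everything relational here (rows, an index, an index-then-fetch lookup) is just key design plus ordered range scans, which is the part a system like TiKV supplies.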
KV databases are also the least efficient architecture possible if your data models or workloads are non-trivial. They are relatively simple to design and build, which is a positive attribute, but they are not that capable in any kind of theoretical sense. Other architectures preserve far more spatial and temporal locality when representing data models.
If your workload has even a whiff of analytics to it, operational or slow-time, KV databases are almost the pathological architecture in theory. Their intrinsically poor locality exacts a steep performance price.
These database architectures are all equivalent in the same sense that almost everything is a Turing Machine. Some manifestations and implementations are much more efficient than others in the real world. While I am not as emotionally invested in it as the article’s author seems to be, he is generally correct that KV databases have poor properties for most applications.
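A crude Python illustration of the locality point, measuring read amplification (one consequence of poor locality) for an analytic `SUM(amount)`. With one key per serialized row, every value must be fetched and fully decoded to read a single field; a columnar layout touches only that field's contiguous array. The dataset and both layouts are hypothetical, and real engines compress and lay out data in ways this sketch ignores.

```python
import json
from array import array

# Hypothetical dataset: 1,000 orders with a few fields each.
orders = [
    {"id": i, "customer": f"cust-{i % 50}", "status": "shipped", "amount": i % 100}
    for i in range(1000)
]

# KV layout: one key per row; the value is the whole serialized row.
kv = {f"order:{o['id']}": json.dumps(o).encode() for o in orders}

# Columnar layout: the `amount` field stored contiguously as 64-bit ints.
amount_column = array("q", (o["amount"] for o in orders))


def total_from_kv(store: dict[str, bytes]) -> tuple[int, int]:
    """SUM(amount) over the KV layout: every value is read and decoded."""
    total, bytes_read = 0, 0
    for value in store.values():
        bytes_read += len(value)
        total += json.loads(value)["amount"]
    return total, bytes_read


def total_from_column(col: array) -> tuple[int, int]:
    """SUM(amount) over the columnar layout: only one column is touched."""
    return sum(col), col.itemsize * len(col)


print(total_from_kv(kv))
print(total_from_column(amount_column))
```

Both layouts produce the same total; the difference is how many bytes had to be read and decoded to get it.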
DeepSeek just used FoundationDB to build a parallel filesystem. Parallel filesystems are a big deal: their number, including proprietary ones, is probably in the single digits.
Never believe that any tool is good or bad. That's always going to be a generalization, and therefore wrong. Learn as many tools as possible, and know which use cases they're good at and which use cases they're bad at. If someone implies the tool is bad for all use cases, know that we all live in our own bubbles and are ignorant about the plethora of other use cases that exist in the world.
IMO key-value stores tend to live in the space between a third-normal-form, ultra-relational, UML-diagram database like the ones college textbooks assure you exist and a high-chaos cowboy document storage system like MongoDB.
They enable you to make a lot of things up as you go and iterate on your design. I like them because they remove a lot of ceremony around letting me get on with persisting things without having to ALTER TABLE or CREATE TABLE and all that entails. At the same time, they're constrained and often organized in a way that big ol' JSON blobs aren't. I like them for doing multiplayer gamedev things.
Is he right that this imaginary future product might be a better solution than current KV datastores? Maybe - he makes a good case, it certainly sounds worth pursuing, and he can point to a case where it sort of works well already (FoundationDB).
Does that imply that you should give up on KV datastores today, when the product category he's asking for barely exists? No, obviously not.
Like most articles that make strong assertive statements like this, it's an oversimplification. Every tool has its place. The author clearly wants to use SQL, and seems to have a problem that would benefit from it, so they should use a SQL DB and not try to use a KV DB.
Yeah! Then I can stop grousing about yet another grammatical peeve: It's "key/value" pairs, not "key-value" pairs (which would mean pairs of key values).
I do like this approach in theory, but agreed, I don't think the devex has been solved, and I don't know how many people would value a different approach when SQL feels like king in so many places... FWIW, it's roughly the approach IndexedDB takes on the web, though its API is quite bad IMO.
Or maybe there is a higher level DSL that you could apply to create query plans (something like MongoDB aggregation pipelines maybe?), but it quickly becomes basically the same as SQL.
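A toy Python sketch of what such a pipeline DSL might look like. The stage functions (`match`, `group_sum`, `sort_by`) are hypothetical and only loosely inspired by MongoDB's aggregation stages; they are not MongoDB's syntax or semantics.

```python
from collections import defaultdict
from typing import Callable, Iterable

Row = dict
Stage = Callable[[Iterable[Row]], Iterable[Row]]


def match(pred: Callable[[Row], bool]) -> Stage:
    """Keep only rows satisfying pred (roughly SQL's WHERE)."""
    return lambda rows: (r for r in rows if pred(r))


def group_sum(key: str, field: str) -> Stage:
    """Sum `field` per distinct `key` (roughly GROUP BY key with SUM(field))."""
    def run(rows: Iterable[Row]) -> Iterable[Row]:
        totals: dict = defaultdict(int)
        for r in rows:
            totals[r[key]] += r[field]
        return ({key: k, "total": v} for k, v in totals.items())
    return run


def sort_by(field: str, descending: bool = False) -> Stage:
    """Order rows by `field` (roughly ORDER BY)."""
    return lambda rows: iter(sorted(rows, key=lambda r: r[field], reverse=descending))


def run_pipeline(rows: Iterable[Row], stages: list[Stage]) -> list[Row]:
    for stage in stages:
        rows = stage(rows)
    return list(rows)


orders = [
    {"customer": "a", "status": "paid", "amount": 30},
    {"customer": "b", "status": "paid", "amount": 50},
    {"customer": "a", "status": "paid", "amount": 25},
    {"customer": "b", "status": "refunded", "amount": 40},
]

result = run_pipeline(orders, [
    match(lambda r: r["status"] == "paid"),
    group_sum("customer", "amount"),
    sort_by("total", descending=True),
])
print(result)  # [{'customer': 'a', 'total': 55}, {'customer': 'b', 'total': 50}]
```

Each stage maps onto a SQL clause almost one-to-one, which is the convergence the comment describes.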
It's a bit like the record layer in FoundationDB but more advanced. You specify query plans manually, so you can't accidentally forget an index for example.
Finally! Someone else reaching the conclusion that the query planner is really annoying and for most queries I would just like to skip it.
I don't want the dynamic nature of the planner. I don't want to send SQL over the wire, I want to send the already completed plan that I either generated or wrote by hand. So many annoying performance bugs are because the planner did the slow thing. Just let me write/adjust it.
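Here is a hypothetical Python sketch of "send the already completed plan": the caller composes physical operators by hand, so the index choice is explicit and a missing index fails when the plan is built rather than silently falling back to a slow scan. The operator classes and in-memory storage are illustrative assumptions, not Permazen's, FoundationDB's, or any other database's API.

```python
from dataclasses import dataclass
from typing import Callable, Iterator

Row = dict

# Hypothetical storage: a primary table keyed by id plus one secondary index.
TABLE: dict[int, Row] = {
    1: {"id": 1, "email": "ada@example.com", "country": "UK", "active": True},
    2: {"id": 2, "email": "grace@example.com", "country": "US", "active": True},
    3: {"id": 3, "email": "alan@example.com", "country": "UK", "active": False},
}
INDEXES: dict[str, dict[object, list[int]]] = {
    "by_country": {"UK": [1, 3], "US": [2]},
}


@dataclass
class IndexLookup:
    index: str
    value: object

    def __post_init__(self):
        # Fail at plan-construction time if the named index does not exist.
        if self.index not in INDEXES:
            raise ValueError(f"no such index: {self.index}")

    def rows(self) -> Iterator[Row]:
        for row_id in INDEXES[self.index].get(self.value, []):
            yield TABLE[row_id]


@dataclass
class Filter:
    child: object
    pred: Callable[[Row], bool]

    def rows(self) -> Iterator[Row]:
        return (r for r in self.child.rows() if self.pred(r))


@dataclass
class Project:
    child: object
    columns: tuple[str, ...]

    def rows(self) -> Iterator[Row]:
        return ({c: r[c] for c in self.columns} for r in self.child.rows())


# Hand-written plan: look up by_country, filter on active, keep only email.
plan = Project(Filter(IndexLookup("by_country", "UK"), lambda r: r["active"]), ("email",))
print(list(plan.rows()))  # [{'email': 'ada@example.com'}]
```

Because the plan is plain data, it could in principle be serialized and sent over the wire instead of SQL text.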
This depends entirely on your use-case though right? It's not generic advice.
If your use-case is a data warehouse, then you absolutely want more than a K/V database and likely dynamic query plans, because the point is dynamic usage. If your use-case is the serving frontend for a >1M requests per second API, then sure, you probably don't want the complexity of a relational database and query planner.
Most things are somewhere in the middle and need to give serious consideration to this.
Let the database enforce a serialization format (JSON, BSON, MessagePack, protobuf... anything really) and create and maintain indices using the fancy crash-proof logic it has. That'll cover 95% of all my database needs.
(OP also asks for row-based layout, types, and non-trivial language. I think those parts are entirely optional)
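A minimal Python sketch of that wish list, assuming a simple field-to-type schema and JSON as the enforced format. `IndexedStore` and its methods are hypothetical; crucially, it has none of the crash-proof durability or atomicity, which is exactly the part the commenter wants the database to supply.

```python
import json
from collections import defaultdict


class IndexedStore:
    """Hypothetical KV store that validates documents and keeps indexes in sync."""

    def __init__(self, schema: dict[str, type], indexed: list[str]):
        missing = set(indexed) - set(schema)
        if missing:
            raise ValueError(f"indexed fields not in schema: {sorted(missing)}")
        self.schema = schema
        self.indexed = indexed
        self._data: dict[str, bytes] = {}  # key -> serialized document
        self._index: dict[str, dict[object, set[str]]] = {
            f: defaultdict(set) for f in indexed
        }

    def _validate(self, doc: dict) -> None:
        for field, expected in self.schema.items():
            if not isinstance(doc.get(field), expected):
                raise TypeError(f"{field!r} must be {expected.__name__}")

    def put(self, key: str, doc: dict) -> None:
        self._validate(doc)
        old = self.get(key)
        if old is not None:  # remove index entries for the previous version
            for f in self.indexed:
                self._index[f][old[f]].discard(key)
        self._data[key] = json.dumps(doc).encode()  # the store owns the format
        for f in self.indexed:
            self._index[f][doc[f]].add(key)

    def get(self, key: str):
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)

    def find(self, field: str, value) -> list[str]:
        """Keys whose document has `field == value`, via the index (no scan)."""
        return sorted(self._index[field].get(value, set()))


store = IndexedStore({"name": str, "team": str}, indexed=["team"])
store.put("u1", {"name": "ada", "team": "red"})
store.put("u2", {"name": "bob", "team": "red"})
store.put("u1", {"name": "ada", "team": "blue"})  # update moves u1's index entry
print(store.find("team", "red"))   # ['u2']
print(store.find("team", "blue"))  # ['u1']
```

Note that Python's `isinstance(True, int)` is true, so a stricter type check would be needed if booleans must be rejected for integer fields.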
Realistically, what should happen is reusable queries in the database with pre-cached plans as well as planner scripting and eliminating the index chooser for standard transactional queries. For real-time ad-hoc queries, the planners can be used, but for the ones happening 1000s of times a second... best to stick with a cached plan.
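A hypothetical Python sketch of the cached-plan idea: each named, reusable query is planned once on first use and the stored plan is reused on every later call. `plan_query` stands in for an expensive planner run; the names and in-memory data are assumptions, and several SQL databases already offer a version of this through prepared statements.

```python
from typing import Callable

Plan = Callable[[dict], list]

USERS = [
    {"name": "ada", "country": "UK"},
    {"name": "grace", "country": "US"},
    {"name": "alan", "country": "UK"},
]


def plan_query(name: str) -> Plan:
    """Stand-in for an expensive planner run for a named, reusable query."""
    if name == "users_by_country":
        return lambda p: [u["name"] for u in USERS if u["country"] == p["country"]]
    raise KeyError(f"unknown query: {name}")


class PlanCache:
    """Plans each named query once, then reuses that plan on every call."""

    def __init__(self, planner: Callable[[str], Plan]):
        self._planner = planner
        self._plans: dict[str, Plan] = {}
        self.planner_calls = 0

    def execute(self, name: str, params: dict) -> list:
        plan = self._plans.get(name)
        if plan is None:  # cache miss: run the planner only this once
            self.planner_calls += 1
            plan = self._plans[name] = self._planner(name)
        return plan(params)


cache = PlanCache(plan_query)
for _ in range(1000):
    cache.execute("users_by_country", {"country": "UK"})
print(cache.planner_calls)  # 1
print(cache.execute("users_by_country", {"country": "US"}))  # ['grace']
```

Ad-hoc queries could still go through the full planner; only the hot, repeated ones need to live in the cache.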
tyzoid|11 months ago
esafak|11 months ago
mjevans|11 months ago
I think I'd prefer to stop calling _large_ resources that are only K:V a 'database' though.
A 'database' shouldn't require SQL, but a distributed filesystem, however similar, isn't quite a database.
tptacek|11 months ago
stevefan1999|11 months ago
jandrewrogers|11 months ago
mrlongroots|11 months ago
fud101|11 months ago
notfed|11 months ago
MrLeap|11 months ago
lmm|11 months ago
treyd|11 months ago
technick|11 months ago
foldU|11 months ago
DidYaWipe|11 months ago
rockwotj|11 months ago
mike_hearn|11 months ago
https://github.com/permazen/permazen/blob/master/README.md
osigurdson|11 months ago
I think it is because most people can make something work with SQL.
Spivak|11 months ago
notadoomer236|11 months ago
danpalmer|11 months ago
theamk|11 months ago
mamcx|11 months ago
In Fox, you write more or less `physical query plans` as syntax:
And what makes this even better is that you can also write `SQL`, so you get the best of both worlds. BTW, I think this idea can be pushed even further; my take is at https://tablam.org
anon291|11 months ago
sroussey|11 months ago
exabrial|11 months ago