It's Time to Stop Building KV Databases

tyzoid|11 months ago

Sometimes a KV datastore is the right abstraction, though. Caching is an excellent example, as are distributed session storage, configuration management, nonce enforcement, etc.

esafak|11 months ago

KV stores can be used to build distributed relational databases.

mjevans|11 months ago

K:V - maps / dictionaries can be the correct tool for some jobs.

I think I'd prefer to stop calling _large_ resources that are only K:V a 'database' though.

A 'database' shouldn't require SQL, but a distributed filesystem, however similar, isn't quite a database.

tptacek|11 months ago

Rebuttal: a filesystem is a database.

stevefan1999|11 months ago

I sort of disagree. KV databases are so fundamental that they are considered one of the most foundational technologies in any advanced database management system.

Think of a KV database as a persistent associative mapping/hash map that needs to store data in a safe and secure way; then we can build advanced stuff on top of it. Take TiDB, for example: it is a distributed database compatible with MySQL (its own query language can be considered a subset of MySQL's), but most of the heavy lifting is actually handled by TiKV, which is a distributed KV datastore with Raft distributed consensus.

And then SurrealDB also leveraged TiKV to build their own graph-document hybrid database product... as one of its storage layers. P.S.: I used to be a contributor to SurrealDB.
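
To make the layering idea above concrete, here is a minimal sketch of table rows stored on top of an ordered key/value map. Everything in it is an assumption for illustration: `OrderedKV`, `insert_row`, `get_row`, `select_all`, and the `t/<table>/r/<pk>` key layout are made-up names, not the actual TiDB/TiKV encoding or API.

```python
from bisect import bisect_left, insort


class OrderedKV:
    """Toy in-memory ordered key/value store (hypothetical stand-in for a real KV engine)."""

    def __init__(self):
        self._keys = []   # sorted list of keys, so prefix scans are possible
        self._data = {}   # key -> value

    def put(self, key, value):
        if key not in self._data:
            insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._data[key]
            i += 1


# "Relational" layer: a row is just a value under a structured key.
def insert_row(kv, table, pk, row):
    kv.put(f"t/{table}/r/{pk}", row)


def get_row(kv, table, pk):
    return kv.get(f"t/{table}/r/{pk}")


def select_all(kv, table):
    # A full table scan becomes a prefix scan over the table's key range.
    return [row for _, row in kv.scan_prefix(f"t/{table}/r/")]


kv = OrderedKV()
insert_row(kv, "users", "1", {"name": "ada"})
insert_row(kv, "users", "2", {"name": "bob"})
insert_row(kv, "orders", "1", {"user": "1"})
users = select_all(kv, "users")
```

The point is only that "table", "row" and "scan" can be expressed as key conventions over an ordered map; a real system adds transactions, replication and a binary key encoding on top.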

jandrewrogers|11 months ago

KV databases are also the least efficient architecture possible if your data models or workloads are non-trivial. They are relatively simple to design and build, which is a positive attribute, but they are not that capable in any kind of theoretical sense. Other architectures preserve far more spatial and temporal locality when representing data models.

If your workload has even a whiff of analytics to it, operational or slow-time, KV databases are almost the pathological architecture in theory. Their intrinsically poor locality exacts a steep performance price.

These database architectures are all equivalent in the same sense that almost everything is a Turing Machine. Some manifestations and implementations are much more efficient than others in the real world. While I am not as emotionally invested in it as the article’s author seems to be, he is generally correct that KV databases have poor properties for most applications.

mrlongroots|11 months ago

DeepSeek just used FoundationDB to build a parallel filesystem. Parallel filesystems are a big deal -- their number, including proprietary ones, is probably in single digits.

fud101|11 months ago

Is this a good take? I'm just a below average dev and trying to figure it out.

notfed|11 months ago

Never believe that any tool is good or bad. That's always going to be a generalization, and therefore wrong. Learn as many tools as possible, and know which use cases they're good at and which use cases they're bad at. If someone implies the tool is bad for all use cases, know that we all live in our own bubbles and are ignorant about the plethora of other use cases that exist in the world.

MrLeap|11 months ago

It's an opinion.

IMO key value stores tend to live in the space between a third normal form ultra relational UML diagram database like the college textbooks assure you exist and a high chaos cowboy document storage system like mongodb.

They enable you to make a lot of things up as you go and iterate on your design. I like them because they remove a lot of ceremony around letting me get on with persisting things without having to ALTER TABLE or CREATE TABLE and all that entails. At the same time, they're constrained and often organized in a way that storing big ol' JSON blobs isn't. I like them for doing multiplayer gamedev things.

lmm|11 months ago

Is he right that this imaginary future product might be a better solution than current KV datastores? Maybe - he makes a good case, it certainly sounds worth pursuing, and he can point to a case where it sort of works well already (FoundationDB).

Does that imply that you should give up on KV datastores today, when this product category he's asking for barely exists? No, obviously not.

treyd|11 months ago

Like most articles that make strong assertive statements like this, it's an oversimplification. Every tool has its place. The author clearly wants to use SQL, and seems to have a problem that would benefit from it, so they should use a SQL DB and not try to use a KV DB.

technick|11 months ago

Very opinionated clickbait if you ask me.

unknown|11 months ago

[deleted]

foldU|11 months ago

Yes, it’s good

DidYaWipe|11 months ago

Yeah! Then I can stop grousing about yet another grammatical peeve: It's "key/value" pairs, not "key-value" pairs (which would mean pairs of key values).

rockwotj|11 months ago

I do like this approach in theory, but agreed, I don't think that the devex has been solved, and I don't know how many people would value a different approach, as SQL feels like king in many places... FWIW it's kind of the approach that IndexedDB on the web takes; however, the API is quite bad IMO.

Or maybe there is a higher level DSL that you could apply to create query plans (something like MongoDB aggregation pipelines maybe?), but it quickly becomes basically the same as SQL.

mike_hearn|11 months ago

The author doesn't know it, but he's asking for Permazen - exactly what you're asking for:

https://github.com/permazen/permazen/blob/master/README.md

It's a bit like the record layer in FoundationDB but more advanced. You specify query plans manually, so you can't accidentally forget an index for example.

osigurdson|11 months ago

>> SQL feels like king in many places

I think it is because most people can make something work with SQL.

Spivak|11 months ago

Finally! Someone else reaching the conclusion that the query planner is really annoying and for most queries I would just like to skip it.

I don't want the dynamic nature of the planner. I don't want to send SQL over the wire, I want to send the already completed plan that I either generated or wrote by hand. So many annoying performance bugs are because the planner did the slow thing. Just let me write/adjust it.
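
As a rough sketch of what "writing the plan by hand" could look like, here is a physical plan composed directly from operators instead of being produced by a planner. The operator names (`index_lookup`, `fetch`, `filter_rows`) and the sample data are hypothetical, not any real database's API.

```python
def index_lookup(index, value):
    """Physical operator: yield the primary keys stored under value in a secondary index."""
    yield from index.get(value, [])


def fetch(table, keys):
    """Physical operator: look up each primary key in the base table."""
    for key in keys:
        yield table[key]


def filter_rows(rows, predicate):
    """Physical operator: keep only the rows that satisfy predicate."""
    return (row for row in rows if predicate(row))


# Hypothetical data: a base table keyed by order id, plus a secondary index on customer.
orders = {
    101: {"customer": "acme", "total": 40},
    102: {"customer": "globex", "total": 75},
    103: {"customer": "acme", "total": 120},
}
orders_by_customer = {"acme": [101, 103], "globex": [102]}

# The plan is explicit: use the index first, then fetch, then filter.
# Nothing here can silently turn into a full table scan.
plan = filter_rows(
    fetch(orders, index_lookup(orders_by_customer, "acme")),
    lambda row: row["total"] > 100,
)
result = list(plan)
```

The trade-off is the obvious one: the access path never changes behind your back, but it also never adapts if the data distribution changes.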

notadoomer236|11 months ago

Amazon reached the same conclusion and widely prefers DynamoDB for this reason.

danpalmer|11 months ago

This depends entirely on your use-case though right? It's not generic advice.

If your use-case is a data warehouse, then you absolutely want more than a K/V database and likely dynamic query plans because the point is dynamic usage. If your use-case is the serving frontend for a >1m request per second API, then sure, you probably don't want the complexity of a relational database and query planner.

Most things are somewhere in the middle and need to give serious consideration to this.

theamk|11 months ago

All I want is a K-V store with indexes.

Let the database enforce a serialization format (JSON, BSON, MessagePack, protobuf... anything really) + create and maintain indices, using the fancy crash-proof logic it has. That'll cover 95% of all my database needs.

(OP also asks for row-based layout, types, and non-trivial language. I think those parts are entirely optional)
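
A minimal in-memory sketch of that idea, assuming JSON as the enforced format and scalar (hashable) values in the indexed fields. `IndexedKV` and its methods are hypothetical names, and the crash-proof persistence part is deliberately left out.

```python
import json


class IndexedKV:
    """Toy K-V store that enforces JSON values and maintains secondary indexes itself."""

    def __init__(self, indexed_fields):
        self._records = {}                                 # key -> JSON string
        self._indexes = {f: {} for f in indexed_fields}    # field -> value -> set of keys

    def put(self, key, record):
        encoded = json.dumps(record)   # raises TypeError if the record is not JSON-serializable
        self.delete(key)               # drop stale index entries when overwriting
        self._records[key] = encoded
        for field, index in self._indexes.items():
            if field in record:
                index.setdefault(record[field], set()).add(key)

    def delete(self, key):
        raw = self._records.pop(key, None)
        if raw is None:
            return
        old = json.loads(raw)
        for field, index in self._indexes.items():
            if field in old:
                keys = index.get(old[field])
                if keys is not None:
                    keys.discard(key)
                    if not keys:
                        del index[old[field]]

    def get(self, key):
        raw = self._records.get(key)
        return None if raw is None else json.loads(raw)

    def find(self, field, value):
        """Return the sorted keys of all records whose indexed field equals value."""
        return sorted(self._indexes[field].get(value, ()))


store = IndexedKV(indexed_fields=["country"])
store.put("u1", {"name": "ada", "country": "SE"})
store.put("u2", {"name": "bob", "country": "SE"})
store.put("u1", {"name": "ada", "country": "NO"})   # the overwrite moves u1 between index buckets
swedes = store.find("country", "SE")
```

The caller never touches the index directly; the store keeps it consistent on every `put` and `delete`, which is the part you would want the database to do for you.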

mamcx|11 months ago

It's weird, but what the OP is asking for actually existed before, in FoxPro and similar tools.

In Fox, you write more or less `physical query plans` as syntax:

   USE customer  && Opens Customer table
   CLEAR
   SCAN FOR UPPER(country) = 'SWEDEN'
      ? contact, company, city
   ENDSCAN

And what makes this even better is that you can also write `SQL`, so you can have the best of both worlds.

BTW, I think this idea can be taken even further, and my take is at https://tablam.org

anon291|11 months ago

Realistically, what should happen is reusable queries in the database with pre-cached plans as well as planner scripting and eliminating the index chooser for standard transactional queries. For real-time ad-hoc queries, the planners can be used, but for the ones happening 1000s of times a second... best to stick with a cached plan.

sroussey|11 months ago

All you need is MemcacheD

exabrial|11 months ago

Honestly though, why not just take the existing SQL standard and trim it down then?

62 comments