Ask HN: Why are there no open source NVMe-native key value stores in 2023?
99 points | nphase | 2 years ago
Why do you think that is? Are there possibly other projects out there that I'm not familiar with?
diggan | 2 years ago
- https://github.com/DataManagementLab/ScaleStore - "A Fast and Cost-Efficient Storage Engine using DRAM, NVMe, and RDMA"
- https://github.com/unum-cloud/udisk (https://github.com/unum-cloud/ustore) - "The fastest ACID-transactional persisted Key-Value store designed for NVMe block-devices with GPU-acceleration and SPDK to bypass the Linux kernel."
- https://github.com/capsuleman/ssd-nvme-database - "Columnar database on SSD NVMe"
PaulHoule | 2 years ago
Also https://www.snia.org/sites/default/files/ESF/Key-Value-Stora...
jamesblonde | 2 years ago
ashvardanian | 2 years ago
geek_at | 2 years ago
formerly_proven | 2 years ago
[1] These slides claim up to 32 bytes, which would be a practically useful length: https://www.snia.org/sites/default/files/ESF/Key-Value-Stora... but the current revision of the standard only permits two 64-bit words as the key ("The maximum KV key size is 16 bytes"): https://nvmexpress.org/wp-content/uploads/NVM-Express-Key-Va...
londons_explore | 2 years ago
16 bytes is long enough that collisions will be super rare, and while you obviously need to write code to support that case, it should have no performance impact.
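One way to live within the 16-byte limit is to pass short keys through unchanged, hash longer ones down to 16 bytes, and store the full user key alongside the value so a collision can be detected on read. A minimal Python sketch of that idea (the names and value framing are illustrative, not taken from any of the projects linked above):

```python
import hashlib

MAX_KV_KEY = 16  # bytes: key-size limit in the current NVMe KV spec

def to_kv_key(user_key: bytes) -> bytes:
    """Map an arbitrary-length key onto a 16-byte NVMe KV key."""
    if len(user_key) <= MAX_KV_KEY:
        return user_key
    # BLAKE2b supports arbitrary digest sizes, so we can hash
    # straight down to the 16-byte limit.
    return hashlib.blake2b(user_key, digest_size=MAX_KV_KEY).digest()

# Collisions at 128 bits are astronomically rare, but a correct store
# must still detect them: prefix the stored value with the full user
# key and compare on every read.
def wrap_value(user_key: bytes, value: bytes) -> bytes:
    return len(user_key).to_bytes(4, "big") + user_key + value

def unwrap_value(user_key: bytes, stored: bytes):
    klen = int.from_bytes(stored[:4], "big")
    if stored[4:4 + klen] != user_key:
        return None  # hash collision: this is not our key
    return stored[4 + klen:]
```

The collision check costs one memcmp per read, which is why it has essentially no performance impact on the happy path.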
londons_explore | 2 years ago
If so, that is probably the reason for a 16 byte key - there is just no way anybody needs a key bigger than 16 bytes for an address anytime soon.
londons_explore | 2 years ago
jiggawatts | 2 years ago
The Azure Lv3/Lsv3/Lav3/Lasv3 series all provide this capability, for example.
Ref: https://learn.microsoft.com/en-us/azure/virtual-machines/las...
rwmj | 2 years ago
gavinray | 2 years ago
You might also be interested in xNVMe and the RocksDB/Ceph KV drivers:
https://github.com/OpenMPDK/xNVMe
https://github.com/OpenMPDK/KVSSD
https://github.com/OpenMPDK/KVRocks
nphase | 2 years ago
nerpderp82 | 2 years ago
> NVMe SSDs based on flash are cheap and offer high throughput. Combining several of these devices into a single server enables 10 million I/O operations per second or more. Our experiments show that existing out-of-memory database systems and storage engines achieve only a fraction of this performance. In this work, we demonstrate that it is possible to close the performance gap between hardware and software through an I/O optimized storage engine design. In a heavy out-of-memory setting, where the dataset is 10 times larger than main memory, our system can achieve more than 1 million TPC-C transactions per second.
[0] https://news.ycombinator.com/item?id=37899886
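The gap the abstract describes can be sketched with Little's law (requests in flight = throughput × per-request latency). The latency and overhead numbers below are illustrative assumptions, not figures from the paper:

```python
# Little's law: in-flight requests = IOPS x per-request latency.
device_latency_us = 100            # ~100 us flash read latency (assumption)
target_iops = 10_000_000           # figure quoted in the abstract above

queue_depth = target_iops * device_latency_us // 1_000_000
print(queue_depth)                 # -> 1000 requests in flight

# If the software path burns even 10 us of CPU per I/O, one core
# saturates at 100k IOPS -- two orders of magnitude short of the device:
cpu_us_per_io = 10
print(1_000_000 // cpu_us_per_io)  # -> 100000 IOPS per core
```

That is why an engine must both keep ~1000 I/Os in flight and keep per-I/O CPU cost tiny to saturate such hardware.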
threeseed | 2 years ago
[1] https://craillabs.github.io
nerpderp82 | 2 years ago
https://github.com/aerospike/aerospike-server/blob/master/cf...
There are other occurrences in the codebase, but that is the most prominent one.
bestouff | 2 years ago
chaos_emergent | 2 years ago
I'm also curious whether different, more performant data structures could be leveraged; if so, there may be downstream improvements for garbage collection, retrieval, and request parallelism.
creshal | 2 years ago
But that's about it. And the latency is still worse than in-memory solutions.
Between that and the non-trivial effort needed to make this work in any sort of cloud setup (be it self-hosted k8s or AWS), it's a hard sell. If I really need latency above all, AWS gives me instances with 24TB RAM, and if I don't… why not just use existing kv-stores and accept the couple of ns extra latency?
threeseed | 2 years ago
di4na | 2 years ago
delfinom | 2 years ago
Given, however, that most of the world has shifted to VMs, I don't think NVMe key-value storage is accessible, because the physical disks are often split across multiple tenants. So the overall demand for this would be low.
londons_explore | 2 years ago
otterley | 2 years ago
infamouscow | 2 years ago
One thing they don't tell you about NVMe is you'll end up bottlenecked on CPU and memory bandwidth if you do it right. The problem is after eliminating all of the speed bumps in your IO pathway, you have a vertical performance mountain face to climb. People are just starting to run into these problems, so it's hard to say what the future holds. It's all very exciting.
caeril | 2 years ago
I like how you reference the performance benefits of NVMe direct addressing, but then immediately lament that you can't access these benefits across a SEVEN LAYER STACK OF ABSTRACTIONS.
You can either lament the dearth of userland direct-addressable performant software, OR lament the dearth of convenient network APIs that thrash your cache lines and dramatically increase your access latency.
You don't get to do both simultaneously.
Embedded is a feature for performance-aware software, not a bug.
rubiquity | 2 years ago
CubsFan1060 | 2 years ago
Utilizing: https://memcached.org/blog/nvm-caching/, https://github.com/m...
TL;DR: Grafana Cloud needed tons of caching, and it was expensive, so they used extstore in memcached to hold most of it on NVMe disks. This massively reduced their costs.
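For reference, extstore is enabled via memcached's `-o ext_path` option, which takes a file path and size; the path, sizes, and thread count below are illustrative and should be checked against the documented options for your memcached version:

```shell
# Keep keys and hot items in RAM; spill cold values to a file on NVMe.
# /mnt/nvme0/extstore:64G = backing file and its size (illustrative).
memcached -m 2048 \
  -o ext_path=/mnt/nvme0/extstore:64G \
  -o ext_threads=8
```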
thskman | 2 years ago
[deleted]
javierhonduco | 2 years ago
unknown | 2 years ago
[deleted]
Already__Taken | 2 years ago
espoal | 2 years ago
zupa-hu | 2 years ago
I mean, using a Merkle tree or something like that to make sense of the underlying data.
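A minimal sketch of that idea, assuming SHA-256 over fixed value blocks (this is a generic Merkle construction, not code from any store mentioned here):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    """Compute the Merkle root over a list of value blocks."""
    if not leaves:
        return h(b"")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"block-0", b"block-1", b"block-2"]
root = merkle_root(blocks)
# Changing any block changes the root, so stale or corrupted regions
# of the underlying data can be detected by comparing subtree hashes.
assert merkle_root([b"block-0", b"block-1", b"block-X"]) != root
```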
dboreham | 2 years ago
(yes it's fashionable, but it's still terrible for random read performance)
znpy | 2 years ago