MichaelGG | 7 years ago

Here's a version: https://github.com/michaelgg/cidb. It's just some of the raw integer k-v storage part. It assumes you already have the hashed entries (you truncate them and the compression takes it from there). It's really more what you'd expect from a college-course IR project, but since I never went to school... oh well.
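
As a rough illustration of the "truncate the hashed entries" step (not taken from cidb; the SHA-1 digest, the 64-bit prefix width, and the names truncate_hash and build_keys are all assumptions made for this sketch), hashes can be cut down to fixed-width integers and sorted before handing them to an integer store:

```python
import hashlib


def truncate_hash(digest: bytes, bits: int = 64) -> int:
    """Keep only the top `bits` bits of a digest as an unsigned integer."""
    if bits <= 0 or bits > len(digest) * 8:
        raise ValueError("bits must be between 1 and the digest width")
    nbytes = (bits + 7) // 8  # whole bytes needed to cover `bits`
    value = int.from_bytes(digest[:nbytes], "big")
    return value >> (nbytes * 8 - bits)  # drop the unused low bits


def build_keys(entries, bits: int = 64):
    """Hash each entry (SHA-1 is an assumption), truncate, dedupe, and sort."""
    keys = {truncate_hash(hashlib.sha1(e).digest(), bits) for e in entries}
    return sorted(keys)


if __name__ == "__main__":
    for key in build_keys([b"alpha", b"beta", b"alpha"], bits=40):
        print(f"{key:010x}")
```

Fewer bits means smaller keys but more collisions between unrelated entries, so the width is a trade-off you pick for your own false-positive budget.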

I used this same library to encode telephone porting (LNP) instructions. That is a database of about 600M entries, mapping one phone number to another. With a bit of manipulation when creating the file, you go from 12GB+ with a naive encoding as strings (one client was using nearly 50GB after expanding it into a hashtable) to under a GB. That's still better than any RDBMS can do, and small enough to easily keep in RAM on every routing box.
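
To give a feel for why sorted integer keys shrink so much, here is a minimal sketch of one common technique: delta encoding plus base-128 varints. This is an assumption for illustration, not a description of what cidb actually does, and pack, unpack, and the sample phone numbers are hypothetical. It only handles non-negative integers.

```python
def encode_varint(n: int, out: bytearray) -> None:
    """Append n as a little-endian base-128 varint (7 payload bits per byte)."""
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
        n >>= 7
    out.append(n)


def decode_varint(buf: bytes, pos: int):
    """Read one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if b < 0x80:
            return result, pos
        shift += 7


def pack(pairs) -> bytes:
    """Sort by key, store each key as a delta from the previous key."""
    out = bytearray()
    prev = 0
    for key, value in sorted(pairs):
        encode_varint(key - prev, out)  # small when keys are dense
        encode_varint(value, out)       # value stored as-is in this sketch
        prev = key
    return bytes(out)


def unpack(buf: bytes):
    """Inverse of pack: rebuild the sorted (key, value) list."""
    pairs, pos, prev = [], 0, 0
    while pos < len(buf):
        delta, pos = decode_varint(buf, pos)
        value, pos = decode_varint(buf, pos)
        prev += delta
        pairs.append((prev, value))
    return pairs


if __name__ == "__main__":
    sample = [(2125550100, 2125559999),
              (2125550101, 2125558888),
              (2125550105, 2125557777)]
    blob = pack(sample)
    assert unpack(blob) == sample
    print(len(blob), "bytes packed vs", sum(len(f"{k},{v}\n") for k, v in sample), "as text")
```

A real store would also need an index for lookups without decoding from the start (for example, restarting the delta chain every N entries); that part is left out here.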

Some day I'd like to write it in Rust and implement vectorized encoding and more compression schemes. Like an optimized SSTable just for integers.