
Show HN: Infreqdb – S3 backed key/value database for infrequent read access

39 points | sajal83 | 9 years ago | github.com

28 comments

spullara | 9 years ago
Be careful using S3 for lots of small writes. The price is utterly dominated by the PUT calls, which cost $0.01 per 1,000 requests.
saurik | 9 years ago
One day, many years ago, in a panic of "how the hell do I make this scale? I need it to work right now", I replaced an infrequently read data-logging setup I had with S3... I ended up spending >$100k storing keys into that thing, and momentarily was 2% of all objects stored in S3 (based on the stats they occasionally provided) before I eventually (two or three years later?) retired it. And, to be extremely honest and to add insult to injury, my read access was so "infrequent" that I think I just never ended up reading it again after that day :/
sajal83 | 9 years ago
Yes, I should include this info in the README calcs. For my use case I intend to do 4 to 10 PUTs per hour, since they are batched. 10 PUTs/h ≈ 7,300 PUTs/month = $0.073/month. The key here is "infrequent".
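As a quick sanity check on that arithmetic (using the $0.01 per 1,000 PUTs figure quoted upthread; actual S3 request pricing varies by region and has changed since):

```python
# Rough S3 PUT-cost estimate for batched writes.
# Pricing figure comes from the comment above: $0.01 per 1,000 PUT requests.
PUT_PRICE_PER_1000 = 0.01
HOURS_PER_MONTH = 730  # ~8760 hours/year / 12

def monthly_put_cost(puts_per_hour):
    puts_per_month = puts_per_hour * HOURS_PER_MONTH
    return puts_per_month * PUT_PRICE_PER_1000 / 1000

# 10 batched PUTs/hour -> 7,300 PUTs/month -> about $0.073/month
print(round(monthly_put_cost(10), 3))
```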
marknadal | 9 years ago
Stuff like this is great! And you mentioned you are primarily doing it for timeseries data, so you can easily batch writes to handle high-volume throughput too!

We did this with S3 as the storage engine, 100M+ records for $10/day: https://www.youtube.com/watch?v=x_WqBuEA7s8

And discord had a very nice article on this as well: https://blog.discordapp.com/how-discord-stores-billions-of-m...

Great work, I think there is a lot of exciting stuff you can add to it!
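The batching idea mentioned above can be sketched minimally like this. The `Batcher` class and `upload` hook are hypothetical illustrations (in practice `upload` might wrap something like boto3's `put_object`), not infreqdb's API:

```python
# Illustrative sketch: buffer many timeseries records and ship them in a
# single PUT, so request costs stay flat regardless of record volume.
import json
import time

class Batcher:
    def __init__(self, flush_every=360.0, upload=None):
        self.buf = []
        self.flush_every = flush_every        # seconds between PUTs (~10/hour)
        self.last_flush = time.monotonic()
        self.upload = upload or (lambda body: None)  # e.g. an S3 put wrapper

    def write(self, record):
        self.buf.append(record)
        if time.monotonic() - self.last_flush >= self.flush_every:
            self.flush()

    def flush(self):
        if self.buf:
            # One PUT carries every buffered record as newline-delimited JSON.
            self.upload("\n".join(json.dumps(r) for r in self.buf))
            self.buf = []
        self.last_flush = time.monotonic()

sent = []
b = Batcher(flush_every=3600, upload=sent.append)
for i in range(1000):
    b.write({"t": i, "v": i * 2})
b.flush()
# sent now holds a single body containing all 1000 records
```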

sajal83 | 9 years ago
That's cool. Last week I tried googling for similar stuff, but all I could find was people asking "how to run Postgres on S3"...
udkl | 9 years ago
How is this different from AWS Athena? - https://aws.amazon.com/blogs/aws/amazon-athena-interactive-s...

Also, why don't you dump the log data into a NoSQL store like DynamoDB instead of S3?

sajal83 | 9 years ago
Athena looks cool; I didn't know about it. It probably describes what I'm trying to do.

> Also why don't you dump the log data into a NoSQL like dynamoDB instead of S3 ?

Price.

derefr | 9 years ago
So: a database whose file format is write-once, content-addressed "shards", which is persisted/distributed by "offlining" those shards into object storage and "onlining" them back into a given DB node's MRU cache at query time; and which is mostly read-only, but can update by downloading the relevant shards, modifying them locally, re-hashing the modified shards to get new names for them, and then uploading them again under those new names.

Is this basically equivalent to Datomic, then? (Not that that's a bad thing. The world needs an open-source, non-JVM-targeted Datomic.)
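The update cycle described above can be sketched as a toy. This is only an illustration of the content-addressed shard idea, not infreqdb's or Datomic's actual format; `store` is a dict standing in for an object store like S3:

```python
# Toy content-addressed shard store: shards are immutable blobs named by
# their hash. An "update" downloads a shard, modifies it locally, re-hashes
# it, and uploads it under the new name; old versions stay intact.
import hashlib
import json

store = {}  # name -> bytes, standing in for an S3 bucket

def put_shard(data: dict) -> str:
    body = json.dumps(data, sort_keys=True).encode()
    name = hashlib.sha256(body).hexdigest()
    store[name] = body          # write-once: same content -> same name
    return name

def get_shard(name: str) -> dict:
    return json.loads(store[name])

def update_shard(name: str, key, value) -> str:
    data = get_shard(name)      # "online" the shard
    data[key] = value           # modify locally
    return put_shard(data)      # re-hash and upload under the new name

v1 = put_shard({"a": 1})
v2 = update_shard(v1, "b", 2)
# Readers still holding v1 keep seeing the old, untouched shard.
```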

reubano | 9 years ago
> The world needs an open-source, non-JVM-targeted Datomic.

CouchDB?

ddxv | 9 years ago
What do you think about this being able to work with gzip-compressed JSON files stored on S3? Cool project, thank you.
sajal83 | 9 years ago
Thanks.

I think flat JSON files won't be efficient. My goal is to have the cache on disk, and each cached file would be big, with lots of keys in it. In order to use JSON files, I would either have to keep the whole parsed data in memory, or parse the whole JSON each time I want to look up a key.

If the data fits in memory, then sure, JSON is more convenient.
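A sketch of that tradeoff, using sqlite purely as one example of an indexed on-disk format (not necessarily what infreqdb uses): a flat JSON file must be parsed in full for any lookup, while an indexed file can answer a single key from disk via its B-tree.

```python
# Flat JSON vs. an indexed on-disk format for single-key lookups.
import json
import os
import sqlite3
import tempfile

records = {f"key{i}": i for i in range(10_000)}

# Flat JSON: every lookup pays for parsing the entire file.
json_path = tempfile.mktemp(suffix=".json")
with open(json_path, "w") as f:
    json.dump(records, f)
with open(json_path) as f:
    value = json.load(f)["key42"]   # full parse just to read one key

# Indexed file: the lookup hits the primary-key index, not the whole file.
db_path = tempfile.mktemp(suffix=".db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v INTEGER)")
conn.executemany("INSERT INTO kv VALUES (?, ?)", records.items())
conn.commit()
row = conn.execute("SELECT v FROM kv WHERE k = ?", ("key42",)).fetchone()
conn.close()

os.remove(json_path)
os.remove(db_path)
```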

TheDong | 9 years ago

[deleted]

nunez | 9 years ago
This is terrible feedback. Have you used it? Do you plan on using it? If the answer to both of those questions is "no," then you (a) can't call it garbage, and (b) haven't provided anything useful. The whole point of showing something on HN is to get more feedback and, in turn, more users.