One day, many years ago, in a panic of "how the hell do I make this scale? I need it to work right now", I replaced an infrequently read data logging setup I had with S3... I ended up spending >$100k storing keys in that thing, and at one point was 2% of all objects stored in S3 (based on the stats they occasionally provided), before I eventually retired it two or three years later. To be extremely honest, and to add insult to injury, my read access was so "infrequent" that I don't think I ever read from it again after that day :/.
Yes. I should include this info in the README calcs. For my use case I intend to do 4-10 PUTs per hour, since they are batched: 10 PUTs/h = 7,300 PUTs/month = $0.073/month. The key here is infrequent.
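The cost figure above can be checked with a couple of lines. This is a minimal sketch; the $0.01 per 1,000 PUTs price is an assumption back-derived from the comment's own numbers, so check current S3 pricing for your region before relying on it.

```python
# Rough S3 PUT-cost check for the batched-write scheme above.
# PUT_PRICE_PER_1000 is an assumed nominal price implied by the
# comment's math ($0.073 for 7,300 PUTs); real pricing varies.
PUT_PRICE_PER_1000 = 0.01
HOURS_PER_MONTH = 730  # 365 * 24 / 12

def monthly_put_cost(puts_per_hour: float) -> float:
    puts_per_month = puts_per_hour * HOURS_PER_MONTH
    return puts_per_month / 1000 * PUT_PRICE_PER_1000

print(monthly_put_cost(10))  # 10 PUTs/h -> 7,300 PUTs/month -> 0.073
```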
Stuff like this is great! And you mentioned you are primarily doing it for timeseries data - then you can easily batch writes to handle high-volume throughput too!
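The batching idea above can be sketched as a small write buffer that accumulates records and emits them as a single object PUT. This is illustrative only: `WriteBatcher` and `flush_fn` are hypothetical names, and `flush_fn` stands in for a real upload call (e.g. an S3 `put_object`), not any API from the project being discussed.

```python
class WriteBatcher:
    """Buffer timeseries records and flush them as one batched write.

    flush_fn is a stand-in for the real upload (e.g. one S3 PUT per
    batch), which turns N record writes into N / max_items requests.
    """

    def __init__(self, flush_fn, max_items=1000):
        self.flush_fn = flush_fn
        self.max_items = max_items
        self.buf = []

    def add(self, record):
        self.buf.append(record)
        if len(self.buf) >= self.max_items:
            self.flush()

    def flush(self):
        # Hand the full buffer to the uploader, then start a new one.
        if self.buf:
            self.flush_fn(self.buf)
            self.buf = []
```

A real version would also flush on a timer so a slow trickle of records doesn't sit in memory indefinitely.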
So, a database that has a file format of write-once content-addressed "shards" and is persisted/distributed by "offlining" those shards into object storage, and then "onlining" them back into a given DB node's MRU cache at query time; and is mostly read-only, but can update by downloading the relevant shards, modifying them locally, re-hashing the modified shards to get new names for them, and then uploading them again under those new names.
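That download-modify-rehash-upload cycle can be sketched in a few lines. Everything here is hypothetical: a plain dict stands in for object storage, and `shard_name`/`update_shard` are illustrative names, not anyone's real API.

```python
import hashlib
import json

def shard_name(payload: bytes) -> str:
    # Content-addressed: the shard's key is the hash of its bytes,
    # so any modification necessarily yields a new, distinct name.
    return hashlib.sha256(payload).hexdigest()

def update_shard(store: dict, name: str, mutate) -> str:
    """One update cycle against `store`, a dict standing in for
    object storage (not a real S3 client)."""
    data = json.loads(store[name])            # "download" the shard
    mutate(data)                              # modify locally
    new_payload = json.dumps(data, sort_keys=True).encode()
    new_name = shard_name(new_payload)        # re-hash -> new key
    store[new_name] = new_payload             # upload under new name
    return new_name
```

Note the old shard is never overwritten, which is what makes the shards safely cacheable: a name always refers to the same immutable bytes.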
This is terrible feedback. Have you used it? Do you plan on using it? If the answer to both of those questions is "no," then you (a) can't call it garbage, and (b) haven't provided anything useful. The whole point of showing something on HN is to get more feedback and, in turn, more users.
spullara | 9 years ago
saurik | 9 years ago
sajal83 | 9 years ago
marknadal | 9 years ago
We did this with S3 as the storage engine, 100M+ records for $10/day: https://www.youtube.com/watch?v=x_WqBuEA7s8
And discord had a very nice article on this as well: https://blog.discordapp.com/how-discord-stores-billions-of-m...
Great work, I think there is a lot of exciting stuff you can add to it!
sajal83 | 9 years ago
vtuulos | 9 years ago
rakoo | 9 years ago
sajal83 | 9 years ago
udkl | 9 years ago
Also, why don't you dump the log data into a NoSQL store like DynamoDB instead of S3?
sajal83 | 9 years ago
> Also, why don't you dump the log data into a NoSQL store like DynamoDB instead of S3?
Price.
derefr | 9 years ago
Is this basically equivalent to Datomic, then? (Not that that's a bad thing. The world needs an open-source, non-JVM-targeted Datomic.)
reubano | 9 years ago
CouchDB?
ddxv | 9 years ago
sajal83 | 9 years ago
I think flat JSON files won't be efficient. My goal is to have the cache on disk, and each cached file would be big, with lots of keys in it. In order to use JSON files, I would either have to keep the whole parsed data in memory, or parse the whole JSON each time I want to look up a key.
If the data fits in memory then sure JSON is more convenient.
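The trade-off above can be made concrete with a small sketch. The flat-JSON path pays for parsing the entire file on every lookup; an indexed on-disk format reads only the pages it needs. Here `sqlite3` is used purely as a stand-in for "some indexed on-disk store", and the function names are illustrative, not the project's API.

```python
import json
import sqlite3

# Flat JSON file: every lookup parses the whole file, O(file size).
def json_lookup(path, key):
    with open(path) as f:
        return json.load(f).get(key)

# Indexed on-disk store (sqlite3 as a stand-in): the primary-key
# B-tree lets a lookup touch only a few pages, not the whole file.
def make_indexed(path, items):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    db.executemany("INSERT INTO kv VALUES (?, ?)", items)
    db.commit()
    return db

def indexed_lookup(db, key):
    row = db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None
```

For a cache file with millions of keys, the difference between re-parsing everything per lookup and an index seek is what makes the on-disk approach viable.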
TheDong | 9 years ago
[deleted]
sctb | 9 years ago
nunez | 9 years ago