I generally don't think about storage I/O speed at that scale (I mean, really, who does?). I once used RAID0 to write data to HDDs faster, but that was a long time ago.
AWS themselves have bragged that the biggest S3 buckets are striped across more than a million hard drives. That doesn't mean those buckets use all the space on all those drives: one of the key ideas of S3 is to average the I/O of many customers over many drives.
RAID doesn't necessarily make writes faster; it can actually make them slower, depending on whether you're using it for mirroring or striping. When you mirror, writes are slower because every write has to land on all of the disks.
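The mirroring-vs-striping distinction can be put into a back-of-the-envelope throughput model. The numbers here are assumed, not benchmarks, and real arrays add controller overhead:

```python
# Rough sequential-throughput model for the two RAID modes mentioned above.
DISK_MBPS = 150  # assumed throughput of one HDD, in MB/s

def raid0_throughput(n_disks: int) -> dict:
    """Striping: data is split across all disks, which work in parallel."""
    return {"read": n_disks * DISK_MBPS, "write": n_disks * DISK_MBPS}

def raid1_throughput(n_disks: int) -> dict:
    """Mirroring: every disk holds a full copy, so a write must hit all
    of them, while a read can be served from any copy."""
    return {"read": n_disks * DISK_MBPS, "write": DISK_MBPS}

print(raid0_throughput(4))  # {'read': 600, 'write': 600}
print(raid1_throughput(4))  # {'read': 600, 'write': 150}
```

So a 4-disk mirror reads about as fast as a 4-disk stripe, but its writes are no faster than a single disk.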
I think the article's title question is a bit misleading because it focuses on peak throughput for S3 as a whole. The interesting question is "How can the throughput for a GET exceed the throughput of an HDD?"
timeinput|5 months ago
I would have naively guessed an interesting caching system, and to some degree tiers of storage for hot vs cold objects.
It was obvious after I read the article that parallelism was a great choice, but I definitely hadn't considered the detailed scheme of S3, or the error correction it used. Parallelism is the one word summary, but the details made the article worth reading. I bet minio also has a similar scaling story: parallelism.
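The "parallelism plus error correction" scheme can be sketched with a toy erasure code: split an object into k data shards plus one XOR parity shard, so any single lost shard can be rebuilt from the survivors. This is only a stand-in for the Reed-Solomon-style codes real object stores use, and all the names here are mine:

```python
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list:
    """Split data into k data shards plus one XOR parity shard.
    Pads with zero bytes, so callers must track the original length."""
    data += b"\x00" * ((-len(data)) % k)
    size = len(data) // k
    shards = [data[i * size:(i + 1) * size] for i in range(k)]
    shards.append(reduce(xor, shards))  # parity = XOR of all data shards
    return shards

def recover(shards: list, lost: int, k: int) -> bytes:
    """Rebuild the object even though shard `lost` is unavailable."""
    survivors = [s for i, s in enumerate(shards) if i != lost]
    full = list(shards)
    full[lost] = reduce(xor, survivors)  # XOR of survivors equals the lost shard
    return b"".join(full[:k])
```

With k+1 shards on k+1 different drives, a read can proceed at the speed of the k fastest drives and still survive one failure; production codes just generalize this to more parity shards.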
UltraSane|5 months ago
0x457|5 months ago
> I would have naively guessed an interesting caching system, and to some degree tiers of storage for hot vs cold objects.
Caching in this scenario is usually done outside of S3, in something like CloudFront
MrDarcy|5 months ago
zeroimpl|5 months ago
gregates|5 months ago
If you just replicated, you could still get big throughput for S3 as a whole by doing many reads that target different HDDs. But you'd still be limited to max HDD throughput * number of GETs. S3 is not so limited, and that's interesting and non-obvious!
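That point can be made with a toy model, using made-up numbers rather than AWS's: if a GET streams shards of the object from several drives concurrently, its throughput is roughly the per-drive throughput times the number of drives involved.

```python
# Why one GET can beat one drive: the object is striped or erasure-coded
# across many drives, and the shards are fetched in parallel.
HDD_MBPS = 150    # assumed sequential throughput of one hard drive
OBJECT_MB = 1500  # a 1.5 GB object

def get_seconds(parallel_shards: int) -> float:
    """Time to fetch the whole object when its shards stream concurrently."""
    shard_mb = OBJECT_MB / parallel_shards  # each drive serves one shard
    return shard_mb / HDD_MBPS

print(get_seconds(1))   # one drive: 10.0 seconds
print(get_seconds(10))  # ten drives in parallel: 1.0 second
```

Plain replication can't do this for a single GET, because each copy lives whole on one drive; striping or erasure coding is what lets one request fan out.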
UltraSane|5 months ago
3abiton|5 months ago
ffsm8|5 months ago
DoctorOW|5 months ago
It's not exactly rocket science.
crazygringo|5 months ago
Data gets split into redundant copies, and is rebalanced in response to hot spots.
Everything in this article is the obvious answer you'd expect.
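"Redundant copies plus rebalancing" can be sketched in a few lines. This is purely illustrative and not S3's actual placement logic; the class and its policy are my own invention:

```python
import random
from collections import Counter

class Cluster:
    """Each object gets several replicas; a GET goes to the least-loaded
    replica; rebalancing adds a replica of a hot object on a cold drive."""

    def __init__(self, n_drives: int, n_replicas: int = 3):
        self.n_replicas = n_replicas
        self.placement = {}  # object key -> set of drive ids
        self.load = Counter({d: 0 for d in range(n_drives)})

    def put(self, key: str):
        # place replicas on randomly chosen distinct drives
        self.placement[key] = set(random.sample(sorted(self.load),
                                                self.n_replicas))

    def get(self, key: str) -> int:
        drive = min(self.placement[key], key=lambda d: self.load[d])
        self.load[drive] += 1  # least-loaded replica serves the read
        return drive

    def rebalance(self, key: str):
        # respond to a hot spot: add a replica on the coldest drive
        # that doesn't already hold one
        cold = min((d for d in self.load if d not in self.placement[key]),
                   key=lambda d: self.load[d])
        self.placement[key].add(cold)
```

The non-obvious part the article adds on top of this picture is that objects aren't just copied whole; they're erasure-coded, which is what changes the single-GET throughput story.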