cuno|2 years ago
We and our customers use S3 as a POSIX filesystem, and in many benchmarks we find it faster than a local filesystem. For listing directories, we find it faster than Lustre (a real high-performance filesystem). Our approach is to first try listing a directory with a single ListObjectsV2 request (which on AWS S3 returns keys in lexicographic order), and if it hasn't made much progress, we start issuing ListObjectsV2 requests in parallel. Once you parallelise ListObjectsV2 (rather than sequentially "continuing"), you get massive speedups.
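To make the idea concrete, here is a minimal sketch of range-partitioned listing. It is not necessarily how cuno implements it. The bucket is simulated by a sorted in-memory list, `list_page` is a hypothetical stand-in for one ListObjectsV2 page, and the boundary strings are assumed to be guessed or sampled in advance. With boto3, the equivalent of `start_after` would be the `StartAfter` parameter of `list_objects_v2`.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a bucket: 300 keys kept in lexicographic order.
KEYS = sorted(f"dir/{c}{i:03d}" for c in "abcdef" for i in range(50))


def list_page(start_after, max_keys=20):
    """Fake single page: up to max_keys keys strictly after start_after."""
    lo = bisect_right(KEYS, start_after)
    return KEYS[lo:lo + max_keys]


def list_range(start_after, end):
    """Page sequentially through keys in (start_after, end].

    end=None means "no upper bound" (list to the end of the keyspace).
    """
    out = []
    cursor = start_after
    while True:
        page = list_page(cursor)
        if not page:
            return out
        for key in page:
            if end is not None and key > end:
                return out  # walked past this worker's range
            out.append(key)
        cursor = page[-1]  # continue after the last key seen


def parallel_list(boundaries):
    """List the whole keyspace using one worker per range between boundaries."""
    starts = [""] + boundaries
    ends = boundaries + [None]
    with ThreadPoolExecutor(max_workers=len(starts)) as pool:
        chunks = pool.map(list_range, starts, ends)  # results keep range order
    return [key for chunk in chunks for key in chunk]


if __name__ == "__main__":
    keys = parallel_list(["dir/b", "dir/d"])
    print(len(keys), keys == KEYS)
```

Each worker still pages sequentially inside its own range, but the ranges proceed independently, so the boundaries only need to split the keyspace roughly evenly and do not need to be existing keys.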
crabbone|2 years ago
What did you measure? How did you compare? This claim seems very contrary to my experience and understanding of how things work...
Let me refine the question: did you measure metadata or data operations? What kind of storage medium is used by the filesystem you compared against? How much memory (and consequently filesystem cache) does your system have?
----
The thing is: in the best case, you should expect something like 5 ms latency on network calls over the Internet. Within a datacenter, maybe you can achieve sub-millisecond latency, but that's hard. On AWS, traffic within a region but across availability zones tends to see around 1 ms latency.
Meanwhile, NVMe latency, even on consumer products, is 10-20 microseconds. I.e. we are talking about roughly 100 times faster than anything going through the network can offer.
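As a quick check of that arithmetic, the sketch below computes the ratio between the network and NVMe figures quoted above. The numbers are taken from this comment rather than measured, and the names are made up for illustration.

```python
# Per-operation latency figures quoted in the comment, in microseconds.
NVME_US = (10, 20)  # consumer NVMe, low and high estimate
NETWORK_US = {
    "same region, across zones": 1_000,
    "over the Internet, best case": 5_000,
}


def ratio_range(network_us, local_us=NVME_US):
    """Return (min, max) of how many times slower one network call is."""
    fastest, slowest = local_us
    return network_us / slowest, network_us / fastest


if __name__ == "__main__":
    for label, latency in NETWORK_US.items():
        low, high = ratio_range(latency)
        print(f"{label}: {low:.0f}x to {high:.0f}x slower than NVMe")
```

These are per-request latencies for a single sequential caller, so they say nothing on their own about throughput when many requests are in flight at once.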
cuno|2 years ago
And in even more detail, covering different types of EBS/EFS/FSx Lustre, here: https://cuno.io/blog/making-the-right-choice-comparing-the-c...
YZF|2 years ago
supriyo-biswas|2 years ago
How are you "parallelizing" ListObjectsV2? The continuation token can only be fed in once the previous ListObjectsV2 response has completed, unless you know the names or structure of the keys ahead of time, in which case listing objects isn't necessary.
cuno|2 years ago
johnmaguire|2 years ago
fijiaarone|2 years ago
orf|2 years ago