top | item 39659684


cuno | 2 years ago

For AWS, we're comparing against filesystems in the datacenter - so EBS, EFS and FSx Lustre. Compared to these, the graphs show S3 being much faster for both big-file and small-file workloads: https://cuno.io/technology/

and in even more detail across the different types of EBS/EFS/FSx Lustre here: https://cuno.io/blog/making-the-right-choice-comparing-the-c...



crabbone | 2 years ago

The tests are very weird...

Normally, from someone working in storage, you'd expect tests to be reported in IOPS, and the go-to tool for reproducible tests is FIO. Of course, "reproducibility" is a very broad subject, but people are so used to this tool that they've developed certain intuitions about it and how to interpret its results.

Seeing throughput figures, on the other hand, tells you very little about how the system performs. Just to give some reasons: a system can be configured to do compression or deduplication on the client or server, and this will significantly affect your throughput depending on what you actually measure - the amount of useful information presented to the user, or the amount of information transferred. Also, throughput at the expense of higher latency may or may not be a good thing. Really, if you ask anyone who has ever worked on a storage product how they could crank up throughput numbers, they'd tell you: "write bigger blocks asynchronously". That's the basic recipe, if that's what you want. Whether that makes a good all-around system... I'd say probably not.
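The "write bigger blocks" effect is easy to see even with a toy benchmark. A minimal sketch in Python (plain buffered writes to a temp file - illustrative only, and no substitute for an actual FIO run; on most machines the larger block sizes will report substantially higher MiB/s):

```python
import os
import tempfile
import time


def write_throughput(block_size: int, total: int = 64 * 1024 * 1024) -> float:
    """Write `total` bytes to a temp file in `block_size` chunks; return MiB/s."""
    buf = b"\0" * block_size
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb", buffering=0) as f:
            for _ in range(total // block_size):
                f.write(buf)
            os.fsync(f.fileno())  # force the data to disk before stopping the clock
        elapsed = time.perf_counter() - start
        return (total / (1024 * 1024)) / elapsed
    finally:
        os.remove(path)


if __name__ == "__main__":
    for bs in (4 * 1024, 128 * 1024, 4 * 1024 * 1024):
        print(f"{bs:>8} B blocks: {write_throughput(bs):8.1f} MiB/s")
```

The same trick - bigger blocks, issued asynchronously - is exactly how headline throughput numbers get inflated without saying anything about latency.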

Of course, there are many other concerns. Data consistency is a big one, and this is a typical tradeoff when choosing between an object store and a filesystem: a filesystem offers stronger consistency guarantees, whereas an object store can do certain things faster by relaxing them.

BTW, I don't think most readers would understand Lustre and similar to be a "local filesystem", since it operates over the network, and network performance will have a significant impact - which, of course, also puts it in the same ballpark as other networked systems.

I'd also say that Ceph is kinda missing from this benchmark... Again, if we're talking about a filesystem on top of an object store, it's the prime example.

cuno | 2 years ago

IOPS is a really lazy benchmark that we believe can greatly diverge from most real-life workloads, except for truly random I/O in applications such as databases. For example, in Machine Learning, training usually consists of taking large datasets (sometimes many PBs in scale), randomly shuffling them each epoch, and feeding them into the engine as fast as possible. Because of this, we see storage vendors for ML workloads concentrate on IOPS numbers. The GPUs, however, only really care about throughput. Indeed, we find a great many applications only really care about throughput, and IOPS is only relevant insofar as it helps accomplish that throughput.

For ML, we realised that the shuffling isn't actually random - there's no real reason for it to be random rather than pseudo-random. And if it's pseudo-random then it's predictable, and if it's predictable then we can exploit that to great effect - yielding a 60x boost in throughput on S3, beating out a bunch of other solutions.

S3 is not going to do great for truly random I/O. However, we find that most scientific, media and finance workloads are actually deterministic or semi-deterministic, and this is where cunoFS, by peering inside each process, can better predict intra-file and inter-file access patterns, so that we can hide the latencies present in S3. At the end of the day, the right benchmark is the one that reflects real-world usage of applications, but that's a lot of effort to document one by one.
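The pseudo-random point can be sketched in a few lines. This is not cunoFS's implementation - just a generic illustration of why a seeded shuffle is predictable: any component that knows the seed and epoch (here a hypothetical prefetcher) can derive the exact sample order the training loop will use and issue reads ahead of it:

```python
import random


def epoch_order(n_samples: int, epoch: int, seed: int = 1234) -> list:
    """Deterministic per-epoch shuffle: same (seed, epoch) -> same order."""
    rng = random.Random(seed * 100_003 + epoch)  # distinct stream per epoch
    order = list(range(n_samples))
    rng.shuffle(order)
    return order


# The training loop and a prefetcher independently derive identical orders,
# so the prefetcher can fetch sample k's bytes before the trainer asks.
trainer_view = epoch_order(1_000, epoch=3)
prefetch_view = epoch_order(1_000, epoch=3)
assert trainer_view == prefetch_view
assert trainer_view != epoch_order(1_000, epoch=4)  # still reshuffled each epoch
```

Statistically the shuffle is as good as "random" for training purposes, but the storage layer gets to treat it as a fully known sequential schedule.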

I agree that things like dedupe and compression can affect things, so in our large-file benchmarks each file is actually random data. The small-file benchmarks aren't affected by "write bigger blocks" because there's nothing bigger than the file itself. Yes, data consistency can be an issue, and we've had to do all sorts of things to ensure POSIX consistency guarantees beyond what S3 (or compatible) can provide. These come with restrictions (such as on concurrent writes to the same file from multiple nodes), but so does NFS. In practice, we introduced a cunoFS Fusion mode that relies on a traditional high-IOPS filesystem for such workloads and consistency (automatically migrating data to that tier), and on high-throughput object storage for workloads that don't need it.

hnlmorg | 2 years ago

EFS is ridiculously slow though. Almost to the point where I fail to see how it’s actually useful for any of the traditional use cases for NFS.

geertj | 2 years ago

> EFS is ridiculously slow though. Almost to the point where I fail to see how it’s actually useful for any of the traditional use cases for NFS.

Would you care to elaborate on your experience or use case a bit more? We've made a lot of improvements over the last few years (and are actively working on more), and we have many happy customers. I'd be happy to give a perspective on how well your use case would work with EFS.

Source: PMT turned engineer on EFS, with the team for over 6 years

dekhn | 2 years ago

If you turn all the EFS performance knobs up (at a high cost), it's quite fast.
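For context, the main knob here is EFS's throughput mode. A sketch with the AWS CLI (the filesystem id is a placeholder; provisioned mode reserves a fixed rate for a fixed fee, while the newer elastic mode scales automatically and bills per use):

```shell
# Reserve a fixed throughput rate (billed whether or not it's used).
aws efs update-file-system \
    --file-system-id fs-0123456789abcdef0 \
    --throughput-mode provisioned \
    --provisioned-throughput-in-mibps 1024

# Or let throughput scale with demand, billed per GiB transferred.
aws efs update-file-system \
    --file-system-id fs-0123456789abcdef0 \
    --throughput-mode elastic
```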

harshaw | 2 years ago

Have you tried it recently? Because we've made it a lot faster over the years.

wenc | 2 years ago

S3 is really high latency though. I store parquet files on S3, and querying them through DuckDB is much slower than on a local filesystem because of the random access patterns. I can see S3 being decent for bulk access, but definitely not for random access.

This is why there’s a new S3 Express offering that is low latency (but costs more).