top | item 38450774

(no title)

Sirupsen | 2 years ago

Most production storage systems/databases built on top of S3 spend a significant amount of effort building an SSD/memory caching tier to make them performant enough for production (e.g. on top of RocksDB). But it's not easy to keep it in sync with blob...

Even with the cache, the cold query latency lower-bound to S3 is subject to ~50ms roundtrips [0]. To build a performant system, you have to tightly control roundtrips. S3 Express changes that equation dramatically, as S3 Express approaches HDD random read speeds (single-digit ms), so we can build production systems that don't need an SSD cache—just the zero-copy, deserialized in-memory cache.

Many systems will probably continue to have an SSD cache (~100 us random reads), but now MVPs can be built without it, and cold query latency goes down dramatically. That's a big deal

We're currently building a vector database on top of object storage, so this is extremely timely for us... I hope GCS ships this ASAP. [1]

[0]: https://github.com/sirupsen/napkin-math [1]: https://turbopuffer.com/

discuss

jamesblonde|2 years ago

We built HopsFS-S3 [0] for exactly this problem, and have running it as part of Hopsworks now for a number of years. It's a network-aware, write-through cache for S3 with a HDFS API. Metadata operations are performed on HopsFS, so you don't have the other problems list max listing operations return 1000 files/dirs.

NVMe is what is changing the equation, not SSD. NVMe disks now have up to 8 GB/s, although the crap in the cloud providers barely goes to 2 GB/s - and only for expensive instances. So, instead of 40X better throughput than S3, we can get like 10X. Right now, these workloads are much better on-premises on the cheapest m.2 NVMe disks ($200 for 4TB with 4 GB/s read/write) backed by a S3 object store like Scality.

[0] https://www.hopsworks.ai/post/faster-than-aws-s3

dekhn|2 years ago

the numbers you're giving are throughput (byte/sec) not latency.

The comment you reply to is talking mostly about latency - reporting that S3 object get latencies (time to open the object and return its head) in the single-digits ms, where S3 was 50ms before.

BTW EBS can do 4GB/sec per volume. But you will pay for it.

kernelsanderz|2 years ago

Very excited about being able to build scalable vector databases on DiskANN like turbopuffer or lancedb. These changes in latency are game changing. The best server is no server. The capability a low latency vector database application that runs in lambda and S3 and is dirt cheap is pretty amazing.

seasily|2 years ago

The clear use case is serverless—without the complications of DynamoDB (expensive, 0.03/GB read), DynamoDB+DAX (VPC complications), or Redis (again, VPC requirements).

This instantly makes a number of applications able to run directly on S3, sans any caching system.

__turbobrew__|2 years ago

Similar to vector databases this could be really useful for hosting cloud optimized geotiffs for mapping purposes. At a previous job we were able to do on the fly tiling in about 100ms but with this new storage class you could probably make something that could tile just as fast or even faster than arcgis with all of its proprietary optimizations and goop.

Take it a step further, gdal supports s3 raster data sources out of the box for a while now. Any gdal powered system may be able to operate on s3 files as if they are local.