top | item 46195734


Would you mind sharing a motivating use case for those of us who don't think S3 is complicated or unreliable? Doesn't S3 already include HTTP upload capability? Are ML engineers really avoidant of basic operations like "HTTP retries and S3 multipart uploads"?


_ben_|2 months ago

Thanks for the question. You’re right that S3 itself is simple and reliable, and yes, most engineers *can* write HTTP retries and multipart uploads. EdgeMQ isn’t trying to replace S3’s API; it’s the layer you need around S3 when you have lots of producers on the public internet.

It gives you:

* edge HTTPS endpoints (auto-scale, multi-region HA)
* a WAL so accepted events aren’t lost
* segmentation + compression
* explicit commit markers for consumers
* backpressure instead of silent data loss
* a standardized way for every team to land data in S3

You could build that yourself on top of S3; many companies do. EdgeMQ exists for folks who want that behavior but don’t want to operate a custom HTTP-to-S3 ingest service forever.
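
To give a rough sense of what "build that yourself" involves, here is a minimal Python sketch of the pieces listed above: a write-ahead log, compressed segments, a commit marker written last, and backpressure when the buffer is full. This is not EdgeMQ's implementation. The names `TinyIngest`, `Backpressure`, `accept`, and `flush` are hypothetical, and a local directory stands in for the S3 bucket so the example runs without credentials.

```python
import gzip
import json
import os
import tempfile


class Backpressure(Exception):
    """Buffer is full; an HTTP layer would typically map this to 429 or 503."""


class TinyIngest:
    """Hypothetical single-process ingest sketch; a local dir stands in for S3."""

    def __init__(self, root, max_pending=1000):
        self.wal_path = os.path.join(root, "wal.log")
        self.out_dir = os.path.join(root, "bucket")  # assumption: stand-in for an S3 prefix
        os.makedirs(self.out_dir, exist_ok=True)
        self.max_pending = max_pending
        self.pending = []
        self.next_segment = 0

    def accept(self, event):
        # Backpressure: refuse loudly instead of silently dropping the event.
        if len(self.pending) >= self.max_pending:
            raise Backpressure("buffer full, retry later")
        line = json.dumps(event)
        # WAL: the event is durable on disk before it counts as accepted.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(line + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        self.pending.append(line)

    def flush(self):
        """Write pending events as one compressed segment, then a commit marker."""
        if not self.pending:
            return None
        name = f"segment-{self.next_segment:06d}"
        data_path = os.path.join(self.out_dir, name + ".jsonl.gz")
        with gzip.open(data_path, "wt", encoding="utf-8") as seg:
            seg.write("\n".join(self.pending) + "\n")
        # The marker is written last, so consumers only read complete segments.
        with open(os.path.join(self.out_dir, name + ".commit"), "w") as marker:
            json.dump({"segment": name + ".jsonl.gz", "events": len(self.pending)}, marker)
        self.pending.clear()
        open(self.wal_path, "w").close()  # truncate the WAL once the segment is committed
        self.next_segment += 1
        return name


if __name__ == "__main__":
    ingest = TinyIngest(tempfile.mkdtemp(), max_pending=2)
    ingest.accept({"sensor": "a", "value": 1})
    ingest.accept({"sensor": "b", "value": 2})
    print(ingest.flush())
```

Even this toy version leaves out crash recovery from the WAL, retries against the real S3 API, multi-region endpoints, and concurrency, which is where most of the operational cost of a custom ingest service tends to sit.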

It’s also worth noting that EdgeMQ is in its early stages. The next features planned are transformations, where you send data in one format (say, JSON) and it is delivered to S3 in another (e.g. CSV or Parquet).
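
As an illustration of the kind of transformation described, the sketch below converts newline-delimited JSON into CSV using only the Python standard library. It is an assumption about what such a step might look like, not EdgeMQ's planned API; `json_lines_to_csv` is a hypothetical helper, and it assumes each input line is a flat JSON object.

```python
import csv
import io
import json


def json_lines_to_csv(json_lines):
    """Convert flat JSON objects (one per line) into CSV text."""
    rows = [json.loads(line) for line in json_lines if line.strip()]
    # Collect column names in first-seen order across all rows.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    # Missing keys are written as empty cells (DictWriter's default restval).
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(json_lines_to_csv(['{"id": 1, "name": "a"}', '{"id": 2, "tag": "x"}']))
```

Parquet output would need a third-party library and a declared schema, so it is left out of this sketch.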