Would you mind sharing a motivating use case for those of us who don't think S3 is complicated or unreliable? Doesn't S3 already include HTTP upload capability? Are ML engineers really avoidant of basic operations like "HTTP retries and S3 multipart uploads"?
_ben_|2 months ago
It gives you:
* edge HTTPS endpoints (auto-scale, multi-region HA)
* a WAL so accepted events aren’t lost
* segmentation + compression
* explicit commit markers for consumers
* backpressure instead of silent data loss
* and a standardized way every team lands data in S3
You could build that yourself on top of S3; many companies do. EdgeMQ exists for folks who want that behavior but don't want to operate a custom HTTP-to-S3 ingest service forever.
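To give a feel for what "build that yourself" involves, here is a rough Python sketch of the WAL, segment, commit-marker and backpressure pieces. It is not how EdgeMQ is implemented: `TinyIngest`, `BackpressureError`, `committed_segments` and the file naming are all made up for illustration, and a local directory stands in for the S3 bucket so the example runs without any AWS setup.

```python
import gzip
import json
import os
import tempfile


class BackpressureError(Exception):
    """Raised when the WAL is full; an HTTP front end could map this to a 429/503."""


class TinyIngest:
    """Hypothetical single-process ingest: append to a WAL, then seal it into segments."""

    def __init__(self, root, max_wal_bytes=1024):
        self.root = root                      # stands in for an S3 bucket/prefix
        self.max_wal_bytes = max_wal_bytes    # assumed backpressure threshold
        self.wal_path = os.path.join(root, "wal.log")
        self.seq = 0
        os.makedirs(root, exist_ok=True)

    def accept(self, event):
        """Durably append one event, or refuse it instead of dropping it silently."""
        line = (json.dumps(event) + "\n").encode("utf-8")
        current = os.path.getsize(self.wal_path) if os.path.exists(self.wal_path) else 0
        if current + len(line) > self.max_wal_bytes:
            raise BackpressureError("WAL full, retry later")
        with open(self.wal_path, "ab") as wal:
            wal.write(line)
            wal.flush()
            os.fsync(wal.fileno())  # only acknowledge after the bytes hit disk

    def seal(self):
        """Compress the WAL into a segment, then write the commit marker last."""
        if not os.path.exists(self.wal_path) or os.path.getsize(self.wal_path) == 0:
            return None
        self.seq += 1
        name = f"segment-{self.seq:06d}.jsonl.gz"
        seg_path = os.path.join(self.root, name)
        with open(self.wal_path, "rb") as wal, gzip.open(seg_path, "wb") as seg:
            seg.write(wal.read())
        # The marker is written only after the segment is complete, so a consumer
        # that lists markers never picks up a half-written segment.
        with open(seg_path + ".commit", "w") as marker:
            json.dump({"segment": name, "bytes": os.path.getsize(seg_path)}, marker)
        os.remove(self.wal_path)
        return name


def committed_segments(root):
    """Consumer side: return only segments that have a commit marker."""
    suffix = ".commit"
    return sorted(f[:-len(suffix)] for f in os.listdir(root) if f.endswith(suffix))
```

Usage would be something like `ingest = TinyIngest(tempfile.mkdtemp())`, a few `ingest.accept({...})` calls, then `ingest.seal()`. Even this toy version skips the hard parts (multi-node HA, crash recovery between the segment write and the marker write, retries against S3), which is where most of the ongoing operational cost tends to be.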
It's also worth noting that it's in the early stages. The next features planned are transformations, where you send data in one format (say, JSON) and have it delivered to S3 in another (e.g. CSV or Parquet).
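As a toy illustration of that kind of transformation (JSON in, CSV out), here is a minimal stdlib-only Python sketch. The function name `jsonl_to_csv` and the column-ordering rule are assumptions made for the example; the feature is not built yet, so this says nothing about how it will actually behave.

```python
import csv
import io
import json


def jsonl_to_csv(jsonl_text):
    """Convert newline-delimited JSON objects into CSV text (hypothetical helper)."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    # Assumed rule: columns appear in the order their keys are first seen.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    out = io.StringIO()
    # Missing keys are written as empty cells via restval.
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Parquet would need a third-party library and a schema decision for mixed or missing fields, so it is left out of the sketch.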