top | item 35823784


manv1 | 2 years ago

I think the realtime requirement removes Hadoop as an option. They might have considered using HDFS as the data store instead of S3, since putting lots of objects into S3 is expensive. Or just using a big EFS volume instead of S3.

It would be nice to know how much latency there was in the microservice version vs the monolithic version.
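The "lots of objects into S3 is expensive" point is easy to sanity-check with back-of-envelope arithmetic. A minimal sketch, assuming S3 Standard's roughly $0.005 per 1,000 PUT requests (rates vary by region and change over time; the write rate is an invented example):

```python
# Assumed price: ~$0.005 per 1,000 PUT requests (S3 Standard, region-dependent).
PUT_COST_PER_1000 = 0.005

def monthly_put_cost(objects_per_second: float) -> float:
    """Request cost alone for writing each object individually to S3."""
    seconds_per_month = 30 * 24 * 3600
    requests = objects_per_second * seconds_per_month
    return requests / 1000 * PUT_COST_PER_1000

# At 1,000 small objects/sec, the request charges alone are about
# $12,960/month, before storage or transfer costs.
print(f"${monthly_put_cost(1000):,.0f}/month")
```

Batching many records into fewer, larger objects is the usual way around this, which is part of why per-file stores like HDFS or EFS look attractive for high write rates.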


alpos | 2 years ago

You never get "realtime" in data processing. Actual realtime systems are a totally different animal, mostly found in the embedded space. Designing a realtime processing system involves setting up a fixed time window for each task that needs compute time and optimizing the code for each task until it fits into its window, on every execution, every time. This is done to provide hard guarantees on how fast the system can respond to new data flowing in. It's usually only safety-critical systems that actually have such responsiveness and delivery-time constraints.
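The fixed-time-window design described above is often called a cyclic executive. A toy sketch, with an invented 10 ms frame and placeholder task names (a real hard-realtime system would bound worst-case execution time offline rather than check for overruns at runtime):

```python
import time

FRAME = 0.010  # assumed 10 ms major frame; every task must fit inside it

def read_sensors():
    pass  # placeholder tasks standing in for real work

def update_control():
    pass

def write_actuators():
    pass

TASKS = [read_sensors, update_control, write_actuators]

def run_frames(n_frames: int) -> bool:
    """Run n frames back-to-back; return False on any deadline miss."""
    for _ in range(n_frames):
        start = time.monotonic()
        for task in TASKS:
            task()
        elapsed = time.monotonic() - start
        if elapsed > FRAME:
            return False  # overran the window: a hard-realtime failure
        time.sleep(FRAME - elapsed)  # idle until the next frame boundary
    return True
```

The point of the pattern is that response time is fixed by construction: new input is always acted on within one frame, no matter what. That guarantee is what most "realtime" data pipelines neither have nor need.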

I point this out because how we talk about a problem determines what solutions we even acknowledge as being on the table. Calling a system realtime when it isn't, or thinking we need realtime processing when we don't, makes people throw out solutions prematurely, and the thrown-out solutions are often the right answers.

Once you acknowledge that your system will not be "realtime" and you actually don't have the time-boxing and specific time window delivery constraints that actual realtime problem spaces have, you can weigh all of your actual options with an eye for what will be fastest and most efficient given the budget and hardware you have to throw at this problem.