top | item 39115506


dtoma | 2 years ago

The "Streaming Systems" book answers your question and more: https://www.oreilly.com/library/view/streaming-systems/97814.... It gives you a history of how batch processing started with MapReduce, and how attempts at scaling by moving towards streaming systems gave us all the subsequent frameworks (Spark, Beam, etc.).

As for the framework called MapReduce, it isn't used much, but its descendant https://beam.apache.org very much is. Nowadays people often use "map reduce" as a shorthand for whatever batch processing system they're building on top of.
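For anyone unfamiliar with the original model, here is a minimal single-process sketch of the map, shuffle, and reduce phases that the "map reduce" shorthand refers to, using word count as the example. It is plain Python with hypothetical function names (map_phase, shuffle_phase, reduce_phase), not the Hadoop or Beam API. A real framework distributes each phase across machines and performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key, as the framework would
    # do between the map and reduce stages.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Reduce: collapse each key's list of values into a single result.
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(docs))
```

Running it prints {'the': 3, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1, 'end': 1}. The whole input is available up front, which is what makes this "batch": the reduce step can wait until every map output has been shuffled.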


nwsm|2 years ago

This book looks interesting. Should I buy it, or does anyone have newer recommendations? I have Designing Data-Intensive Applications, which is a fantastic overview and still holds up well.

erikerikson|2 years ago

That was one of "the" books in the space prior to DDIA. In my opinion, Akidau mixes the logic for processing events with the stream infrastructure implementation because he was writing from the context of his particular use cases. When I spoke with him, it seemed that his influence had driven the design of Google's systems and GCP such that they didn't properly prioritize ordering/linearity/consistency requirements. At this point, my copy is mostly of historical interest to me.

62951413|2 years ago

It has the most interesting/conceptual/detailed discussion of streaming system semantics (e.g. the interplay of windows and stateful stream operations) that I'm aware of to this day, at least as far as Manning/O'Reilly-level books go. So I'd put it on the same bookshelf as DDIA.

It's a little biased towards Beam and away from Spark/Flink, though, which makes it less practical and more conceptual. So as long as that's your cup of tea, go for it.
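As a rough illustration of the window/state interplay mentioned above, here is a toy operator that sums values per key over fixed (tumbling) event-time windows. It assumes a simple heuristic watermark: the largest event time seen so far minus an assumed max_delay bound on out-of-orderness. Event and TumblingWindowSum are hypothetical names, and this is not Beam's API; Beam expresses the same ideas through windowing strategies, watermarks, and triggers rather than hand-written state.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str
    value: int
    event_time: int  # when the event happened (seconds), not when it arrived


class TumblingWindowSum:
    """Hypothetical toy operator: per-key sums over fixed event-time windows."""

    def __init__(self, window_size: int, max_delay: int) -> None:
        self.window_size = window_size
        self.max_delay = max_delay  # assumed upper bound on out-of-orderness
        self.state: dict[tuple[str, int], int] = defaultdict(int)
        self.max_seen: float = float("-inf")
        self.dropped: list[Event] = []

    def watermark(self) -> float:
        # Heuristic watermark: events older than this are no longer expected.
        return self.max_seen - self.max_delay

    def process(self, event: Event) -> list[tuple[str, int, int]]:
        start = event.event_time - event.event_time % self.window_size
        if start + self.window_size <= self.watermark():
            # This window was already emitted, so the event counts as late data.
            self.dropped.append(event)
        else:
            # Stateful step: accumulate into the (key, window_start) bucket.
            self.state[(event.key, start)] += event.value
        self.max_seen = max(self.max_seen, event.event_time)
        return self._fire_ready_windows()

    def flush(self) -> list[tuple[str, int, int]]:
        # End of input: advance the watermark to infinity so every window fires.
        self.max_seen = float("inf")
        return self._fire_ready_windows()

    def _fire_ready_windows(self) -> list[tuple[str, int, int]]:
        # Emit (key, window_start, total) for each window the watermark has passed,
        # then discard that window's state.
        ready = sorted(k for k in self.state if k[1] + self.window_size <= self.watermark())
        return [(key, start, self.state.pop((key, start))) for key, start in ready]


if __name__ == "__main__":
    op = TumblingWindowSum(window_size=10, max_delay=5)
    events = [Event("a", 1, 1), Event("a", 2, 12), Event("a", 4, 8),
              Event("a", 8, 16), Event("a", 16, 3)]
    for e in events:
        for result in op.process(e):
            print("fired", result)
    print("late", op.dropped)
    for result in op.flush():
        print("fired", result)
```

In this run, the out-of-order event at t=8 still lands in window [0, 10) because it arrives before the watermark passes 10, so that window fires as ('a', 0, 5) once the event at t=16 moves the watermark to 11. The event at t=3 arrives after that window has fired and is recorded as late instead. The choice of max_delay is exactly the completeness-versus-latency trade-off that the windowing and watermark discussion is about.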

bk146|2 years ago

Thank you, I'll check this out!