The tech lead of the Google MapReduce team (which no longer exists) just received their award for turning down MapReduce. IIRC the turndown was officially completed 5 years ago. However, I believe the code to delete MR was never checked in, and I'm not sure if there are still users.
MapReduce was used at Google for highly inappropriate things. For example, the machine learning system I worked on, Sibyl (https://www.datanami.com/2014/07/17/inside-sibyl-googles-mas...), was implemented using MapReduce, but there was no real technical justification for that; it's just that there was no other system that could scale to the volumes required or handle the constant failures endemic to Google's internal systems. It ended up requiring all sorts of heroic work to make MR scale, for example map-side combiners (which "reduced" items with common keys in the map output before it was flushed to the shuffle files). All of this got replaced by TensorFlow, and only the good bits of Sibyl were extracted into TFX.
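The map-side combiner idea can be sketched roughly like this. This is a toy Python illustration of the technique, not Google's actual API; the function names, the spill mechanism, and `buffer_limit` are all made up for the example:

```python
from collections import defaultdict

def map_with_combiner(records, map_fn, combine_fn, buffer_limit=1000):
    """Toy sketch: partially reduce values that share a key in the
    mapper's output buffer before anything reaches the shuffle files."""
    buffer = defaultdict(list)
    spills = []  # stands in for the shuffle files on local disk

    def spill():
        # Collapse each key's buffered values to one partial value,
        # then "write" the shrunken result to the shuffle.
        spills.append({k: combine_fn(vs) for k, vs in buffer.items()})
        buffer.clear()

    for record in records:
        for key, value in map_fn(record):
            buffer[key].append(value)
        if sum(len(vs) for vs in buffer.values()) >= buffer_limit:
            spill()
    if buffer:
        spill()
    return spills

# Word count: without the combiner, every ("word", 1) pair would cross
# the network; with it, each spill carries one partial count per word.
spills = map_with_combiner(
    ["to be or not to be"],
    map_fn=lambda line: [(w, 1) for w in line.split()],
    combine_fn=sum,
)
# spills == [{"to": 2, "be": 2, "or": 1, "not": 1}]
```

For associative, commutative operations like counting, this shrinks shuffle traffic from one record per input pair to one record per distinct key per spill, which is where the scaling win comes from.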
MapReduce was deprecated because Flume [0], its successor, is better, but it does practically the same thing, and Flume is used massively. I believe Dataflow is the public Google Cloud version.
Yes, which is why it's amusing in hindsight that for a decade everyone* outside Google was forcing all* their distributed data tasks into the MapReduce paradigm, without considering alternative approaches like the one used by Spanner.
I'm not sure how you think a distributed data processing technology would "fake-out" other companies when building/choosing database technology. They are totally different problem sets.
MapReduce does not have a set-in-stone data source/sink; it can read from and write to multiple backends like Bigtable and Spanner, so they are complementary technologies.
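That pluggability amounts to the framework fixing the map/shuffle/reduce control flow while reading and writing through abstract interfaces. A hypothetical Python sketch, under the assumption of simple reader/writer abstractions (none of these class names come from Google's code):

```python
from abc import ABC, abstractmethod
from collections import defaultdict

class Source(ABC):
    @abstractmethod
    def read(self):  # yields (key, value) pairs
        ...

class Sink(ABC):
    @abstractmethod
    def write(self, key, value):
        ...

class DictSource(Source):  # stand-in for a table-backed source (e.g. Bigtable)
    def __init__(self, rows): self.rows = rows
    def read(self): yield from self.rows.items()

class DictSink(Sink):  # stand-in for a table-backed sink (e.g. Spanner)
    def __init__(self): self.rows = {}
    def write(self, key, value): self.rows[key] = value

def run_mapreduce(source, sink, map_fn, reduce_fn):
    shuffle = defaultdict(list)
    for k, v in source.read():               # the source is pluggable
        for mk, mv in map_fn(k, v):
            shuffle[mk].append(mv)
    for mk, mvs in sorted(shuffle.items()):
        sink.write(mk, reduce_fn(mk, mvs))   # the sink is pluggable too

src = DictSource({"doc1": "a b a", "doc2": "b"})
out = DictSink()
run_mapreduce(src, out,
              map_fn=lambda k, text: [(w, 1) for w in text.split()],
              reduce_fn=lambda k, vs: sum(vs))
# out.rows == {"a": 2, "b": 2}
```

The same word-count job runs unchanged against any storage system that implements the two interfaces, which is why the storage layer and the processing framework are complementary rather than competing.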
dekhn|4 years ago
sokoloff|4 years ago
mrep|4 years ago
[0]: https://research.google/pubs/pub35650/
oblio|4 years ago
This sounds bad.
drewda|4 years ago
* slight exaggerations, I know
mrep|4 years ago
quin3|4 years ago
LaserToy|4 years ago