MapReduce: A major step backwards?

[+] thomas11|15 years ago|reply

The whole piece seems to be based on a false premise, namely that MapReduce is supposed to replace databases. That's not the case at all; it's a way to analyze and transform data in parallel. Afterwards, you can load the result into a relational (or other) database if you want database features.

Also, Hadoop at least offers a natural way of dealing with skew, namely partitioning: http://developer.yahoo.com/hadoop/tutorial/module5.html#part....
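
To make the skew point concrete, here is a rough sketch in Python. It is not Hadoop's actual Java API; the function names, the made-up word counts, and the idea of a hand-picked list of hot keys are all assumptions for illustration. It compares plain hash partitioning with a partitioner that gives each known hot key a reducer of its own.

```python
import zlib
from collections import Counter


def hash_partition(key: str, num_partitions: int) -> int:
    """Default-style partitioning: a stable hash of the key modulo the partition count."""
    # zlib.crc32 is used because Python's built-in hash() is salted per process.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def skew_aware_partition(key: str, num_partitions: int, hot_keys: list) -> int:
    """Give each known hot key its own partition; hash all other keys over the rest."""
    if num_partitions <= len(hot_keys):
        raise ValueError("need more partitions than hot keys")
    if key in hot_keys:
        return hot_keys.index(key)
    remaining = num_partitions - len(hot_keys)
    return len(hot_keys) + hash_partition(key, remaining)


def partition_loads(keys, partition_fn, num_partitions):
    """Count how many records each partition (reducer) would receive."""
    counts = Counter(partition_fn(k) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Made-up skewed input: two hot keys plus 100 rare keys (hypothetical data).
records = ["the"] * 1000 + ["a"] * 800 + [f"w{i}" for i in range(100)] * 10
HOT = ["the", "a"]

plain = partition_loads(records, lambda k: hash_partition(k, 4), 4)
aware = partition_loads(records, lambda k: skew_aware_partition(k, 4, HOT), 4)
print("hash only: ", plain)
print("skew-aware:", aware)
```

With plain hashing, whichever reducer receives "the" also picks up a share of everything else; with the skew-aware version the hot keys are isolated, so no reducer handles more than the largest single key. Every key still goes to exactly one partition, so the reduce step needs no changes.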

[+] gizzlon|15 years ago|reply

yeah, it doesn't make a lot of sense..

Guess the only valid point is 3: "Not novel at all — it represents a specific implementation of well known techniques developed nearly 25 years ago" Probably true..

[+] 1010011010|15 years ago|reply

"[Note: Although the system attributes this post to a single author, it was written by David J. DeWitt and Michael Stonebraker]"

I guess their schema doesn't handle multiple authors.

[+] rfugger|15 years ago|reply

"Given the experimental evaluations to date, we have serious doubts about how well MapReduce applications can scale."

Umm... Google search?

[+] regularfry|15 years ago|reply

Yeah, that made me chuckle. Isn't the actual point, which the authors seem to at best skate over, that MapReduce scales (relatively) trivially to petabytes of data?

[+] vicaya|15 years ago|reply

The article is from 2008. Since then, so-called parallel DBs have gone nowhere, and Hadoop has taken over the world.

The main problem with traditional (OLAP) DBMSs in the era of big data is that ETL (Extract/Transform/Load), rather than complex queries, becomes the main bottleneck, because big data is inherently semi-structured and noisy. MR is the tool for processing big data.

[+] dxbydt|15 years ago|reply

http://www.computerworld.com/s/article/9142406/Big_three_dat...

"We'd never bring Hadoop code into one of our products," said Microsoft's David J. DeWitt. DeWitt is an academic expert in parallel SQL databases.

DeWitt says that in MapReduce "schema is buried" and furthermore, "the programmer must discover the structure by an examination of the code. Not only is this a very tedious exercise, but also the programmer must find the source code for the application."

Ever heard of Hive? http://wiki.apache.org/hadoop/Hive/GettingStarted#SQL_Operat...

But he does make one important point - "whether a DBMS should be written: a. By stating what you want (Relational DBMS) b. By presenting an algorithm for data access (Codasyl, MapReduce)"

Well, mathematically speaking, 'a' wins hands down, since you've normalized the data and have "no garbage in the data set" (DeWitt's terminology). However, once you have fast enough access and potentially infinite memory to handle thousands of columns and millions of rows, there's no reason not to at least try a Codasyl. In that respect, Codasyl is like, say, a bubblesort. If you're going to be sorting at most 10 elements a million times in your application, you would be better off with a quick and dirty bubblesort, which actually performs faster in this particular case than a correctly written MergeSort. The MergeSort will do a lot better if you have a million elements, but will perform poorly if you have just 10 (with constant factors of 1 and 5: 1*n^2 = 100 versus 5*n*ln(n) = 5*10*ln(10) ≈ 115). When I first learnt DBMSs in school, the Professor actually made this very point: "someday we'll attempt a Codasyl, just not right now." Well, that day has come.
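
For anyone who wants to check the arithmetic, here is a small sketch of that cost model in Python. It is only a model, not a benchmark: the constant factors 1 and 5 come from the numbers above, and the function names are made up for illustration.

```python
import math


def quadratic_cost(n: int, c: float = 1.0) -> float:
    """Modeled cost of the simple sort: c * n^2 (c = 1 is an assumed constant)."""
    return c * n * n


def linearithmic_cost(n: int, c: float = 5.0) -> float:
    """Modeled cost of the merge sort: c * n * ln(n) (c = 5 is an assumed constant)."""
    return c * n * math.log(n)


def first_n_where_quadratic_loses(start: int = 2, limit: int = 10_000):
    """Smallest n >= start at which the quadratic model becomes the more expensive one."""
    for n in range(start, limit):
        if quadratic_cost(n) > linearithmic_cost(n):
            return n
    return None


for n in (10, 1_000_000):
    print(n, quadratic_cost(n), round(linearithmic_cost(n)))
print("crossover at n =", first_n_where_quadratic_loses())
```

Under these assumed constants the simple algorithm stays cheaper only for very small inputs, which is the whole argument: the right choice depends on the size of the problem you actually have.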

[+] geekzgalore|15 years ago|reply

Seems that the author needs to do some serious research and reading on MapReduce.

[+] js4all|15 years ago|reply

Not well researched. For instance, Map-Reduce is not used to index data or to replace indexing. That is just one point; there are so many wrong assumptions in this article that I don't know where to begin.

[+] peterbraden|15 years ago|reply

exactly, plus map-reduce can be combined with indexing - see CouchDB
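
A toy sketch of the idea in Python, loosely inspired by how CouchDB views are usually described (a map function emits key/value rows, the rows are kept sorted by key, and a reduce runs over a key range). This is not CouchDB's API; the class, the function names, and the sample documents are all hypothetical.

```python
import bisect


class ViewIndex:
    """Rows emitted by a map function, kept sorted by key so range lookups can use bisect."""

    def __init__(self, docs, map_fn):
        rows = []
        for doc in docs:
            rows.extend(map_fn(doc))
        # Stable sort on the key only, so rows with equal keys keep document order.
        rows.sort(key=lambda row: row[0])
        self.keys = [k for k, _ in rows]
        self.values = [v for _, v in rows]

    def query(self, start_key, end_key, reduce_fn=None):
        """Return the values with start_key <= key <= end_key, optionally reduced."""
        lo = bisect.bisect_left(self.keys, start_key)
        hi = bisect.bisect_right(self.keys, end_key)
        selected = self.values[lo:hi]
        return reduce_fn(selected) if reduce_fn else selected


def by_author(doc):
    """Map step: emit one (author, word count) row per document."""
    yield (doc["author"], doc["words"])


# Hypothetical documents, for illustration only.
docs = [
    {"author": "bob", "words": 5},
    {"author": "alice", "words": 3},
    {"author": "bob", "words": 7},
    {"author": "carol", "words": 1},
]

view = ViewIndex(docs, by_author)
print(view.query("bob", "bob", sum))
print(view.query("alice", "bob"))
```

The map step builds the index once; after that, a query only touches the slice of rows in the requested key range instead of rescanning every document.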

[+] unknown|15 years ago|reply

[deleted]

[+] mxyzptlk|15 years ago|reply

We're using both. Vertica is amazingly fast. Hadoop helps us analyze some very big data sets. I wouldn't want to lose either one.

[+] Vitaly|15 years ago|reply

Completely missing the point. MapReduce didn't come to replace databases; it takes on tasks that databases are incapable of doing. Google's search operation would be impossible to serve sanely with an RDBMS.

[+] chrisjsmith|15 years ago|reply

Looks like "enterprise company selling expensive black box trying to direct attention to their product" to me.

17 comments