equark | 13 years ago
The median job size at Microsoft and Yahoo is only 15GB, and 90% of Hadoop jobs at Facebook are under 100GB. Clearly you want to be able to crunch large log files, but day-to-day analysis operates on much smaller files than that. (cite: http://research.microsoft.com/pubs/163083/hotcbp12%20final.p...).
At Sense (http://www.senseplatform.com) most of the clients we work with are struggling not with the size of their data but with tricky modeling problems that don't fit into standard black boxes and with integrating analytics into actual production systems. Adopting something like Hadoop for these tasks is not very productive.
thebigpicture | 13 years ago
I'll be keeping this pdf in my "rebuttals to idiocy" folder.
There are some industries that certainly do have "big data" (Wikipedia has some definitions of "big data" that include size ranges, for whatever that's worth), but companies with "big data" don't seem to be the only targets of "big data" marketing. And from what I know about available solutions, if I really had a "big data" problem (e.g., 100 terabytes, not 100 gigabytes), I would not be choosing Hadoop. I also would not choose SQL or "NoSQL". But that's just me. Some of the best solutions I've found have nearly zero marketing. Go figure.
dbecker | 13 years ago
I work for an analytics consulting company, and many of our clients want us to use Hadoop with their data. They've heard that Hadoop is the standard for big data, and they associate "big data" with machine learning.
But the data they want us to put in Hadoop is usually small enough to work with in RAM on my laptop.
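A minimal sketch of the point being made: datasets in the tens of gigabytes can usually be aggregated on a single machine with a streaming pass, keeping only the summary (not the raw data) in RAM. The file format and field names here are hypothetical, purely for illustration:

```python
# Illustration (assumed data layout): aggregate a large CSV log of
# (user_id, bytes) records on one machine by streaming it once.
# Only the per-user totals live in memory, so even a multi-GB file
# needs no cluster.
import csv
import io
from collections import defaultdict

def total_bytes_per_user(csv_lines):
    """Stream (user_id, bytes) rows and sum bytes per user in memory."""
    totals = defaultdict(int)
    for row in csv.reader(csv_lines):
        user_id, nbytes = row[0], int(row[1])
        totals[user_id] += nbytes
    return dict(totals)

# Tiny stand-in for a much larger log file.
sample = io.StringIO("alice,100\nbob,250\nalice,50\n")
print(total_bytes_per_user(sample))  # {'alice': 150, 'bob': 250}
```

In practice you'd open the real file instead of a `StringIO`; the memory footprint is proportional to the number of distinct users, not the file size.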