equark | 13 years ago

Another key fact is that "big data" is actually not that common, especially when it gets to the analysis stage.

The median job size at Microsoft and Yahoo is only 15GB. And 90% of Hadoop jobs at Facebook are under 100GB. Clearly you want to be able to crunch large log files, but in terms of day-to-day analysis the files are much smaller than that. (cite: http://research.microsoft.com/pubs/163083/hotcbp12%20final.p...).

At Sense (http://www.senseplatform.com) most of the clients we work with are struggling not with the size of their data but with tricky modeling problems that don't fit into standard black boxes and with integrating analytics into actual production systems. Adopting something like Hadoop for these tasks is not very productive.

thebigpicture|13 years ago

Thank you MS Research for a dose of sanity. "Big data" seems very potent as far as marketing buzzwords go. It plays on people's ignorance and the general sentiment of "too much information".

I'll be keeping this pdf in my "rebuttals to idiocy" folder.

There are some industries that certainly do have "big data" (Wikipedia has some definitions for "big data" that include size ranges, for whatever that's worth), but it does not seem like companies with "big data" are the only targets of "big data" marketing. And from what I know about available solutions, if I really had a "big data" problem (e.g., 100 terabytes, not 100 gigabytes) then I would not be choosing Hadoop. I also would not choose SQL or "NoSQL". But that's just me. Some of the best solutions I've found have nearly zero marketing. Go figure.

noelwelsh|13 years ago

Interesting paper (and makes me feel more justified in rejecting Hadoop). Do you have any blog posts / other material about the techniques you're using at Sense?

equark|13 years ago

Unfortunately no, but you're welcome to email me at tristan@senseplatform.com.

dbecker|13 years ago

I think many people are confused about what "big data" means.

I work for an analytics consulting company, and many of our clients want us to use Hadoop with their data. They've heard that Hadoop is the standard for big data, and they associate "big data" with machine learning.

But the data they want us to put in Hadoop is usually small enough to work with in RAM on my laptop.

thebigpicture|13 years ago

It must be great to have such naive clients.