For raw storage Hadoop beats an RDBMS; sure, I'll buy that argument. It's not the same thing, though, and it doesn't do the same job.
Hadoop excels at data processing: trawling vast quantities of unstructured or semi-structured data and extracting information from it. It's a poor platform for random access to specific elements of that data, though.
RDBMSs are great in exactly the places Hadoop isn't: getting access to random elements of data in a structured manner and executing structured queries on that data. These are things you know you'll do a lot of and can optimise.
There are column and table data stores built on top of Hadoop, and it can be argued that they could be used as an alternative to an RDBMS, but they aren't drop-in replacements and for the most part they're not meant to do the same job.
The most interesting uses of Hadoop aren't going to come from replacing existing RDBMS infrastructure with a Hadoop cluster. They're going to come from pushing data into a Hadoop cluster to process it: collecting data that would otherwise be impossible to collect because it's either unstructured or there is simply too much to put in an RDBMS at a cost-effective scale.
Hadoop and the NoSQL movement are exciting when you start to think about processing that data and pulling what's useful back out into your existing infrastructure.
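To make that split concrete, here is a rough sketch of the pattern: a batch pass over messy, semi-structured records, followed by loading only the small, useful summary into a relational store for keyed lookups. It is plain Python standing in for a Hadoop job, not Hadoop's API. The sample log lines, the `user_events` table, and every function name are made up for illustration, and SQLite stands in for whatever RDBMS you already run.

```python
import json
import sqlite3
from collections import Counter

# Hypothetical semi-structured input: one event per line, fields vary,
# and some lines are not valid JSON at all.
RAW_LOG_LINES = [
    '{"user": "alice", "action": "view", "page": "/home"}',
    '{"user": "bob", "action": "view"}',
    'not json at all',
    '{"user": "alice", "action": "buy", "sku": "A1"}',
    '{"action": "view"}',
]


def map_line(line):
    """Map step: emit (user, 1) for each parseable event that names a user."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return []  # skip garbage rather than failing the whole batch
    if not isinstance(event, dict):
        return []
    user = event.get("user")
    return [(user, 1)] if user else []


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def load_summary(conn, counts):
    """Load the small aggregated result into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_events ("
        "user_name TEXT PRIMARY KEY, events INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO user_events VALUES (?, ?)", counts.items()
    )
    conn.commit()


def events_for(conn, user_name):
    """Keyed lookup: the kind of random access the RDBMS side is good at."""
    row = conn.execute(
        "SELECT events FROM user_events WHERE user_name = ?", (user_name,)
    ).fetchone()
    return row[0] if row else 0


# Batch side: scan every raw line once, then aggregate.
pairs = [pair for line in RAW_LOG_LINES for pair in map_line(line)]
counts = reduce_counts(pairs)

# Relational side: store only the summary and query it by key.
conn = sqlite3.connect(":memory:")
load_summary(conn, counts)
print(events_for(conn, "alice"))
```

The raw lines never go into the database; only the per-user totals do, which is the "pull what's useful back out" half of the argument.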
Is Hadoop really approachable for most businesses that don't have some sort of large-scale need for analytics?
Last time I looked there were things like Pig (http://pig.apache.org/), but the use case was "big data". Many businesses use RDBMSs exclusively and can easily use analytics tools like Business Objects. Companies may be dealing with what they consider to be a lot of data, but it pales in comparison to what many web startups are dealing with.
Well, Oracle is more prevalent in big enterprise. But if you look at the equations at the end of the article, a MySQL/PostgreSQL DBA probably costs 80-90% of what an Oracle DBA does. The licensing costs are lower, but the hardware cost for the big iron and big SAN to run the RDBMS is pretty much the same.
I don't believe MySQL and PostgreSQL have much to offer for working with great big heaps of archival, unstructured or semi-structured data. Assuming that is the case, then it's appropriate for the article to ignore that they exist.
andrewmccall | 14 years ago
BrianLy | 14 years ago
nathanwdavis | 14 years ago
>Yahoo and Facebook are excellent examples of how Hadoop can scale up; but little is usually said about how Hadoop can scale the other way..
It says this right after mentioning that those operations have five-digit-sized Hadoop clusters.
lmm | 14 years ago
zapman449 | 14 years ago
bunderbunder | 14 years ago