"No SQL. It’s 2012, and most queries are run from code rather than by a human sitting at a console. Why are we still querying our databases by constructing strings of code in a language most closely related to freaking COBOL, which after being constructed have to be parsed for every single query?
SQL in its natural habitat"
COBOL? Really? I don't see the COBOL connection at all.
SQL is more closely related to relational algebra, so it makes absolute sense when you're querying relational data.
So that's why many of us prefer to query our relational data in SQL. It's the same reason why we write stylesheets in CSS instead of C++... and why we validate postal codes and phone numbers with regular expressions instead of "manually" parsing them with PHP, Ruby, Python, or what have you. It's all about the right tool for the right job.
Relational data isn't the right solution for everything; there are lots of use cases where NoSQL databases absolutely rock and traditional databases are inappropriate. Again, right tool for the right job.
"No schemas. This was wonderful. What are schemas even for? They just make things hard to change for no reason."
While they do mention the need to enforce constraints on your data, it's comments like these that make me wish all application developers were required to work as a DBA for a few months.
A properly normalized and "constrained" database prevents data loss from stupid mistakes.
Seriously, after that line I gave up on reading the rest. There's plenty to be said about schema vs no schema but its pretty ignorant to just dismiss the entire concept out of hand.
Guess what - in lots of applications data integrity is more important than developer convenience.
Also, they should be forced to read Fabian Pascal's rants about the end of Western Civilization, aggressive ignorance, and lack of understanding of the Relational Model.
It seems like there is a huge amount of ignorance (on the parts of both developers and DBAs) about the fact that NoSQL doesn't necessarily mean "no schema." For whatever reason, the "cool kids" in the Web dev world completely ignore the ugly stepchildren of the NoSQL world, graph databases. Graph databases provide the protections of a schema, but the schema can be altered without massive pain on the parts of developers. Even better, the RDF style graph databases have a W3C standard query language called SPARQL.
Don't get me wrong, I love my RDBMS (MySQL and PostgreSQL) and I also use Mongo and love its extreme simplicity, but I wish more developers understood that schemas exist in places other than RDBMS.
'A properly normalized and "constrained" database prevents data loss from stupid mistakes.'
A properly written application layer also prevents data loss from stupid mistakes.
A stupid mistake made while setting up a properly normalized database also causes data loss.
You have to be very smart to be able to design a normalized constrained DB well. The fact that only smart people can do it doesn't mean that people who don't do it aren't smart.
What really don't understand is why they chose mysql over postgresql - postgres already have hstore as a column type to store schemaless data. This includes support for indexes on the field.
probably because as they stated they are familar with MySQL. After getting hurt trying out a new technology to them it makes sense they go back to something they know. Will probably lead to more uptime which is what they want.
"They don't exist, if you don't construct SQL queries by concatenating strings and variables."
My point is, people still do this. You never hear about REST-injection or memcached-injection attacks, even though those are possible in principle, because those protocols don't encourage this mistake the way using SQL as a database API does.
I was floored by this quote. As technical founder, one of my chief priorities operationally is ensuring that a) our systems are up and b) our data is consistent. I've dabbled in NoSQL (a la friendfeed) a bit and my head exploded when I wanted to migrate meta-schemas on multiple machines (potentially writing to the same data) with minimal concurrent alternate code paths. I need to be able to speak with 100% confidence that every object in our database is consistent and valid and without a schema on a database level this is really hard to enforce because anyone (including a bug) could accidentally throw a borked bag of properties into a key value store.
A lot of the excitement over schema-less data stores (I think) really comes down to instant DDL changes, which is why you should really just use Postgres instead of MySQL.
"If you don't like it, it must be because you haven't taken the time to understand it" is cognitive poison. What evidence will convince you that someone has understood well enough to judge that something doesn't make sense?
I've spent many years using schemas, and I know well how they work and what they achieve. I'm saying they're a lousy tradeoff.
Why did you use mysql rather than postgres? It seems like most of your complaints about mysql are solved in postgres (the query planner is much stronger) and there's some features that seems would fit your team much better
A number of us know MySQL fairly well, and in particular I've seen how it's used by some of the biggest internet companies. We have some postgres experience on our team as well, but it's a little more of an unknown. So experience trumped feature set in this case.
One thing I would say about postgres is that it has a lot of features. As a new user, it's hard to know which ones to use in which ways, and what the downsides might be.
To summarize some of the other (upvoted) comments on this cringe-worthy article:
Output to any external system must be encoded to prevent fill-in-the-blank injection, if it uses a language vs a string API-only approach. Used prepared statements.
SQL is not COBOL. Sets != ISAM.
You can store arbitrary data (XML, JSON) in BLOBS/CLOBS in an RDBMS. Denormalization is frowned upon, but not forbidden.
PostgreSQL is arguably a better free / open DBMS than MySQL.
DBMS data constraints are a good thing; use them when they make sense.
I've been using CouchDB for the past six months for the internal product-CMS for my business. Its got a great feature set for what I am using it for. MVCC architecture, master-master replication and stored views make it a natural fit as a backend for internal tools. The benefits actually grow as you get more desginers/artists etc working together; each on their own DB.
It seems like all the problems he has are with scaling. I can't comment on that, but I would whole-heartedly recommend it for internal tooling.
As a systems guy, I appreciate stories about developers learning that systems are complicated, and the latest and greatest technology is often not as stable or optimized as hoped.
Although, I do have high hopes for Key:Value store data repositories.
However as a developer guy I am glad they picked a technology that let them make quick prototyping progress and started shipping a product.
Remember only successful companies have to worry about scaling. Un(/not yet)-successful company have to worry about shipping first, then scaling. So perhaps Saucelabs reached that milestone, and they simply have outgrown the original technology that helped them ship.
Who knows, maybe if they had spent time re-implementing a REST interface on top of MySQL or re-implementing Futon for doing debugging, we might not even have heard about Saucelabs these days.
So, the TL;DR version of this is: we stopped using CouchDB because it sucked, but we took all the things that made it great, and shoe-horned them in to MySQL, while negating using anything that an RDBMS is designed for, like joins...
How do you go about searching the DB if everything is stored as a JSON object? Are you using an index like solr/sphinx instead of doing any searches directly in mysql?
We keep things we might need to search in regular columns (typically with indexes). The JSON object is just a way to add extra data to rows, which we can fetch and deal with on the app side. That works fine for a lot of things, and we're hoping it will give us some flexibility around when we need to do schema migrations in some cases.
"guesses wrong about how to perform queries all the time. Experienced MySQL users expect to write a lot of FORCE INDEX clauses."
This is most generally a sign of bad indexing/query construction. I've seen so many databases with dozens of indexs placed on tables (which only had a requirement for a few) because the developers just didn't grasp how they should be setup - which isn't rocket science.
This article explains why I am using MySQL as the DB backend for a site I am currently building. It also explains why I am not using Node.js. New technologies are fun to play with, but they get decidedly less fun when your site starts getting traffic and you realize that your new toy isn't ready for prime time.
It's important to be conservative in the right places. Data storage is one of those places: data is the lifeblood of most businesses, and once it goes bad there's often no way of making it good again. Making sure it's correct and remains correct is critical.
Programs, by comparison, are a lot more flexible here, unless you're in the financial/health/auto/aero industries.
As an aside, where I work we use a lot of node. It works.
It does nothing of the sort, node.js is ready for the prime time, just have a look at transloadit, voxer, yammer etc etc. Don't confuse the tool not being ready, with you not being ready to use the tool. Equally, there are plenty of people are making couchdb work for them.
The future of what? Non-relational data? Relational databases are very good for a wide variety of problems. And they will continue to be very good for a wide variety of problems.
I wish this article had some hard numbers for availability, performance, and the size of their data as opposed to hand-waving.
Shameless plug: If you're looking to benchmark or load test CouchDB a bit, I wrote one at https://github.com/mgp/iron-cushion. Hopefully someone out there will use this to decide if CouchDB's performance meets their needs, because migrating away from any database is painful...
More a lesson about the pitfalls of building systems using technologies that you don't understand very well than it is anything specific to either CouchDB or MySQL.
Notes' main problems are/were the lack of joins and most aggregates, lack of proper transactions, lack of indexes that would permit useful runtime queries, somewhat slow data access layer, and glacially slowly-evolving, ugly-ass UI.
It is also so easy to build on, that people who don't know what they're doing and shouldn't be building any software for redistribution, will do so anyway, and gain just enough success to become extremely annoying. Its IDE is about on par with most other IBM-involved software development exercises, i.e., fairly bad.
Now, all that aside, it is a very powerful, but not generally well-appreciated, system, and is great (fast, cheap, hard to kill) for a large swath of applications that don't require any of the stuff from the first paragraph.
Meh. Single threaded map reduce and very hungry disk usage. Mongo isn't the silver bullet a lot of supporters sometimes project it as (not saying you are).
Also, he actually linked 2 articles, I'd never seen the first one so it may very well be a hoax or whatever. But the 2nd one has made it's rounds a few times and has stood up (IMO) to scrutiny.
[+] [-] JohnBooty|14 years ago|reply
COBOL? Really? I don't see the COBOL connection at all.
SQL is more closely related to relational algebra, so it makes absolute sense when you're querying relational data.
So that's why many of us prefer to query our relational data in SQL. It's the same reason why we write stylesheets in CSS instead of C++... and why we validate postal codes and phone numbers with regular expressions instead of "manually" parsing them with PHP, Ruby, Python, or what have you. It's all about the right tool for the right job.
Relational data isn't the right solution for everything; there are lots of use cases where NoSQL databases absolutely rock and traditional databases are inappropriate. Again, right tool for the right job.
[+] [-] geophile|14 years ago|reply
[+] [-] sah|14 years ago|reply
[+] [-] johnbender|14 years ago|reply
While they do mention the need to enforce constraints on your data, it's comments like these that make me wish all application developers were required to work as a DBA for a few months.
A properly normalized and "constrained" database prevents data loss from stupid mistakes.
[+] [-] baconner|14 years ago|reply
Guess what - in lots of applications data integrity is more important than developer convenience.
[+] [-] SkyMarshal|14 years ago|reply
http://www.dbdebunk.com/
[+] [-] JPKab|14 years ago|reply
Don't get me wrong, I love my RDBMS (MySQL and PostgreSQL) and I also use Mongo and love its extreme simplicity, but I wish more developers understood that schemas exist in places other than RDBMS.
[+] [-] stephen|14 years ago|reply
http://draconianoverlord.com/2012/05/08/whats-wrong-with-a-s...
[+] [-] mattbriggs|14 years ago|reply
[+] [-] jmathes|14 years ago|reply
A properly written application layer also prevents data loss from stupid mistakes. A stupid mistake made while setting up a properly normalized database also causes data loss.
You have to be very smart to be able to design a normalized constrained DB well. The fact that only smart people can do it doesn't mean that people who don't do it aren't smart.
[+] [-] unknown|14 years ago|reply
[deleted]
[+] [-] yxhuvud|14 years ago|reply
[+] [-] derekorgan|14 years ago|reply
[+] [-] trimbo|14 years ago|reply
[+] [-] unknown|14 years ago|reply
[deleted]
[+] [-] DennisP|14 years ago|reply
I don't know about MySQL, but my database caches compiled queries.
"Things like SQL injection attacks simply should not exist."
They don't exist, if you don't construct SQL queries by concatenating strings and variables.
Meanwhile, all the cool kids are talking about getting rid of procedural code in favor of declarative DSLs...
[+] [-] bsg75|14 years ago|reply
[+] [-] sah|14 years ago|reply
My point is, people still do this. You never hear about REST-injection or memcached-injection attacks, even though those are possible in principle, because those protocols don't encourage this mistake the way using SQL as a database API does.
[+] [-] fusiongyro|14 years ago|reply
This attitude right here is why the RDBMS old guard despises NoSQL. Willful ignorance should not be celebrated.
[+] [-] newhouseb|14 years ago|reply
A lot of the excitement over schema-less data stores (I think) really comes down to instant DDL changes, which is why you should really just use Postgres instead of MySQL.
[+] [-] sah|14 years ago|reply
I've spent many years using schemas, and I know well how they work and what they achieve. I'm saying they're a lousy tradeoff.
[+] [-] kanja|14 years ago|reply
[+] [-] sah|14 years ago|reply
One thing I would say about postgres is that it has a lot of features. As a new user, it's hard to know which ones to use in which ways, and what the downsides might be.
[+] [-] Roboprog|14 years ago|reply
Output to any external system must be encoded to prevent fill-in-the-blank injection, if it uses a language vs a string API-only approach. Used prepared statements.
SQL is not COBOL. Sets != ISAM.
You can store arbitrary data (XML, JSON) in BLOBS/CLOBS in an RDBMS. Denormalization is frowned upon, but not forbidden.
PostgreSQL is arguably a better free / open DBMS than MySQL.
DBMS data constraints are a good thing; use them when they make sense.
[+] [-] Roboprog|14 years ago|reply
[+] [-] Androsynth|14 years ago|reply
It seems like all the problems he has are with scaling. I can't comment on that, but I would whole-heartedly recommend it for internal tooling.
[+] [-] josephkern|14 years ago|reply
Although, I do have high hopes for Key:Value store data repositories.
TANSTAAFL.
[+] [-] rdtsc|14 years ago|reply
Remember only successful companies have to worry about scaling. Un(/not yet)-successful company have to worry about shipping first, then scaling. So perhaps Saucelabs reached that milestone, and they simply have outgrown the original technology that helped them ship.
Who knows, maybe if they had spent time re-implementing a REST interface on top of MySQL or re-implementing Futon for doing debugging, we might not even have heard about Saucelabs these days.
[+] [-] empthought|14 years ago|reply
[+] [-] wave|14 years ago|reply
[+] [-] dpcx|14 years ago|reply
[+] [-] balac|14 years ago|reply
[+] [-] sah|14 years ago|reply
[+] [-] nirvana|14 years ago|reply
[+] [-] xd|14 years ago|reply
This is most generally a sign of bad indexing/query construction. I've seen so many databases with dozens of indexs placed on tables (which only had a requirement for a few) because the developers just didn't grasp how they should be setup - which isn't rocket science.
[+] [-] balloot|14 years ago|reply
[+] [-] dmpk2k|14 years ago|reply
Programs, by comparison, are a lot more flexible here, unless you're in the financial/health/auto/aero industries.
As an aside, where I work we use a lot of node. It works.
[+] [-] sandfox|14 years ago|reply
[+] [-] macspoofing|14 years ago|reply
The future of what? Non-relational data? Relational databases are very good for a wide variety of problems. And they will continue to be very good for a wide variety of problems.
[+] [-] shadowmatter|14 years ago|reply
Shameless plug: If you're looking to benchmark or load test CouchDB a bit, I wrote one at https://github.com/mgp/iron-cushion. Hopefully someone out there will use this to decide if CouchDB's performance meets their needs, because migrating away from any database is painful...
[+] [-] ams6110|14 years ago|reply
[+] [-] abraham|14 years ago|reply
[+] [-] StavrosK|14 years ago|reply
https://github.com/stochastic-technologies/goatfish
Not production code by any stretch, but an interesting concept, and I'd be more inclined to work on it if more people were using it.
[+] [-] NDizzle|14 years ago|reply
Yet in the same page makes fun of SQL for being old and busted? I don't get it.
[+] [-] sk5t|14 years ago|reply
It is also so easy to build on, that people who don't know what they're doing and shouldn't be building any software for redistribution, will do so anyway, and gain just enough success to become extremely annoying. Its IDE is about on par with most other IBM-involved software development exercises, i.e., fairly bad.
Now, all that aside, it is a very powerful, but not generally well-appreciated, system, and is great (fast, cheap, hard to kill) for a large swath of applications that don't require any of the stuff from the first paragraph.
[+] [-] vccarvalho|14 years ago|reply
[+] [-] encoderer|14 years ago|reply
Also, he actually linked 2 articles, I'd never seen the first one so it may very well be a hoax or whatever. But the 2nd one has made it's rounds a few times and has stood up (IMO) to scrutiny.