top | item 3954596

Goodbye, CouchDB

254 points| blanu | 14 years ago |saucelabs.com | reply

142 comments

order
[+] JohnBooty|14 years ago|reply
"No SQL. It’s 2012, and most queries are run from code rather than by a human sitting at a console. Why are we still querying our databases by constructing strings of code in a language most closely related to freaking COBOL, which after being constructed have to be parsed for every single query? SQL in its natural habitat"

COBOL? Really? I don't see the COBOL connection at all.

SQL is more closely related to relational algebra, so it makes absolute sense when you're querying relational data.

So that's why many of us prefer to query our relational data in SQL. It's the same reason why we write stylesheets in CSS instead of C++... and why we validate postal codes and phone numbers with regular expressions instead of "manually" parsing them with PHP, Ruby, Python, or what have you. It's all about the right tool for the right job.

Relational data isn't the right solution for everything; there are lots of use cases where NoSQL databases absolutely rock and traditional databases are inappropriate. Again, right tool for the right job.

[+] geophile|14 years ago|reply
The level of ignorance revealed by that one short quote is just stunning.
[+] sah|14 years ago|reply
The connection I see is the attempt to make the syntax English-like. Expressions like "SELECT * FROM users" remind me of COBOL's "ADD X TO Y".
[+] johnbender|14 years ago|reply
"No schemas. This was wonderful. What are schemas even for? They just make things hard to change for no reason."

While they do mention the need to enforce constraints on your data, it's comments like these that make me wish all application developers were required to work as a DBA for a few months.

A properly normalized and "constrained" database prevents data loss from stupid mistakes.

[+] baconner|14 years ago|reply
Seriously, after that line I gave up on reading the rest. There's plenty to be said about schema vs no schema but its pretty ignorant to just dismiss the entire concept out of hand.

Guess what - in lots of applications data integrity is more important than developer convenience.

[+] SkyMarshal|14 years ago|reply
Also, they should be forced to read Fabian Pascal's rants about the end of Western Civilization, aggressive ignorance, and lack of understanding of the Relational Model.

http://www.dbdebunk.com/

[+] JPKab|14 years ago|reply
It seems like there is a huge amount of ignorance (on the parts of both developers and DBAs) about the fact that NoSQL doesn't necessarily mean "no schema." For whatever reason, the "cool kids" in the Web dev world completely ignore the ugly stepchildren of the NoSQL world, graph databases. Graph databases provide the protections of a schema, but the schema can be altered without massive pain on the parts of developers. Even better, the RDF style graph databases have a W3C standard query language called SPARQL.

Don't get me wrong, I love my RDBMS (MySQL and PostgreSQL) and I also use Mongo and love its extreme simplicity, but I wish more developers understood that schemas exist in places other than RDBMS.

[+] mattbriggs|14 years ago|reply
"no schemas" means "schema in the application layer". sometimes its nice for the additional flexibility, but its never as reliable.
[+] jmathes|14 years ago|reply
'A properly normalized and "constrained" database prevents data loss from stupid mistakes.'

A properly written application layer also prevents data loss from stupid mistakes. A stupid mistake made while setting up a properly normalized database also causes data loss.

You have to be very smart to be able to design a normalized constrained DB well. The fact that only smart people can do it doesn't mean that people who don't do it aren't smart.

[+] yxhuvud|14 years ago|reply
What really don't understand is why they chose mysql over postgresql - postgres already have hstore as a column type to store schemaless data. This includes support for indexes on the field.
[+] derekorgan|14 years ago|reply
probably because as they stated they are familar with MySQL. After getting hurt trying out a new technology to them it makes sense they go back to something they know. Will probably lead to more uptime which is what they want.
[+] trimbo|14 years ago|reply
Except they didn't choose mysql -- they chose Percona. Bonus of going with Percona is that in the process you get a fantastic company to back you up.
[+] DennisP|14 years ago|reply
"constructing strings of code...which after being constructed have to be parsed for every single query?"

I don't know about MySQL, but my database caches compiled queries.

"Things like SQL injection attacks simply should not exist."

They don't exist, if you don't construct SQL queries by concatenating strings and variables.

Meanwhile, all the cool kids are talking about getting rid of procedural code in favor of declarative DSLs...

[+] bsg75|14 years ago|reply
The article sounds like they will be trading one set of problems for another, if they implement SQL on any RDBMS using those techniques.
[+] sah|14 years ago|reply
"They don't exist, if you don't construct SQL queries by concatenating strings and variables."

My point is, people still do this. You never hear about REST-injection or memcached-injection attacks, even though those are possible in principle, because those protocols don't encourage this mistake the way using SQL as a database API does.

[+] fusiongyro|14 years ago|reply
> What are schemas even for? They just make things hard to change for no reason.

This attitude right here is why the RDBMS old guard despises NoSQL. Willful ignorance should not be celebrated.

[+] newhouseb|14 years ago|reply
I was floored by this quote. As technical founder, one of my chief priorities operationally is ensuring that a) our systems are up and b) our data is consistent. I've dabbled in NoSQL (a la friendfeed) a bit and my head exploded when I wanted to migrate meta-schemas on multiple machines (potentially writing to the same data) with minimal concurrent alternate code paths. I need to be able to speak with 100% confidence that every object in our database is consistent and valid and without a schema on a database level this is really hard to enforce because anyone (including a bug) could accidentally throw a borked bag of properties into a key value store.

A lot of the excitement over schema-less data stores (I think) really comes down to instant DDL changes, which is why you should really just use Postgres instead of MySQL.

[+] sah|14 years ago|reply
"If you don't like it, it must be because you haven't taken the time to understand it" is cognitive poison. What evidence will convince you that someone has understood well enough to judge that something doesn't make sense?

I've spent many years using schemas, and I know well how they work and what they achieve. I'm saying they're a lousy tradeoff.

[+] kanja|14 years ago|reply
Why did you use mysql rather than postgres? It seems like most of your complaints about mysql are solved in postgres (the query planner is much stronger) and there's some features that seems would fit your team much better
[+] sah|14 years ago|reply
A number of us know MySQL fairly well, and in particular I've seen how it's used by some of the biggest internet companies. We have some postgres experience on our team as well, but it's a little more of an unknown. So experience trumped feature set in this case.

One thing I would say about postgres is that it has a lot of features. As a new user, it's hard to know which ones to use in which ways, and what the downsides might be.

[+] Roboprog|14 years ago|reply
To summarize some of the other (upvoted) comments on this cringe-worthy article:

Output to any external system must be encoded to prevent fill-in-the-blank injection, if it uses a language vs a string API-only approach. Used prepared statements.

SQL is not COBOL. Sets != ISAM.

You can store arbitrary data (XML, JSON) in BLOBS/CLOBS in an RDBMS. Denormalization is frowned upon, but not forbidden.

PostgreSQL is arguably a better free / open DBMS than MySQL.

DBMS data constraints are a good thing; use them when they make sense.

[+] Roboprog|14 years ago|reply
that should have been "strong API" (not string)
[+] Androsynth|14 years ago|reply
I've been using CouchDB for the past six months for the internal product-CMS for my business. Its got a great feature set for what I am using it for. MVCC architecture, master-master replication and stored views make it a natural fit as a backend for internal tools. The benefits actually grow as you get more desginers/artists etc working together; each on their own DB.

It seems like all the problems he has are with scaling. I can't comment on that, but I would whole-heartedly recommend it for internal tooling.

[+] josephkern|14 years ago|reply
As a systems guy, I appreciate stories about developers learning that systems are complicated, and the latest and greatest technology is often not as stable or optimized as hoped.

Although, I do have high hopes for Key:Value store data repositories.

TANSTAAFL.

[+] rdtsc|14 years ago|reply
However as a developer guy I am glad they picked a technology that let them make quick prototyping progress and started shipping a product.

Remember only successful companies have to worry about scaling. Un(/not yet)-successful company have to worry about shipping first, then scaling. So perhaps Saucelabs reached that milestone, and they simply have outgrown the original technology that helped them ship.

Who knows, maybe if they had spent time re-implementing a REST interface on top of MySQL or re-implementing Futon for doing debugging, we might not even have heard about Saucelabs these days.

[+] empthought|14 years ago|reply
Key-value stores like dbm since the 1980s? You're not exactly going out on a limb with those high hopes. :)
[+] dpcx|14 years ago|reply
So, the TL;DR version of this is: we stopped using CouchDB because it sucked, but we took all the things that made it great, and shoe-horned them in to MySQL, while negating using anything that an RDBMS is designed for, like joins...
[+] balac|14 years ago|reply
How do you go about searching the DB if everything is stored as a JSON object? Are you using an index like solr/sphinx instead of doing any searches directly in mysql?
[+] sah|14 years ago|reply
We keep things we might need to search in regular columns (typically with indexes). The JSON object is just a way to add extra data to rows, which we can fetch and deal with on the app side. That works fine for a lot of things, and we're hoping it will give us some flexibility around when we need to do schema migrations in some cases.
[+] nirvana|14 years ago|reply
Your DB should be cognizant of JSON. For instance, JSON records can be converted into key-value combinations that are then indexed.
[+] xd|14 years ago|reply
"guesses wrong about how to perform queries all the time. Experienced MySQL users expect to write a lot of FORCE INDEX clauses."

This is most generally a sign of bad indexing/query construction. I've seen so many databases with dozens of indexs placed on tables (which only had a requirement for a few) because the developers just didn't grasp how they should be setup - which isn't rocket science.

[+] balloot|14 years ago|reply
This article explains why I am using MySQL as the DB backend for a site I am currently building. It also explains why I am not using Node.js. New technologies are fun to play with, but they get decidedly less fun when your site starts getting traffic and you realize that your new toy isn't ready for prime time.
[+] dmpk2k|14 years ago|reply
It's important to be conservative in the right places. Data storage is one of those places: data is the lifeblood of most businesses, and once it goes bad there's often no way of making it good again. Making sure it's correct and remains correct is critical.

Programs, by comparison, are a lot more flexible here, unless you're in the financial/health/auto/aero industries.

As an aside, where I work we use a lot of node. It works.

[+] sandfox|14 years ago|reply
It does nothing of the sort, node.js is ready for the prime time, just have a look at transloadit, voxer, yammer etc etc. Don't confuse the tool not being ready, with you not being ready to use the tool. Equally, there are plenty of people are making couchdb work for them.
[+] macspoofing|14 years ago|reply
>We’re convinced that NoSQL is the future.

The future of what? Non-relational data? Relational databases are very good for a wide variety of problems. And they will continue to be very good for a wide variety of problems.

[+] shadowmatter|14 years ago|reply
I wish this article had some hard numbers for availability, performance, and the size of their data as opposed to hand-waving.

Shameless plug: If you're looking to benchmark or load test CouchDB a bit, I wrote one at https://github.com/mgp/iron-cushion. Hopefully someone out there will use this to decide if CouchDB's performance meets their needs, because migrating away from any database is painful...

[+] ams6110|14 years ago|reply
More a lesson about the pitfalls of building systems using technologies that you don't understand very well than it is anything specific to either CouchDB or MySQL.
[+] abraham|14 years ago|reply
More a lesson that requirements change overtime.
[+] NDizzle|14 years ago|reply
Sounds like he has Lotus Domino-like problems in a product that closely resembles Domino.

Yet in the same page makes fun of SQL for being old and busted? I don't get it.

[+] sk5t|14 years ago|reply
Notes' main problems are/were the lack of joins and most aggregates, lack of proper transactions, lack of indexes that would permit useful runtime queries, somewhat slow data access layer, and glacially slowly-evolving, ugly-ass UI.

It is also so easy to build on, that people who don't know what they're doing and shouldn't be building any software for redistribution, will do so anyway, and gain just enough success to become extremely annoying. Its IDE is about on par with most other IBM-involved software development exercises, i.e., fairly bad.

Now, all that aside, it is a very powerful, but not generally well-appreciated, system, and is great (fast, cheap, hard to kill) for a large swath of applications that don't require any of the stuff from the first paragraph.

[+] vccarvalho|14 years ago|reply
What I really don't get is not using mongo. Would be a natural fit, and besides isn't that link to a "Don't use mongo" you posted a known hoax?
[+] encoderer|14 years ago|reply
Meh. Single threaded map reduce and very hungry disk usage. Mongo isn't the silver bullet a lot of supporters sometimes project it as (not saying you are).

Also, he actually linked 2 articles, I'd never seen the first one so it may very well be a hoax or whatever. But the 2nd one has made it's rounds a few times and has stood up (IMO) to scrutiny.