
Canonical drops CouchDB because they were unable to make it scale up

110 points | patrickaljord | 14 years ago | linux.slashdot.org

36 comments

Argorak|14 years ago
I had the chance to speak to one of the Couchbase guys about Ubuntu One-like systems at Couchconf, and gained some (limited) knowledge. Be aware that some of this is guesswork, as I don't know Ubuntu One all too well.

First things first: I think the failure of "really large systems" does not mean that the underlying technology is bad - most likely, it was the wrong pick for the specific cases this product needs. The number of variables is just so high that what looks like a good pick at first turns out to be bad in hindsight. As far as I know, Ubuntu One is one of the two largest CouchDB-based systems (Zynga being the other).

As far as I learned, it is an authorization/authentication issue more than a performance issue. The proposed solution for such systems is to use (at least) one CouchDB database per user - CouchDB supports authentication per database, so this usage is perfectly fine. Each user gets n databases named after some clever scheme (let's say "{username}/{contacts}"; CouchDB allows "/" in database names). As far as I learned, CouchDB handles this without major problems. The data model is also great: you can replicate data between all of a user's cell phones, desktops etc. just by replicating the correct database. So far, no problems.
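A minimal sketch of that per-user layout, assuming CouchDB's HTTP API: `PUT /{db}` creates a database and `PUT /{db}/_security` sets who may access it. The `{username}/{dataset}` scheme and all helper names are hypothetical, and the code only builds the requests rather than sending them.

```python
from __future__ import annotations

import re
from urllib.parse import quote

# CouchDB's documented rule for database names: start with a lowercase
# letter, then only lowercase letters, digits, and the characters _ $ ( ) + - /
DB_NAME_RE = re.compile(r"[a-z][a-z0-9_$()+/-]*")


def user_db_name(username: str, dataset: str) -> str:
    """Hypothetical '{username}/{dataset}' naming scheme."""
    name = f"{username}/{dataset}"
    if not DB_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid CouchDB database name: {name!r}")
    return name


def db_path(name: str) -> str:
    """A '/' inside a database name has to be URL-encoded as %2F in the path."""
    return "/" + quote(name, safe="")


def provision_user(username: str, datasets: list[str]) -> list[tuple[str, str, dict | None]]:
    """Build (method, path, body) requests giving one user private databases.

    Nothing is sent here; an HTTP client with admin credentials would issue these.
    """
    requests = []
    for dataset in datasets:
        path = db_path(user_db_name(username, dataset))
        requests.append(("PUT", path, None))  # create the database
        # Access control is per database: this user is its only member.
        requests.append((
            "PUT",
            path + "/_security",
            {
                "admins": {"names": [], "roles": []},
                "members": {"names": [username], "roles": []},
            },
        ))
    return requests


for method, path, body in provision_user("alice", ["contacts", "notes"]):
    print(method, path, body)
```

Replicating to a phone or desktop then means replicating just the one database, since everything in it belongs to the same user.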

The problem is sharing. Let's say I want to share my business contacts with my co-founder. CouchDB only allows database-level authentication, so once I give my co-founder access, he will see all of my contacts. This includes the replication API: once I have access, I can basically slurp the whole database (filters cannot be enforced). Since you can manage a whole universe of databases anyway, the solution here is simple: set up another database, say "{myuser}-{mycofounder}/shared_contacts", give both of us access, and set up a filtered push replication from my database to the shared one. Because the replication runs from my side, its filter can be trusted. Suddenly, my nice "To the Cloud? Out of the door, left line, one database each" system turns into a really big graph where every relationship between datasets is a database itself, along with many processes taking care of moving data along those lines. Also, once my data is shared with my co-founder, it is basically public, as I readily copied it to him, so deletion becomes a messy topic. (As long as the replication chain is intact, deletions are propagated, but honestly: who wants to support such a system?)
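The sharing workaround could look roughly like this, assuming CouchDB's `_replicator` database and a JavaScript filter function stored in a design document (CouchDB passes a replication's `query_params` to the filter as `req.query`). The `{owner}-{partner}/{dataset}` scheme, the `category` field, the server URL and every helper name are hypothetical, and the credentials a secured database would need are left out.

```python
from __future__ import annotations

from urllib.parse import quote

# Hypothetical server address; replications can name source and target by URL.
COUCH_URL = "http://localhost:5984"


def db_url(name: str) -> str:
    # '/' inside a database name must be sent as %2F
    return f"{COUCH_URL}/{quote(name, safe='')}"


# Design document stored in the *owner's* database. The filter is JavaScript,
# because that is what CouchDB evaluates. Deleted documents replicate as
# tombstones without a category field, so they are let through explicitly.
SHARING_DDOC = {
    "_id": "_design/sharing",
    "filters": {
        "by_category": (
            "function(doc, req) {"
            " return doc._deleted || doc.category === req.query.category; }"
        )
    },
}


def shared_db_name(owner: str, partner: str, dataset: str) -> str:
    """Hypothetical '{owner}-{partner}/{dataset}' naming scheme."""
    return f"{owner}-{partner}/{dataset}"


def push_replication_doc(owner: str, partner: str, source_dataset: str,
                         shared_dataset: str, category: str) -> dict:
    """Document the owner stores in _replicator.

    The owner creates it, so the filter runs under the owner's control; the
    partner is only ever given access to the shared target database.
    """
    return {
        "_id": f"share-{owner}-{partner}-{shared_dataset}",
        "source": db_url(f"{owner}/{source_dataset}"),
        "target": db_url(shared_db_name(owner, partner, shared_dataset)),
        "filter": "sharing/by_category",  # "<design doc name>/<filter name>"
        "query_params": {"category": category},
        "continuous": True,  # keep pushing new changes as they happen
    }


doc = push_replication_doc("alice", "bob", "contacts", "shared_contacts", "business")
print(doc["source"], "->", doc["target"])
```

The `doc._deleted ||` clause shows why deletion gets messy: without it, a category filter silently drops every delete, and with it, tombstones of documents that were never shared also reach the partner's copy, leaking their IDs. Multiply this by every sharing relationship and you get the graph of databases and replications described above.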

So, along those lines, one big problem becomes obvious: CouchDB does not support document-level authorization. Considering the data model of CouchDB (basically, views are aggregations over the global document store), this is also hard to do, because it means that every view has to be filtered per user. On the upside, the Couchbase guy also said that they would really like to support it.
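Why per-user filtering hurts views can be shown with a toy in-memory model (plain Python, not CouchDB code): a view is one index built over all documents, and its reduce value is computed once for everyone. A reader allowed to see only some documents needs the rows filtered and the reduce recomputed. The `readers` field is a hypothetical per-document ACL that CouchDB itself does not have.

```python
from collections import defaultdict

# Toy documents; "readers" is a hypothetical ACL field.
DOCS = [
    {"_id": "c1", "category": "business", "readers": {"alice", "bob"}},
    {"_id": "c2", "category": "personal", "readers": {"alice"}},
    {"_id": "c3", "category": "business", "readers": {"carol"}},
]


def map_by_category(doc):
    # Plays the role of a CouchDB map function: emit (key, value) per document.
    yield doc["category"], 1


def build_view(docs, map_fn):
    """One shared index over every document, sorted by key like a view."""
    return sorted((key, doc["_id"], value) for doc in docs for key, value in map_fn(doc))


def reduce_rows(rows):
    """Sum values per key, like a view with a summing reduce."""
    totals = defaultdict(int)
    for key, _doc_id, value in rows:
        totals[key] += value
    return dict(totals)


def reduce_for_reader(rows, docs, reader):
    """Document-level access: drop rows the reader may not see, then reduce again."""
    visible = {doc["_id"] for doc in docs if reader in doc["readers"]}
    return reduce_rows([row for row in rows if row[1] in visible])


rows = build_view(DOCS, map_by_category)
print(reduce_rows(rows))                     # -> {'business': 2, 'personal': 1}
print(reduce_for_reader(rows, DOCS, "bob"))  # -> {'business': 1}
```

The shared result would tell bob that two business contacts exist, including carol's. His correct answer cannot be read off the precomputed value; it has to be recomputed for him, and for every other reader, on every view.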

skrebbel|14 years ago
Cool story, and seems to make a lot of sense.

It brings up one thing I've been wondering about for a while already - I've always found that CouchDB's authentication feels more "bolted on" than anything else, and this is a nice use case where it doesn't fit.

I love Couch, but I'd have loved for the authentication scheme to be an entirely separate layer, more customizable and programmable, less "one per database, period".

Such a design would probably cause all kinds of other problems, though. Still, I wonder to what extent this has been thought through.

mcs|14 years ago
I'm curious if they tried to employ BigCouch, the dynamo-esque fork of CouchDB.
mark_l_watson|14 years ago
+1 - good idea. BigCouch is very nice, and based on Cloudant's experience with it (they wrote it), it seems to scale to customers with very large data sets.
j45|14 years ago
So.... old technologies suck because they're stable and scale, and new technologies suck because they're fun but aren't stable when they scale?
mattadams|14 years ago
This might sound like a big deal, but it shouldn't be a headline. Whatever the details (and we don't have many), companies use and drop technologies on a fairly regular basis. Sometimes it's a good fit and sometimes it's not. Obviously in this case Couch didn't do everything Canonical needed (I think someone else actually pointed out that Canonical mentioned that their needs were unique).

For every Canonical that drops Couch there will be dozens of other companies that adopt it because it's a good fit for them. All this should reinforce is that every tool has its good fit, and that smart implementors pick the one that jibes best, or move to a better one when the opportunity presents itself.

nirvana|14 years ago
This article is a good example of how myths are created and engineering ignorance is perpetuated.

CouchDB doesn't "scale"? If you're trying to "scale" with it, you don't know what you're doing in the first place. CouchDB federates. That's a wholly different thing. And in terms of federated databases, I challenge anyone to come up with one as good or better than CouchDB. (And if you do, it will be news to me, and I'll thank you profusely!)

If it's not obvious to you how to scale a federated database, then it's not CouchDB that can't scale, it's you. (Which is OK, everyone has to learn sometime; just don't put forth your lack of knowledge as proof of a weakness in an open source product!)

Further, rather than just saying "We've got this great new invention, a better technology, and we're moving to that!", the message seems to be "we just want to re-invent the wheel, so to justify it, we have to make a negative claim about CouchDB."

Now, I expect some particular database's[1] fans to tell us, in the future, that "CouchDB doesn't scale".

Ironically, they're punting on CouchDB to use, among other possibilities, SQLite. To claim that "scaling" is the problem is... bad engineering form.

CouchDB is great if you want to federate, have databases across the planet talking to each other and keeping in sync (it's almost a turnkey CDN in a way), want to run a NoSQL DB on a mobile device, etc.

MongoDB is great if you care about SQL and single node performance and its complex distribution mechanism works for you.

If you want "scale", your choices are Riak or CouchDB, for the kind of "scale" where homogeneous distributed servers are the best solution.

And of course there's Cassandra and graph databases, etc. which provide different solutions to scalability.

If you're serious about scalability, I strongly recommend people look at and choose Riak. I don't think anything out there touches it, at least for the type of data I need. Cassandra and what I consider the "more complicated" alternatives might fit your particular problem type well, and if you think it's silly of me to recommend Riak, this is probably the case for you. But in terms of general databases, Riak seems to be pulling away from the pack. If you're a fan of CouchDB, then BigCouch is a Dynamo/Riak-like version of it that I understand to be quite good. Plus, since it's based on CouchDB, if the CouchDB way of doing queries (which is distinctly different from Riak's) fits your way of working, then BigCouch deserves a look.

But please, don't ever say "CouchDB doesn't scale". If you do, really it's that you don't scale; CouchDB is fine.

[1] In an earlier edit I named a database. That was a mistake: not only is it bad form, I don't think my characterization is appropriate at this time, as that database's fans are not as rabid as I implied. Apologies.

patrickaljord|14 years ago
> If you're trying to "scale" with it, you don't know what you're doing in the first place

They said they worked with the company behind CouchDB and were not able to make it scale. So while you might accuse Canonical of not knowing what they're doing, I doubt you could say the same of the founders of CouchDB. Here is the official announcement [1]:

> For the last three years we have worked with the company behind CouchDB to make it scale in the particular ways we need it to scale in our server environment. Our situation is rather unique, and we were unable to resolve some of the issues we came across. We were thus unable to make CouchDB scale up to the millions of users and databases we have in our datacentres, and furthermore we were unable to make it scale down to be a reasonable load on small client machines.

[1] https://lists.ubuntu.com/archives/ubuntu-desktop/2011-Novemb...

route66|14 years ago
No. It's not even an article, it's slashdot. In the post someone links to an article on some website which again links to the canonical mailing list. https://lists.ubuntu.com/archives/ubuntu-desktop/2011-Novemb...

There John Lenton lays out in not so many words that "for the last three years we have worked with the company behind CouchDB to make it scale in the particular ways we need it to scale in our server environment. Our situation is rather unique, and we were unable to resolve some of the issues we came across..."

This sounds like a fair assessment to me. No "myths created", no "ignorance perpetuated".

I would also say that for every kind of technology at some point it's fair to say that it does not scale or is not sufficient in other ways.

Or to say it with couchDB: relax!

PanMan|14 years ago
I have read quite a bit about the different larger key-value stores, especially about how they scale, and from what I have seen I really like Riak as well. However, we have been setting it up over the last few weeks, and so far it's less stable than I hoped/expected: we have had nodes crash for no apparent reason. I hope we can resolve that, as I really like the model, especially the horizontal scaling, but it must be stable to use...
calpaterson|14 years ago
> Further, rather than just saying "We've got this great new invention-- a better technology, and we're moving to that!" the message seems to be "we are just wanting to re-invent the wheel, so to justify it, we have to make a negative claim about couchDB.

Ubuntu re-invent a lot of wheels (sometimes poorly). Just off the top of my head: upstart, unity, launchpad...

CPlatypus|14 years ago
I get the impression that the problem was less about scaling within one database than about scaling across many; Lenton practically says as much about a half dozen messages down-thread. Deploying literally millions of separate databases would be a total nightmare of resource contention, version skew, and general administrative burden. U1 was probably looking for ways to use some sort of multi-tenancy to serve the same number of users with fewer CouchDB instances, and that's the kind of scalability they apparently found lacking. Your point about scale vs. federation, while perhaps accurate and valuable, doesn't seem to address the actual reason for this change.
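One common way to get that kind of multi-tenancy is to spread millions of per-user databases over a small, fixed pool of servers with consistent hashing, the placement idea behind Dynamo-style stores such as Riak. This is a swapped-in illustration, not what Ubuntu One did; the server names, the virtual-node count and the `Ring` class are hypothetical.

```python
from __future__ import annotations

import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable across processes, unlike Python's built-in hash() for strings.
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Consistent-hash ring placing many per-user databases on a few servers."""

    def __init__(self, servers: list[str], vnodes: int = 64) -> None:
        # Each server gets several virtual points so databases spread evenly.
        self._points = sorted(
            (_hash(f"{server}#{i}"), server) for server in servers for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def server_for(self, db_name: str) -> str:
        # First point clockwise from the database's hash, wrapping at the end.
        idx = bisect.bisect(self._keys, _hash(db_name)) % len(self._keys)
        return self._points[idx][1]


ring = Ring(["couch-a", "couch-b", "couch-c"])
print(ring.server_for("alice/contacts"))
```

The design choice that matters here: when a server is added, a database either stays where it was or moves to the new server, so growing the pool only relocates a fraction of the databases instead of reshuffling all of them.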
ecommando|14 years ago
Another one bites the dust, another one bites the dust, bamp bamp, another one bites the dust.
va_coder|14 years ago
Mr. Ellison, you're the MS of databases. Don't rejoice too much.