CouchDB 2.0 | WingNews

[+] matt4077|9 years ago|reply

Yet another product blog that doesn't manage to prominently link to the product.

If you want to make my life a tiny bit easier:

- blog.<project>.org needs a link to <project>.org because half of your visitors want to read the "it" before they read the "news".

- github.com/<you>/<Transmogrifier>For<ThatProject> should have a link to <ThatProject> in the description or the readme's first paragraph. I often come across interesting plugins without knowing anything about the projects they are for.

(and yes, I know there are actual problems in the world. But these are easy to solve)

[+] shash7|9 years ago|reply

So true. Half of the time when I find myself on an engineering blog for a startup, I am curious about their product but there is no link to their main page. Clicking on the logo redirects me to blog.<startup>.com

[+] jonursenbach|9 years ago|reply

There's an "About" link on the blog that literally explains what it is.

[+] janl|9 years ago|reply

Except, you know, for the “Download” headline that I specifically put in for you ;)

[+] happytrails|9 years ago|reply

I agree, but I imagine your blood pressure is through the roof with such a critique.

[+] SkyMarshal|9 years ago|reply

Why is this the top post

[+] k__|9 years ago|reply

TL;DR

- Clustering https://blog.couchdb.org/2016/08/01/couchdb-2-0-architecture...

- New Query Language https://blog.couchdb.org/2016/08/03/feature-mango-query/

- New Admin Interface (written in React) https://blog.couchdb.org/2016/07/27/fauxton-the-new-couchdb-...

[+] cbHXBY1D|9 years ago|reply

Also, 2.0 is the unification of BigCouch (Cloudant's work) with the old single node CouchDB.

[+] fiatjaf|9 years ago|reply

As a low-profile developer who's been using CouchDB for a long time, I wrote up some quick personal opinions a few weeks ago on what CouchDB has become: http://fiatjaf.on.flowi.es/about-couchdb/

[+] jchrisa|9 years ago|reply

Early CouchDB contributor here. I agree with much of your write up, but I think the picture for apps built around replication is brighter than it seems. PouchDB has crazy momentum. Couchbase Mobile is getting baked into the next generation of infrastructure at places like GE and big airlines.

If you want filtered replication, we designed Couchbase Sync Gateway because we thought db-per-user was too heavy. What's fun is to think about the options for mix-and-match across the stack.

[+] drhayes9|9 years ago|reply

I really like your write-up.

I'm sad that the Couchapp thing never took off or is getting de-emphasized. That was one of the really brain-bendy ideas about Couch that I loved, that these DB-side applications could also replicate to other people.

I'm only just catching up with Couch and finding the db-per-user stuff. Do you know of anyone shipping a Couch instance inside thick clients, like a traditional desktop application?

[+] pokstad|9 years ago|reply

My thoughts exactly. I'm not sure who CouchDB is supposed to be for anymore. The original developers who fell in love with it are moving on. Who is the audience for 2.0? Who does IBM hope to reach with Cloudant?

BTW, you are spot on about continuous replication. Try using continuous replication on Cloudant and you will end up with a fat bill.

[+] willholley|9 years ago|reply

I think there's a lot for you to like in 2.0.

> The replication protocol, which supports multi-master, has changed little

Basically true, though with many interoperating, independent datastores it's a tricky thing to evolve. 2.0 adds an additional endpoint, _bulk_get which can significantly reduce the number of requests when CouchDB is paired with an on-device database such as PouchDB, Cloudant Sync or Couchbase Lite (the endpoint was inspired by the same feature in Couchbase). The CouchDB replicator itself has had many performance and stability improvements [1] and continues to be a significant focus for active development.
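To make the _bulk_get point concrete, here is a minimal sketch of what such a request could look like. The host, database name, document IDs and revisions are placeholders invented for illustration; the request is only built, not sent.

```python
import json
from urllib.request import Request


def build_bulk_get(base_url, db, id_revs):
    """Build (but do not send) a POST /{db}/_bulk_get request.

    id_revs is a list of (doc_id, rev) pairs. One request asks for many
    document revisions at once instead of one GET per revision.
    """
    body = {"docs": [{"id": doc_id, "rev": rev} for doc_id, rev in id_revs]}
    return Request(
        f"{base_url}/{db}/_bulk_get?revs=true",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values, assumed for the example only.
req = build_bulk_get(
    "http://localhost:5984",
    "mydb",
    [("doc-a", "1-abc"), ("doc-b", "2-def")],
)
# urllib.request.urlopen(req) would send it to a running CouchDB 2.0 node.
```

A replicator that would otherwise fetch each missing revision separately can batch them this way, which is where the reduction in request count comes from.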

Also, CouchDB 2.0 introduces internal cluster replication using distributed Erlang. If you currently use CouchDB replication to bi-directionally replicate between machines on the same network for HA, replacing those with a CouchDB 2.0 cluster should be a big win.

> In other words: everybody seem to be looking at CouchDB as just a very poor and limited MongoDB.

Query doesn't pretend to be MongoDB-compatible - it provides a syntax that should be familiar to MongoDB users and more query flexibility than views allow. I think Query still has a fair way to go - this is the first release - but it's a move in the right direction.
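For readers who have not seen Query (Mango) yet, a rough sketch of a request against the _find endpoint follows. The database name, field names and selector are made up for the example, and the request is constructed without being sent.

```python
import json
from urllib.request import Request


def build_find(base_url, db, selector, fields=None, limit=25):
    """Build (but do not send) a POST /{db}/_find request with a Mango selector."""
    query = {"selector": selector, "limit": limit}
    if fields:
        query["fields"] = fields
    return Request(
        f"{base_url}/{db}/_find",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical "movies" database: find documents whose year is after 2010.
req = build_find(
    "http://localhost:5984",
    "movies",
    {"year": {"$gt": 2010}},
    fields=["_id", "title", "year"],
)
# urllib.request.urlopen(req) would run the query on a live server.
```

The selector is declarative JSON with MongoDB-style operators such as $gt, so no map function has to be written for a simple lookup like this.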

As to whether CouchDB is viewed as "a very poor and limited MongoDB", they are very different databases. CouchDB is a good choice if you want a rock-solid JSON datastore which comfortably scales up to multiple TBs / many machines, with multi-master replication over unreliable networks. Query support, as you say, is not as rich as some other databases, so if that's more important to you, there are probably better options.

> Filtered replication was implemented, but it is slow to the point that no one recommends that you use them.

The new _selector filter [2] in CouchDB 2.0 offers a significant performance improvement for filtered _changes. It should be a small change for replicators such as PouchDB to take advantage of this.
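As an illustration of the _selector filter, the sketch below builds a filtered _changes request. The database name and the "type" field are assumptions for the example; nothing is sent over the network.

```python
import json
from urllib.request import Request


def build_filtered_changes(base_url, db, selector):
    """Build (but do not send) POST /{db}/_changes?filter=_selector.

    The selector is evaluated by the server itself, so no JavaScript
    filter function has to run for every changed document.
    """
    return Request(
        f"{base_url}/{db}/_changes?filter=_selector",
        data=json.dumps({"selector": selector}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical: only report changes to documents with type == "invoice".
req = build_filtered_changes(
    "http://localhost:5984", "mydb", {"type": "invoice"}
)
```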

> About Couchapps, the special database features that powered them in the first place were left aside

I don't speak for the project, but it seems there has been much debate about this in the CouchDB community and the conclusion was that there are better solutions to most Couchapp-shaped problems than running application logic in the database. The features that combine to enable Couchapps haven't gone away and will benefit from the general improvements in 2.0, but they haven't been explicitly developed.

[1] https://blog.couchdb.org/2016/08/15/feature-replication/ [2] http://docs.couchdb.org/en/2.0.0/api/database/changes.html#s...

[+] SEJeff|9 years ago|reply

Sad not to see Jepsen tests run on this, at least by the developers, when releasing a new clustering piece.

The folks working on cockroachdb recently did this[1] and it was a good read.

[1] https://www.cockroachlabs.com/blog/diy-jepsen-testing-cockro...

[+] cbHXBY1D|9 years ago|reply

Cloudant did it for their clustering implementation of CouchDB. See here: https://cloudant.com/blog/run-dmc-explains-network-partition...

[+] robinson_k|9 years ago|reply

CouchDB is eventually consistent and does not guarantee consistency. Would a Jepsen test make sense?

[+] janl|9 years ago|reply

We’d definitely love a Jepsen write up, preferably by @aphyr himself. We just need to find out how to get that going :)

[+] marknadal|9 years ago|reply

Congrats to the CouchDB team! I remember playing around with writing a NodeJS driver for it all the way back in 2011. Sadly, I decided to use and make a driver for MongoDB instead simply because it was easier at the time. Despite that choice, I have always admired CouchDB for pushing the frontiers on stuff like Offline-First with PouchDB way ahead of its time (and also what inspired me a little bit for our Open Source Firebase alternative http://gun.js.org/ ).

I am particularly excited to see the announcement of the Mango query language, since querying Couch was one of the less easy things to do back then. I'm also very excited to hear about performance improvements, as this has been particularly interesting to me as I've been tracking various systems' performance while working on our own (Mongo with Wired Tiger, Cassandra, Redis, and even the Chrome V8 engine as we have built towards 30M+ ops/sec, see https://github.com/amark/gun/wiki/100000-ops-sec-in-IE6-on-2... ). However, clicking through on the performance links didn't lead to any numbers or benchmarks. I would love to see that!

Really happy that Couch is on the homepage of Hacker News. I really feel like they made lots of correct decisions that got passed over by the NoSQL craze, and lately they have not been receiving the attention they deserve compared to (what I, with some bias, think are) unfavorable but hyped-up Master-Slave systems. People should really check into Couch's Master-Master replication!

[+] eriknstr|9 years ago|reply

Just installed it on FreeBSD 10.3.

Fetched and unpacked the source tarball, then did mostly what the INSTALL.Unix.md said to do:

  sudo pkg install erlang icu spidermonkey185 gmake gcc curl help2man py27-sphinx

  ./configure

  gmake release

  sudo pw useradd couchdb -u 5984 -c "CouchDB Administrator" -L daemon -s /usr/local/bin/bash

  sudo cp -R rel/couchdb /home/couchdb

  sudo chown -R couchdb:couchdb /home/couchdb

  sudo find /home/couchdb -type d -exec chmod 0770 {} \;

  sudo find /home/couchdb/etc/ -type f -exec chmod 0644 {} \;

Note how I set the uid to 5984 ;)

Unfortunately, when I try to actually run it...

  sudo -i -u couchdb ~couchdb/bin/couchdb

...I'm just getting messages like:

  [error] 2016-09-20T23:57:55.321596Z couchdb@localhost emulator -------- 
  Error in process <0.547.0> on node couchdb@localhost with exit value:

  {database_does_not_exist,[{mem3_shards,load_shards_from_db,"_users",
  [{file,"src/mem3_shards.erl"},{line,327}]},{mem3_shards,load_shards_from_disk,1,
  [{file,"src/mem3_shards.erl"},{line,315}]},{mem3_shards,load_shards_from_disk,2,
  [{file,"src/mem3_shards.erl"},{line,331}]},{mem3_shards,for_docid,3,
  [{file,"src/mem3_shards.erl"},{line,87}]},{fabric_doc_open,go,3,
  [{file,"src/fabric_doc_open.erl"},{line,38}]},
  {chttpd_auth_cache,ensure_auth_ddoc_exists,2,
  [{file,"src/chttpd_auth_cache.erl"},{line,187}]},
  {chttpd_auth_cache,listen_for_changes,1,
  [{file,"src/chttpd_auth_cache.erl"},{line,134}]}]}

  [notice] 2016-09-20T23:57:55.321650Z couchdb@localhost <0.323.0> -------- 
  chttpd_auth_cache changes listener died database_does_not_exist at 
  mem3_shards:load_shards_from_db/6(line:327) <= mem3_shards:load_shards_from_disk/1(line:315) <= 
  mem3_shards:load_shards_from_disk/2(line:331) <= mem3_shards:for_docid/3(line:87) <= 
  fabric_doc_open:go/3(line:38) <= chttpd_auth_cache:ensure_auth_ddoc_exists/2(line:187) <= 
  chttpd_auth_cache:listen_for_changes/1(line:134)

[+] chrisfosterelli|9 years ago|reply

The new clustered instance doesn't autocreate the system databases. It used to do this in 1.x, but no longer does.

You'll have to create these databases:

* _global_changes

* _metadata

* _replicator

* _users

It's a bit of a pain!
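A small sketch of how the databases listed above could be created over HTTP, assuming a node listening on localhost:5984. That address is an assumption, as is the absence of admin credentials, which only holds on a fresh install with no admin configured. The requests are built here and sent only if you uncomment the loop.

```python
from urllib.request import Request

# Database names taken from the list above.
SYSTEM_DBS = ["_global_changes", "_metadata", "_replicator", "_users"]


def build_create_requests(base_url):
    """Build one PUT /{db} request per system database (nothing is sent)."""
    return [Request(f"{base_url}/{name}", method="PUT") for name in SYSTEM_DBS]


requests_to_send = build_create_requests("http://localhost:5984")

# To actually create them against a running node:
# from urllib.request import urlopen
# for r in requests_to_send:
#     print(r.full_url, urlopen(r).status)
```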

[+] nowprovision|9 years ago|reply

A few things I wanted out of CouchDB years back:

- faster bulk indexing

- space reduction; I think a simple CouchDB-to-psql JSON dump was 1/10th of the size

- ES6 or even ES5 (it's been a few years since I last looked, but I remember you had to tread carefully; Object.keys maybe was one?)

When I started with CouchDB it was the wrong choice for so many reasons: the client had <30gb of data, CouchDB was cooler than node.js, and I was frustrated with SQL Server. In hindsight, sticking with SQL Server or PostgreSQL would have been better (older/wiser today).

[+] janl|9 years ago|reply

> - faster bulk indexing

With clustering you now get that, by way of “oversharding”, even on a single node. The speed-up is linear with the number of shards / CPUs.
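One way to read “oversharding” in practice: pass a shard count via the q parameter when the database is created. The sketch below assumes a single local node and a hypothetical database named "logs" with 8 shards; the right number depends on your CPU count and is not something the thread specifies.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_create_db(base_url, db, shards):
    """Build (but do not send) PUT /{db}?q=<shards>."""
    return Request(f"{base_url}/{db}?{urlencode({'q': shards})}", method="PUT")


# Hypothetical database "logs" split into 8 shards on one node.
req = build_create_db("http://localhost:5984", "logs", 8)
```

Each shard has its own index files, which is what allows view builds to proceed in parallel.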

> - space reduction,

2.0 has the better compaction format. There are still ways to improve, but we are getting there.

> - ES6 or even ES5

Our custom wrapper around Spidermonkey 185 is getting long in the tooth. It didn’t make it into 2.0, but we are well aware that this needs updating.

Anyway, sounds like we got there in the end ;)

[+] reitoei|9 years ago|reply

Classic NoSQL adopter bantz.

[+] drhayes9|9 years ago|reply

I love Couch and I feel totally relaxed. What are people building with it?

A spiffy log package was the first thing I thought of and indeed that's one idea listed on this page: http://docs.couchdb.org/en/2.0.0/intro/why.html

But I think the "logging" case is hard to argue vs. "manually" grepping (or ag-ing using the silver searcher) over in-place log files and aggregating/rendering dashboards via static files.

[+] teddyc|9 years ago|reply

I used it to store user-uploaded images in a photo contest website years ago. It was a happy medium between storing images directly on the filesystem or as a blob/binary in a relational database. The image requests were AJAX, so I could include the client's screen dimensions in the request. My app would resize on the fly if necessary, store the resized image in CouchDB, and redirect the image request to CouchDB, which was publicly readable.

Another good use case I had was for JSONP request for an auto-complete input field on a webpage. Again, the database was publicly readable.

I have also used it to aggregate data for graphs. The data changed daily, so the ability to cache the results of a view until the data changed was nice. But the first request of data each day still took a while. I don't think I got a performance boost in this case, but I did get free caching.

All my other uses of CouchDB were mostly for fun and could've been implemented in traditional SQL.

[+] WorldMaker|9 years ago|reply

Not using Couch directly at the moment, but have several projects I'm working on that are using Pouch as an easy to sync, offline-first data store. (These are largely targeting Cordova/Electron.)

[+] eriknstr|9 years ago|reply

I bought the first edition of an O'Reilly book called CouchDB: The Definitive Guide a few years back.

Looking at the online draft version of the book, the tour chapter [0] still has version 0.10.1 in the example.

I wonder if there will be a second edition of the book covering mango, clustering and the new admin interface.

[0]: http://guide.couchdb.org/draft/tour.html

[+] pokstad|9 years ago|reply

That was a great book that got me into CouchDB. I think the original authors of that book, primarily Damien Katz and J. Chris Anderson, left for Couchbase. Today's CouchDB has different motives than the one that book was written for. Couch Apps are no longer the killer feature.

[+] fiatjaf|9 years ago|reply

You don't need a book for learning an admin interface.

[+] qwertyuiop924|9 years ago|reply

I'm actually really excited for this. There seem to be a lot of new, interesting, and useful features, and I'm already a huge fan of the couch model.

[+] bdcravens|9 years ago|reply

> The second major feature is the declarative query language “Mango”.

Nice name :-)

[+] Direct|9 years ago|reply

I wonder if the design is based on some of the solid research that went into MangoDB[0].

[0]: https://github.com/dcramer/mangodb

[+] janl|9 years ago|reply

It’s MongoDB inspired obviously, but at the time MongoDB asked IBM/Cloudant to not call it that. They settled on Mango. The Cloudant product then became “Cloudant Query”, so CouchDB easily inherited the Mango nickname. We like it :)

[+] SwellJoe|9 years ago|reply

Mango is also the name of a popular MongoDB driver for Perl, most commonly used with Mojolicious. I'm not sure which came first.

http://search.cpan.org/~odc/Mango-1.29/lib/Mango.pm

[+] hiphipjorge|9 years ago|reply

The UI looks pretty awesome. I kind of see the RethinkDB influence in shipping with a web admin, and it's great that that's where things are heading.

[+] chromakode|9 years ago|reply

Couch did it first!

[+] m_mueller|9 years ago|reply

So, as a CouchDB 1.x user, where to go next? CouchDB 2.x or Couchbase?

[+] rdtsc|9 years ago|reply

CouchDB 2.0 is a direct descendant of CouchDB 1.x. The API is 99% compatible, same community working on it, you can replicate between them and so on.
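Replicating from a 1.x server into a 2.0 one could look roughly like the sketch below. Both host names and the database name are invented for the example, and the request is only constructed, not sent.

```python
import json
from urllib.request import Request


def build_replicate(server_url, source_db_url, target_db_url, continuous=False):
    """Build (but do not send) a POST /_replicate request."""
    body = {
        "source": source_db_url,
        "target": target_db_url,
        "create_target": True,   # create the target database if it is missing
        "continuous": continuous,
    }
    return Request(
        f"{server_url}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical hosts: pull "mydb" from an old 1.x server into a 2.0 node.
req = build_replicate(
    "http://couch2.example.com:5984",
    "http://couch1.example.com:5984/mydb",
    "http://couch2.example.com:5984/mydb",
)
```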

I haven't used Couchbase, but I understand that besides the "Couch" prefix and CouchDB's original author having worked there for a few years, it doesn't have much in common with the CouchDB project.

[+] janl|9 years ago|reply

Couchbase has a different API, so it is not an “upgrade”, more a rewrite.

CouchDB 2 is 99% API compatible with version 1.

[+] mehh|9 years ago|reply

So it's taken about 3 years to get from the alpha to release .. meh!

Whilst I really like couch I'm just not convinced on using it in production due to its glacial pace.

[+] maxpert|9 years ago|reply

Good to see v2.0. It took them quite long, but hey, better late than never.

[+] Kristine1975|9 years ago|reply

The first thing I think upon hearing CouchDB is "perform like a pr0n star"...

[+] desireco42|9 years ago|reply

I remember CouchDB when it was initially released, it was one of the most promising and innovative DB's around. It's a shame it didn't live up to the hype.

[+] patwolf|9 years ago|reply

I wouldn't say that it didn't live up to the hype. I think it just has a steeper learning curve than a lot of other NoSQL databases and is often overlooked.

MongoDB makes a lot of sense to folks already familiar with relational databases. Collections are like tables, documents are like rows, and it's easy to use document IDs to establish relations and perform queries like you would in a relational database. There are a lot of problems with using it like a RDBMS, but still a developer with zero knowledge of MongoDB can become productive with it very quickly.

When learning CouchDB, the first WTF moment is when you realize there are no tables and that all the documents regardless of the type of data they contain go in the same place. Eventually you learn that you can add a type field to the documents to distinguish between them, which does feel hackish. The next WTF moment is when you want to query the data and realize you have to use map-reduce to do what would seem trivial in any other database. The early version of the admin tool definitely didn't make this easy since it required writing a JavaScript function and escaping it so that it could be stored as a single-line JSON string. It also didn't help that the map-reduce code was stored in special documents that used magical document IDs to distinguish them from other documents, which again feels hackish but makes sense eventually.
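To show what that looks like, here is a small sketch of a design document: a "type" field check inside a JavaScript map function, stored as a string under a "_design/" ID. The document ID, view name and field names are invented for the example.

```python
import json

# The map function is JavaScript, but CouchDB stores it as a JSON string.
MAP_FN = """function (doc) {
  if (doc.type === "post") {
    emit(doc.created_at, doc.title);
  }
}"""

# The "_design/" prefix is what marks this as a design document.
design_doc = {
    "_id": "_design/posts",
    "views": {"by_date": {"map": MAP_FN}},
}

# json.dumps does the escaping that once had to be done by hand:
# newlines and quotes inside the function become \n and \" in one line.
serialized = json.dumps(design_doc)
print(serialized)
```

Saving this document into a database and then requesting the by_date view under _design/posts would return the emitted rows sorted by created_at.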

That said, I am a big fan of CouchDB, and I hope that with the query language and new UI in 2.0 that CouchDB will earn the respect it deserves.

[+] qwertyuiop924|9 years ago|reply

I actually quite like Couch. It's an interesting system, and I think it doesn't get enough credit.

[+] fet|9 years ago|reply

Hype or not, I inherited a project that had CouchDB baked into the bones. While I found its disk space usage and its time to reindex large databases frustrating, I did appreciate REST-like queries, how it just saved whatever you wanted and how the map/reduce queries could actually be quite powerful for ETL and reporting (though it took a good while to index every time we had a change). We abandoned replication early on as just too chatty over high latency networks.

I love the CouchApp idea because it's a natural extension of "save whatever you want." As long as we're saving json, why not save static files too? It lets you prototype quickly and, in our case, let us develop and test out specific features for clients by quarantining risky crap code on the fly into one database. I like to think we're beyond that stage, but at the time it was invaluable to keep multiple versions of our app running concurrently.

I have 3 solid years of experience working with CouchDB and in the end I find myself longing for PostgreSQL with one jsonb column for the unknown or "volatile" attributes. Our data was semi-relational, as I would argue most data is. Meaning there were honest to goodness has-many or belongs-to type queries that could have been simpler and easier to maintain outside the application code.

I know that's just a style, some people would use key-value stores for everything with no validation on anything. If you're not that far gone and you still like freedom then CouchDB might be for you. As for me, I wanted an error from my database if my application code asked for something or tried to store something invalid. But for rapid prototyping for us I don't think we could have gotten a better database.

[+] rdtsc|9 years ago|reply

> , it was one of the most promising and innovative DB's around. It's a shame it didn't live up to the hype.

It promised master-to-master replication, and it has that. It doesn't lose your data (actually fsyncs, yay!). It has a helpful integrated web interface and an HTTP + JSON interface.

I don't know, I've shipped lots of products on it. I'd say it lived up to the hype pretty well...

84 comments