CouchDB 3.0 | WingNews

CouchDB is awesome, full stop.

While it lacks MongoDB's popularity and the wide adoption that things like mongoose enjoy in lots of open source CMS-type projects, it wins for its (I believe) unique take on map/reduce: you write custom JavaScript view functions that run on every document, letting you really customize the way you query, slice, and access parts of your data...

Example: I'm building a document analysis app that does topic + keyword frequency vectorization of a corpus of documents, only a few thousand for now.

I end up with a bunch of documents that have "text": "here is my document text..." and "vector": [ array of floating point values ...].

What I can do with CouchDB is store that 20-dimensional vector and emit its integer parts as a query key:

    function (doc) {
      // Use the integer part of each vector component as a compound view key.
      var intVectors = doc.vector.map(function (val) {
        return Math.floor(val);
      });
      emit(intVectors, 1);
    }

Then I can match an input document's vector (calculated the same as corpus documents), calculate a 'range' of those vectors, pass it as start and end keys, and super quickly get a result from the database of 'here are documents that have vectors similar to your input'...
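A rough sketch of the range lookup described above, assuming a view whose map function emits the floored vector as its key. The helper names, the `radius` parameter, and the `corpus` / `similar/by_vector` path are hypothetical and not from the original post. One caveat: CouchDB compares array keys element by element, so a `startkey`/`endkey` range mostly constrains the leading components. The results are best treated as candidates and re-checked by real distance on the client.

```javascript
// Hypothetical helper: build startkey/endkey for a view keyed on floored vectors.
function vectorRangeKeys(vector, radius) {
  var floored = vector.map(function (val) {
    return Math.floor(val);
  });
  return {
    startkey: floored.map(function (n) { return n - radius; }),
    endkey: floored.map(function (n) { return n + radius; })
  };
}

// Build the query string for a view request; view keys must be JSON-encoded.
function viewQuery(keys) {
  return 'startkey=' + encodeURIComponent(JSON.stringify(keys.startkey)) +
    '&endkey=' + encodeURIComponent(JSON.stringify(keys.endkey));
}

var keys = vectorRangeKeys([1.7, 3.2, 0.4], 1);
// keys.startkey is [0, 2, -1] and keys.endkey is [2, 4, 1]
var url = '/corpus/_design/similar/_view/by_vector?' + viewQuery(keys);
```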

Super fun, quick and flexible to work with!

mikekchar|6 years ago

> CouchDB is awesome, full stop.

I really like CouchDB. It is wonderful if you want that kind of DB. However, if you want a relational DB (and there are many, many, many reasons to want one), do not pick CouchDB. It works very poorly as a relational DB.

I have a legacy project that didn't quite understand this point and we have ended up paying the price for a document oriented DB in which it is hard to migrate and where we are constantly having to worry about the bandwidth to the view server. And then all the amazing, wonderful features of couch? We don't use a single one :-P Fail all the way around. However, I still like it and instead of retiring it, I've been slowly trying to start using the features that make it awesome, while mitigating some of the problems that have piled up over the years.

treis|6 years ago

>CouchDB is awesome, full stop.

The problem I had with CouchDB is integrating it into a framework like Rails. CouchDB on its own does so much cool stuff. The "free" HTTP API and client replication via PouchDB are the two huge ones. But it just wasn't smooth enough to get the data out, use it where I wanted, and then save it back.

newfeatureok|6 years ago

One interesting thing you can do with CouchDB is that you can have a webapp where a user can specify their own database and credentials and it works over HTTP(s). That's pretty unique. I'd love to see a SaaS using CouchDB and their "on-premise" offering just means the user provides their own database. I'm not sure how payment would work though - perhaps some verification proxy?

Firebase is the gold-standard for offline apps (as a service). CouchDB replaces Cloud Firestore, and Keycloak replaces Authentication. I haven't seen OSS equivalents of Cloud Functions, ML Kit, and the other things (e.g. In-App messaging, and Cloud Messaging). It'd be nice to have the entire stack of Firebase bundled as a group of OSS projects, including CouchDB.

Sad to see that per doc access control didn't make it in 3.0. Hopefully it'll be in 3.1.

Graphguy|6 years ago

Cloudant on IBM Cloud is CouchDB API/replication compatible and offers support for Apache CouchDB (1). Also, OpenWhisk integrates nicely with CouchDB/Cloudant and can even be a backing persistence for it (2)

(1) https://www.ibm.com/cloud/blog/announcements/announcing-supp... (2) https://github.com/apache/openwhisk/blob/master/tools/db/REA...

matlin|6 years ago

I really can't wait for the per-doc permissions because I'm building something very similar to what you're describing, and with CouchDB, focusing on the database and auth side first and then adding functions.

So shameless plug if you're interested in signing up for the alpha: https://www.aspen.cloud

WorldMaker|6 years ago

Yeah, I'm still disappointed that the MongoDB API outpaced the CouchDB Replication Protocol in general adoption. As nice as Cloudant can be some of the time, I know that my IT group would be a lot happier if we could use Cosmos DB (and/or if Cloudant would just directly support Azure data centers again).

Every now and again I wonder if I could implement the CouchDB Replication Protocol on top of Cosmos DB with a presumably hairy ball of Azure Functions, and I keep hoping someone beats me to needing that to exist and scratches that itch for me. (Cosmos DB's changes feed is so almost right for the job it hurts because it sounds like it should be easy, and yet I assume it won't be.)

moenzuel|6 years ago

For a Cloud Functions-like project, OpenFaaS seems promising. I’ve been watching it but have not yet had the chance to use it.

pachico|6 years ago

I didn't understand. You mean it's unique to work over http(s)?

kache_|6 years ago

Cloudant off IBM cloud. Full disclosure; I utilized it to support the application layer on IBM cloud.

chasers|6 years ago

I'm doing this with BigQuery for Logflare (logflare.app).

smoyer|6 years ago

I built two products on CouchDB 1.x starting in 2010 ... version three is another amazing step forward! For my more recent projects, I've replaced CouchDB with clustered PostgreSQL using JSON columns as I really enjoy the ability to write SQL queries against the JSON and to use the built-in full-text search capabilities. I think both CouchDB and clustered PostgreSQL are amazing tools and it's nice to be able to choose between them as needed. The best advice I've heard is to choose CouchDB when you know your queries ahead of time and the data "schema"[1] is variable and choose PostgreSQL when you know your data ahead of time and your queries are variable.

[1] In this case, a JSON document but either with a JSON-schema or marshaled/unmarshaled into a strict type.

jimstr|6 years ago

I've gotten the impression that clustered Postgres still isn't very straightforward to run. Do you mind elaborating on your ideal setup and point to some resources?

Thanks!

karmelapple|6 years ago

JSON Schema has been a big benefit for our use case. Our iOS, Android, and web app all pull in a schema from one repo, which serves up that schema via Cocoapods, Gradle, or npm. We built it years ago and it’s worked smoothly ever since.

knubie|6 years ago

CouchDB is awesome and feels way ahead of its time. Its design docs are extremely powerful, to the point that you can build entire web apps with CouchDB alone (not that that's recommended anymore). Plus with PouchDB you can create offline-first apps that sync with a remote CouchDB instance.

code-is-code|6 years ago

If you like PouchDB, you should also check out RxDB. It is built on top of PouchDB and is optimised for realtime applications where you can subscribe to queries and stuff.

https://github.com/pubkey/rxdb

jfkebwjsbx|6 years ago

Ahead of its time?

PL/SQL also allowed (and allows) you to create entire apps within a database.

Phillips126|6 years ago

I haven't heard of CouchDB in quite some time, great to see it still improving.

I used it years ago when I was experimenting with Ionic[0]. What appealed to me was that I could use CouchDB (cloud) and PouchDB[1] (device) to have a replicated copy of the data locally. The application was used in areas where network connection was very limited. Using this strategy I was able to ensure the mobile device's data was as recent as the last time it had a network connection.

[0] - https://ionicframework.com/

[1] - https://pouchdb.com/

lytefm|6 years ago

I can confirm that the stack still works well :) We've been developing a cross-platform app for the German market - therefore the need of offline capability - since 2017 and never had any real issues with Pouch/Couch, that part just worked. The upgrade from Ionic 3 to 4 was quite painful though.

For user authentication I've forked the nowadays unmaintained superlogin package [1], which still does a great job when keeping the dependencies up to date.

[1] https://github.com/LyteFM/superlogin

hajile|6 years ago

Reducing max document size from 4GB down to 8MB seems hyper-restrictive.

For those interested, looks like the guts of CouchDB are going to be swapped out for FoundationDB.

https://blog.couchdb.org/2020/02/26/the-road-to-couchdb-3-0-...

newfeatureok|6 years ago

8MB is just the default, you can switch it back to 4GB if you want, but you won't have an easy time switching to 4.0 due to the 8MB limit imposed by FoundationDB.

splatcollision|6 years ago

If you're trying to store single GB documents in couch, you're doing it wrong... Unless those are binaries you can usually fragment data logically across many documents, then write custom views to aggregate however you need to.

Updates on huge docs would be painful!
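A minimal sketch of the "fragment, then aggregate with a view" idea, not anyone's production code. The `type`, `parent`, and `seq` fields, the ID scheme, and the function names are all assumptions made up for illustration. `emit` is provided by CouchDB's view server when the map function runs there.

```javascript
// Hypothetical helper: split one large record into many small documents.
function splitIntoChunks(parentId, items, chunkSize) {
  var docs = [];
  for (var i = 0; i < items.length; i += chunkSize) {
    var seq = i / chunkSize;
    docs.push({
      _id: parentId + ':' + seq,
      type: 'chunk',
      parent: parentId,
      seq: seq,
      items: items.slice(i, i + chunkSize)
    });
  }
  return docs;
}

// View map function (meant to run inside CouchDB): key each chunk by parent
// and sequence number, so one range query returns all pieces in order.
var mapChunks = function (doc) {
  if (doc.type === 'chunk') {
    emit([doc.parent, doc.seq], doc.items.length);
  }
};

var docs = splitIntoChunks('report-1', ['a', 'b', 'c', 'd', 'e'], 2);
// docs holds three documents: report-1:0, report-1:1, report-1:2
```

Paired with the built-in `_sum` reduce and `group_level=1`, a view like this would give the total item count per parent without ever loading one giant document.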

yyyk|6 years ago

The Couch/Pouch combination is really slow when document + attachment are too large. In my experience, the practical limit was more like 20-30 MB, and I'd suggest not even getting close to that. 8 MB simply recognizes reality.

lytefm|6 years ago

At least since 2.0 (haven't used Couch before), the docs have always recommended to only use small documents in a CouchDB and to use an external storage for large files.

The IBM Cloudant free tier only allows docs up to 1 MB.

So this doesn't really come as a surprise or feel hyper-restrictive to me.

AtlasBarfed|6 years ago

That still exists? I thought apple bought them and shuttered them.

johnchristopher|6 years ago

> – Updated to modern JavaScript engine SpiderMonkey 60

Yes ^^ !

Congrats to the team. These people are some of the nicest and most supportive devs I know of in the OSS community (or whatev').

They show a great deal of patience in their slack channel and are always welcoming and answering stupid questions from idiots like me.

janl|6 years ago

<3

tbrock|6 years ago

At this point why would you use CouchDB over something like MongoDB?

Seriously asking...

Over the past 5 years MongoDB has gotten a great storage engine, transactions, distributed transactions, multi master replication, first class change streams and is very very solid as a foundational piece of infrastructure you can rely on while CouchDB has languished. I can’t imagine reaching for it in my tool belt when I need a document store over MongoDB but I’m obviously biased so I’m wondering if there is a lot I’m missing.

Obviously it’s cool from a more open source databases standpoint — I love learning about how things are built and evolve over time.

pritambaral|6 years ago

1. MongoDB is no longer Open Source.

2. MongoDB's design has historically been terrible; and, from my current experience with clients, is still a source of 'WTF's.

newfeatureok|6 years ago

The main reason most people use CouchDB is because of the HTTP API and offline support with Couchbase Mobile and PouchDB. Doesn't CouchDB have most of those things already from 2.3?

almery|6 years ago

CouchDB is licensed under Apache License 2.0, while MongoDB uses SSPL, which was rejected by the OSI https://www.zdnet.com/article/mongodb-open-source-server-sid...

Quarrelsome|6 years ago

MongoDB has _suspiciously_ amazing SEO and marketing. CouchDB's by contrast is awful.

As silly as that sounds as a reason to choose CouchDB it demonstrates where the respective company's priorities lie.

liamdiprose|6 years ago

The link explains that CouchDB can have replicas on mobile phones and websites, meaning clients don't always have to be connected to the internet.

> The Couch Replication Protocol lets your data flow seamlessly between server clusters to mobile phones and web browsers, enabling a compelling offline-first user-experience

speedgoose|6 years ago

Does MongoDB have multi-master replication, or the classic election of one master from a pool of candidates?

CouchDB has another pattern, each master is really a master and you can have live replication but also offline replication. You can connect two clusters every new moon and they will synchronize. For sure the clients may have to deal with potential conflicts but in practice it's very neat and that's what makes couchdb worth it if you need this feature.
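As a sketch of the "connect two clusters every new moon" scenario: CouchDB exposes a `POST /_replicate` endpoint that takes a `source` and a `target`, and syncing both ways is just two such requests. The helper name, host names, and database name below are made up, and authentication is left out. The code only builds the request descriptions; it does not send them.

```javascript
// Build the two one-shot replication requests that sync clusters A and B
// in both directions via CouchDB's POST /_replicate endpoint.
function buildSyncRequests(urlA, urlB, dbName) {
  function request(fromUrl, toUrl) {
    return {
      url: fromUrl + '/_replicate',
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        source: fromUrl + '/' + dbName,
        target: toUrl + '/' + dbName
      })
    };
  }
  return [request(urlA, urlB), request(urlB, urlA)];
}

var requests = buildSyncRequests(
  'http://cluster-a.example:5984', // hypothetical hosts
  'http://cluster-b.example:5984',
  'inventory'
);
// Each entry can be sent with any HTTP client, for example:
// fetch(req.url, { method: req.method, headers: req.headers, body: req.body })
```

Any conflicts created while the clusters were apart still have to be resolved by the application afterwards, as the comment notes.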

freeqaz|6 years ago

They are tools that solve different problems, imo. In CAP theorem[0] you have 3 groups of DBs.

CA databases: SQL databases that are hard to scale ("partition") but are always consistent and available.

CP databases: MongoDB style databases that are consistent and partition tolerant, but trade availability (sometimes your queries will fail during high load).

AP databases: CouchDB style databases. They are always available and are partition tolerant, but you may be querying stale data.

[0]: https://en.wikipedia.org/wiki/CAP_theorem?wprov=sfla1

janl|6 years ago

> At this point why would you use CouchDB over something like MongoDB?

They are very different databases. But since they came up around the same time and because they look very similar on the surface, you might think you choose between them.

But when you look more closely at the technical details, at almost every point where CouchDB went one way, Mongo went the other.

I’m not saying either decision is better or worse, it’s just that they are very different databases that you should evaluate on their merits, not just superficially.

anonyfox|6 years ago

Has anyone here tried to use couchdb directly within an elixir/erlang OTP application? As like, „mix install“? Would kill for couchdb as a library!

pawelk|6 years ago

Other than being a great solution for some problems I wanted to highlight the fact that CouchDB has committed to SpiderMonkey (the Mozilla JS engine) since the very beginning and is one of the few projects helping to fend off the V8 monoculture.

crudbug|6 years ago

Congrats to the whole team.

Looking forward to CouchDB 4.0/FoundationDB goodies. Do we have any roadmap details on this?

e12e|6 years ago

Oh wow, this is great news. I thought the project was effectively long dead. Is there a new/up-to-date "couchapp" too?

> Default installations are now secure and locked down.

More good news!

Anyone have recent experience with couchdb?

I see the (quickstart) docs use plain http - should one terminate ssl in front, eg with a recent version of haproxy?

mauflows|6 years ago

I wish couch was used whenever users ask for an app to "sync to Dropbox". I don't know if this changes with 3.0 but couch is naturally database per user, took me five minutes to install on my rpi with docker, has a very good admin interface, the database is the frontend (no driver or separate process), and lets the application layer handle conflicts.

janl|6 years ago

We use https://github.com/jo/couchdb-bootstrap successfully.

CouchDB does SSL natively, but we do recommend HAProxy.

Graphguy|6 years ago

For anyone else looking to quickstart but on Kube, https://operatorhub.io/operator/couchdb-operator. Should add 3.0 soon.

couchdb_ouchdb|6 years ago

I'm surprised to see so much love for CouchDB in this thread. I don't think it's been widely adopted in corporate America, and it has lost the war to MongoDB, closed source or not.

I joined a company where it's being used backing a mobile app with couch/pouch in production. We can't wait to get off of it. Writes are slow. Reads are worse. Having a DB per user is a scaling and backup nightmare. If you run into any issues, it's a ghost town.

I'm glad the CouchDB Team is forging ahead, but who is really using this database?

janl|6 years ago

I sadly can’t name names, but rest assured the Fortune 500 is heavily involved.

OTOH, publicly known big companies using CouchDB include Apple and IBM.

And I worked on a team that used CouchDB’s offline capability in the 2015 Ebola crisis in West Africa. That work also led to the first Ebola vaccine ever.

That’s why we do CouchDB :)

staticautomatic|6 years ago

Would you be willing to say more? Inquiring minds want to know.

yatsyk|6 years ago

CouchDB/PouchDB looks very promising for offline-first apps, but I can’t understand how to restrict bad clients. A client could potentially insert a document of huge size or execute an expensive query and degrade the experience of other clients on the same server. Is there any way to prevent this?

Volundr|6 years ago

A couple ways:

First, you can implement validation functions [1] on user databases to control what kind of data can be inserted into Couch. These functions can only be changed by database admins, not users, so they can act as a security mechanism controlling what goes in.

As mentioned by others you can also implement a proxy. This doesn't have to interfere with sync functionality, you just have to make sure you proxy all the endpoints in the replication protocol [2]. Envoy [3] is one such proxy that essentially applies document level permissions to a CouchDB database without interfering with sync.

If the goal is just to limit document size, or throttle clients trying to hammer the API, this doesn't even have to be a custom proxy; any reverse proxy with the needed control knobs (such as NGINX) will do. You can of course combine this with validation functions, using validations to ensure that everything that comes in is the right "shape" and using NGINX and its ilk to apply throttling and sane request limits.

At scale there's a decent chance you want a proxy in front of your Couch instance anyway, since Couch is truly multi-master, meaning you probably want to balance your clients across all your nodes anyway.

[1] https://docs.couchdb.org/en/stable/ddocs/ddocs.html#validate... [2] https://docs.couchdb.org/en/stable/replication/protocol.html... [3] https://github.com/cloudant-labs/envoy
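For readers who have not seen one, here is a small sketch of the kind of validation function mentioned above. CouchDB calls a design document's `validate_doc_update` function with `(newDoc, oldDoc, userCtx, secObj)` and rejects the write if it throws an object with a `forbidden` (or `unauthorized`) field. The specific rules below, a required `type` field and a 100,000-character size cap, are assumptions picked for illustration, as is the design document name.

```javascript
// Sketch of a validate_doc_update function. CouchDB runs it on every write
// to the database that holds the design document.
var validateDocUpdate = function (newDoc, oldDoc, userCtx, secObj) {
  var MAX_DOC_CHARS = 100000; // illustrative limit, not a CouchDB default
  if (newDoc._deleted) {
    return; // let deletions through
  }
  if (typeof newDoc.type !== 'string') {
    throw({ forbidden: 'Documents must have a string "type" field.' });
  }
  if (JSON.stringify(newDoc).length > MAX_DOC_CHARS) {
    throw({ forbidden: 'Document is too large.' });
  }
};

// Design documents store the function as source text.
var designDoc = {
  _id: '_design/validation', // hypothetical name
  validate_doc_update: validateDocUpdate.toString()
};
```

A check like this only guards the shape and size of documents; request throttling would still belong in the proxy layer described above.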

newfeatureok|6 years ago

You resolve that issue the same way you would resolve the same issue if you were using Postgres - you introduce some back-end.

For your example specifically I'd use a proxy.

fiatjaf|6 years ago

Many commenters here still think CouchDB is the same thing it was many years ago.

CouchDB was a simple but very powerful idea (that still needed improvements), but it was coopted into something not very nice nor good nor useful.

See my old rant about it and why it failed: http://web.archive.org/web/20170530122143/http://entulho.fia...

LoSboccacc|6 years ago

Is the Lucene search indexer synchronous with CouchDB's data updates?

I'm wondering how people solve the common search after create pattern when using external indexes

janl|6 years ago

Yup, works with clustering and everything: https://blog.couchdb.org/2020/02/26/the-road-to-couchdb-3-0-...

haolez|6 years ago

I once read that the right way to use CouchDB is for every user to have its own database. However, how does this work with BI? Or with public data that should be known by all users? Do I create a single centralized DB just for that kind of data? Maybe aggregate data from all users' DBs? Genuinely curious.

CameronNemo|6 years ago

For public data, you can try to partition it in such a way that writes can be merged without any potential conflicts. E.g. a user's posts are in a separate partition.

I have never done this with CouchDB, but the technique is described in Martin Kleppmann's __Designing Data-Intensive Applications__.

janl|6 years ago

You can replicate all per-user DBs into a central database today.

We are working on per-document-access-control at the moment, to support this use-case out of the box
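One possible way to set up the "replicate all per-user DBs into a central database" approach is to create one document per user database in CouchDB's `_replicator` database. The sketch below only builds those documents. The `userdb-*` names, the `central` database, the document ID scheme, and the helper name are assumptions; credentials are omitted.

```javascript
// Build one _replicator document per user database, each continuously
// replicating into a shared central database.
function centralReplicationDocs(serverUrl, userDbNames, centralDb) {
  return userDbNames.map(function (name) {
    return {
      _id: 'to-central-' + name, // hypothetical ID scheme
      source: serverUrl + '/' + name,
      target: serverUrl + '/' + centralDb,
      continuous: true
    };
  });
}

var repDocs = centralReplicationDocs(
  'http://localhost:5984',
  ['userdb-alice', 'userdb-bob'],
  'central'
);
// These could then be saved in one request:
// POST /_replicator/_bulk_docs with body { "docs": repDocs }
```

Note that this direction only gathers data for reporting. Replicating the central database back out to users would expose everyone's documents, which is the gap per-document access control is meant to close.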

gigatexal|6 years ago

Can't wait to see what CouchDB 4.0 with FoundationDB at its core does for the db.

This is a great release too!

agumonkey|6 years ago

anybody acquainted with pouchdb devs ? just to know if there are plans to migrate already or not

gtirloni|6 years ago

https://github.com/pouchdb/pouchdb/issues/7987

Volundr|6 years ago

For those who don't want to follow the link, there are no changes to the replication protocol in Couch 3.0, so PouchDB already works.

mark_l_watson|6 years ago

I haven't used CouchDB in years. I just downloaded and installed it. Interesting that there are no apparent links to client libraries in different languages. Perhaps most people just use the HTTP API Reference and roll their own.

james_s_tayler|6 years ago

There are a couple of client libraries in .NET

Some are no longer maintained. Some still work.

mikekchar|6 years ago

In Ruby there is a CouchRest gem, which I've used, but to be honest a REST interface that talks JSON is so easy to use that I've often thought we'd be better off without anything specific.
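To illustrate how little a hand-rolled client needs, here is a sketch in JavaScript rather than Ruby. The `couchClient` helper and its method names are invented for this example. It only describes the HTTP requests for reading and writing a document, leaving the actual sending to whatever HTTP library is at hand.

```javascript
// Hypothetical minimal "client": plain functions that describe the HTTP
// requests for CouchDB's document endpoints (GET and PUT /{db}/{docid}).
function couchClient(baseUrl, dbName) {
  var dbUrl = baseUrl + '/' + encodeURIComponent(dbName);
  function docUrl(id) {
    return dbUrl + '/' + encodeURIComponent(id);
  }
  return {
    getDoc: function (id) {
      return { method: 'GET', url: docUrl(id) };
    },
    // To update an existing document, doc must include its current _rev.
    putDoc: function (doc) {
      return {
        method: 'PUT',
        url: docUrl(doc._id),
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(doc)
      };
    }
  };
}

var client = couchClient('http://localhost:5984', 'notes');
var putRequest = client.putDoc({ _id: 'note 1', title: 'hi' });
// putRequest.url is "http://localhost:5984/notes/note%201"
```

This simple encoding is fine for ordinary document IDs; design documents such as `_design/foo` keep their slash unencoded and would need a special case.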

seigel|6 years ago

CouchDB is good. Yes. I still dream of the day when the cluster will balance shards automatically and recover better from losing and replacing nodes. :D

canada_dry|6 years ago

Maybe I'm being petty, but it doesn't fill me with confidence when the ssl certificate on their website isn't even configured properly (valid for uberspace.de domain).

To clarify: seems their main site is on apache.org. But, their www.couchdb.org site (hosted on uberspace.de) doesn't have a correct cert.

lars_francke|6 years ago

For me I get a valid Let's Encrypt certificate that has blog.couchdb.org in its SAN list.

Hedja|6 years ago

Their blog is hosted on Wordpress.com which seems to be using Let's Encrypt to generate one certificate for multiple different, unrelated custom domain names.

Maybe you encountered a bug where it served the wrong cert for a different batch of custom domains.

canada_dry|6 years ago

Update: someone has now fixed it.

154 comments