While it lacks MongoDB's popularity and the wide adoption that things like Mongoose have in lots of open source CMS-type projects, it wins for its (I believe) unique take on map/reduce: you write custom JavaScript view functions that run on every document, letting you really customize the way you query, slice, and access parts of your data...
Example: I'm building a document analysis app that does topic + keyword frequency vectorization of a corpus of documents, only a few thousand for now.
I end up with a bunch of documents that have "text": "here is my document text..." and "vector": [ array of floating point values ...].
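What I can do with CouchDB is store that 20-dimensional vector and emit its components, floored to integers, as a query key:
    function (doc) {
      if (!Array.isArray(doc.vector)) return; // skip docs without a vector
      var intVectors = doc.vector.map(function (val) {
        return Math.floor(val);
      });
      emit(intVectors, 1);
    }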
Then I can match an input document's vector (calculated the same as corpus documents), calculate a 'range' of those vectors, pass it as start and end keys, and super quickly get a result from the database of 'here are documents that have vectors similar to your input'...
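A rough sketch of the "calculate a range and pass it as start and end keys" step might look like the following. The database name `corpus`, the design document `vectors`, the view name `by_int_vector`, the `radius` parameter and both helper functions are assumptions made up for illustration; only the `startkey`/`endkey` view parameters come from CouchDB itself.

```javascript
// Hypothetical helper: widen each component by `radius`, then floor it the
// same way the map function floors the stored vectors.
function vectorKeyRange(vector, radius) {
  const startkey = vector.map((v) => Math.floor(v - radius));
  const endkey = vector.map((v) => Math.floor(v + radius));
  return { startkey, endkey };
}

// Build the view URL. Keys are JSON-encoded, as CouchDB view queries expect.
// "corpus", "vectors" and "by_int_vector" are placeholder names.
function viewUrl(base, range) {
  const params = new URLSearchParams({
    startkey: JSON.stringify(range.startkey),
    endkey: JSON.stringify(range.endkey),
  });
  return `${base}/corpus/_design/vectors/_view/by_int_vector?${params}`;
}

const range = vectorKeyRange([1.7, 3.2, 0.4], 1);
console.log(viewUrl("http://localhost:5984", range));
```

One caveat worth keeping in mind: CouchDB compares array keys element by element, so a start/end range is bounded mostly by the leading components rather than by a box around the whole vector. The rows that come back are probably best treated as candidates to re-rank on the client.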
I really like CouchDB. It is wonderful if you want that kind of DB. However, if you want a relational DB (and there are many, many, many reasons to want one), do not pick CouchDB. It works very poorly as a relational DB.
I have a legacy project that didn't quite understand this point and we have ended up paying the price for a document oriented DB in which it is hard to migrate and where we are constantly having to worry about the bandwidth to the view server. And then all the amazing, wonderful features of couch? We don't use a single one :-P Fail all the way around. However, I still like it and instead of retiring it, I've been slowly trying to start using the features that make it awesome, while mitigating some of the problems that have piled up over the years.
The problem I had with CouchDB is integrating it into a framework like Rails. CouchDB on its own does so much cool stuff. The "free" HTTP API and client replication via PouchDB are the two huge ones. But it just wasn't smooth enough to get the data out, use it where I wanted, and then save it back.
One interesting thing you can do with CouchDB is that you can have a webapp where a user can specify their own database and credentials and it works over HTTP(s). That's pretty unique. I'd love to see a SaaS using CouchDB and their "on-premise" offering just means the user provides their own database. I'm not sure how payment would work though - perhaps some verification proxy?
Firebase is the gold-standard for offline apps (as a service). CouchDB replaces Cloud Firestore, and Keycloak replaces Authentication. I haven't seen OSS equivalents of Cloud Functions, ML Kit, and the other things (e.g. In-App messaging, and Cloud Messaging). It'd be nice to have the entire stack of Firebase bundled as a group of OSS projects, including CouchDB.
Sad to see that per doc access control didn't make it in 3.0. Hopefully it'll be in 3.1.
Cloudant on IBM Cloud is CouchDB API/replication compatible and offers support for Apache CouchDB (1). Also, OpenWhisk integrates nicely with CouchDB/Cloudant and can even be a backing persistence for it (2)
I really can't wait for the per-doc permissions because I'm building something very similar to what you're describing, and with CouchDB! Focusing on the database and auth side first and then adding functions.
Yeah, I'm still disappointed that the MongoDB API outpaced the CouchDB Replication Protocol in general adoption. As nice as Cloudant can be some of the time, I know that my IT group would be a lot happier if we could use Cosmos DB (and/or if Cloudant would just directly support Azure data centers again).
Every now and again I wonder if I could implement the CouchDB Replication Protocol on top of Cosmos DB with a presumably hairy ball of Azure Functions and hoping someone beats me to needing that to exist and scratches that itch for me. (Cosmos DB's changes feed is so almost right for the job it hurts because it sounds like it should be easy, and yet I assume it won't be.)
I built two products on CouchDB 1.x starting in 2010 ... version three is another amazing step forward! For my more recent projects, I've replaced CouchDB with clustered PostgreSQL using JSON columns as I really enjoy the ability to write SQL queries against the JSON and to use the built-in full-text search capabilities. I think both CouchDB and clustered PostgreSQL are amazing tools and it's nice to be able to choose between them as needed. The best advice I've heard is to choose CouchDB when you know your queries ahead of time and the data "schema"[1] is variable and choose PostgreSQL when you know your data ahead of time and your queries are variable.
[1] In this case, a JSON document but either with a JSON-schema or marshaled/unmarshaled into a strict type.
I've gotten the impression that clustered Postgres still isn't very straightforward to run. Do you mind elaborating on your ideal setup and point to some resources?
JSON Schema has been a big benefit for our use case. Our iOS, Android, and web app all pull in a schema from one repo, which serves up that schema via Cocoapods, Gradle, or npm. We built it years ago and it’s worked smoothly ever since.
CouchDB is awesome and feels way ahead of its time. Its design docs are extremely powerful, to the point that you can build entire web apps with CouchDB alone (not that that's recommended anymore). Plus with PouchDB you can create offline-first apps that sync with a remote CouchDB instance.
If you like PouchDB, you should also check out RxDB. It is built on top of PouchDB and is optimised for realtime applications where you can subscribe to queries and stuff.
I haven't heard of CouchDB in quite some time, great to see it still improving.
I used it years ago when I was experimenting with Ionic[0]. What appealed to me was that I could use CouchDB (cloud) and PouchDB[1] (device) to have a replicated copy of the data locally. The application was used in areas where network connection was very limited. Using this strategy I was able to ensure the mobile device's data was as recent as the last time it had a network connection.
I can confirm that the stack still works well :) We've been developing a cross-platform app for the German market - therefore the need for offline capability - since 2017 and never had any real issues with Pouch/Couch, that part just worked. The upgrade from Ionic 3 to 4 was quite painful though.
For user authentication I've forked the nowadays unmaintained superlogin package [1], which still does a great job when keeping the dependencies up to date.
8MB is just the default, you can switch it back to 4GB if you want, but you won't have an easy time switching to 4.0 due to the 8MB limit imposed by FoundationDB.
If you're trying to store single GB documents in couch, you're doing it wrong... Unless those are binaries you can usually fragment data logically across many documents, then write custom views to aggregate however you need to.
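One way to read "fragment data logically across many documents" is sketched below: a large record is split into small chunk documents that share a parent id, and a view keyed on `[parent, seq]` reads them back in order. The field names (`type`, `parent`, `seq`, `items`), the `_id` scheme and the helper `splitIntoChunkDocs` are all made up for illustration, not something the thread prescribes.

```javascript
// Hypothetical helper: split one large array of items into small chunk docs.
function splitIntoChunkDocs(parentId, items, chunkSize) {
  const docs = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    const seq = i / chunkSize;
    docs.push({
      _id: `${parentId}:chunk:${String(seq).padStart(6, "0")}`,
      type: "chunk",
      parent: parentId,
      seq,
      items: items.slice(i, i + chunkSize),
    });
  }
  return docs;
}

// Map function for a design document: one row per chunk, keyed so that all
// chunks of a parent sort together. Querying the view with
// startkey=["report-1"] and endkey=["report-1", {}] returns them in seq order.
const mapChunks = function (doc) {
  if (doc.type === "chunk") {
    emit([doc.parent, doc.seq], doc.items.length);
  }
};

const docs = splitIntoChunkDocs("report-1", [1, 2, 3, 4, 5], 2);
console.log(docs.map((d) => d._id));
```

Updating one chunk then rewrites only that small document instead of the whole record.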
The Couch/Pouch combination is really slow when document + attachment are too large. In my experience, the practical limit was more like 20-30 MB, and I'd suggest not even getting close to that. 8 MB simply recognizes reality.
At least since 2.0 (haven't used Couch before), the docs have always recommended to only use small documents in a CouchDB and to use an external storage for large files.
The IBM Cloudant free tier only allows docs up to 1 MB.
So this doesn't really come as a surprise or feel hyper restrictive to me.
At this point why would you use CouchDB over something like MongoDB?
Seriously asking...
Over the past 5 years MongoDB has gotten a great storage engine, transactions, distributed transactions, multi master replication, first class change streams and is very very solid as a foundational piece of infrastructure you can rely on while CouchDB has languished. I can’t imagine reaching for it in my tool belt when I need a document store over MongoDB but I’m obviously biased so I’m wondering if there is a lot I’m missing.
Obviously it’s cool from a more open source databases standpoint — I love learning about how things are built and evolve over time.
The main reason most people use CouchDB is because of the HTTP API and offline support with Couchbase Mobile and PouchDB. Doesn't CouchDB have most of those things already from 2.3?
The link explains that CouchDB can have replicas on mobile phones and websites, meaning clients don't always have to be connected to the internet.
> The Couch Replication Protocol lets your data flow seamlessly between server clusters to mobile phones and web browsers, enabling a compelling offline-first user-experience
Does MongoDB have multi-master replication or the classic election of one master from a pool of candidates?
CouchDB has another pattern: each master is really a master, and you can have live replication but also offline replication. You can connect two clusters every new moon and they will synchronize. For sure the clients may have to deal with potential conflicts, but in practice it's very neat and that's what makes CouchDB worth it if you need this feature.
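As a hedged sketch of the "connect two clusters every new moon" idea, the snippet below builds one-shot replication requests for CouchDB's `/_replicate` endpoint, once in each direction. The host names and database name are placeholders, credentials are left out, and `replicationRequest` and `syncBothWays` are hypothetical helpers. `syncBothWays` is defined but never called here, so nothing touches the network.

```javascript
// Build a one-shot replication request body for POST /_replicate.
// Source and target are full database URLs (placeholders in this sketch).
function replicationRequest(sourceUrl, targetUrl) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: sourceUrl, target: targetUrl }),
  };
}

// Replicate in both directions so each side ends up with the other's changes.
// Assumes a runtime with a global fetch (for example Node 18+).
async function syncBothWays(hostA, hostB, db) {
  const a = `${hostA}/${db}`;
  const b = `${hostB}/${db}`;
  await fetch(`${hostA}/_replicate`, replicationRequest(a, b));
  await fetch(`${hostA}/_replicate`, replicationRequest(b, a));
}

console.log(replicationRequest("http://a.example:5984/notes", "http://b.example:5984/notes").body);
```

Documents edited on both sides while disconnected come back as conflicts, which the application still has to resolve, as the comment above notes.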
They are tools that solve different problems, imo. In CAP theorem[0] you have 3 groups of DBs.
CA databases: SQL databases that are hard to scale ("partition") but are always consistent and available.
CP databases: MongoDB style databases that are consistent and partition tolerant, but trade availability (sometimes your queries will fail during high load).
AP databases: CouchDB style databases. They are always available and are partition tolerant, but you may be querying stale data.
> At this point why would you use CouchDB over something like MongoDB?
They are very different databases. But since they came up around the same time and because they look very similar on the surface, you might think you choose between them.
But when you look more closely at the detailed technical decisions, at almost every point where CouchDB goes one way, Mongo went the other way.
I’m not saying either decision is better or worse, it’s just that they are very different databases that you should evaluate on their merits, not just superficially.
Other than being a great solution for some problems, I wanted to highlight the fact that CouchDB has committed to SpiderMonkey (the Mozilla JS engine) since the very beginning and is one of the few projects helping to fend off the V8 monoculture.
I wish Couch was used whenever users ask for an app to "sync to Dropbox". I don't know if this changes with 3.0, but Couch is naturally database-per-user, took me five minutes to install on my rpi with Docker, has a very good admin interface, the database is the frontend (no driver or separate process), and lets the application layer handle conflicts.
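For the database-per-user setup, CouchDB's `couch_peruser` option names each private database `userdb-` followed by the hex-encoded username. A small sketch of that naming rule, assuming a Node runtime for `Buffer` and with `perUserDbName` being a hypothetical helper:

```javascript
// Compute the per-user database name used by couch_peruser:
// "userdb-" + hex encoding of the UTF-8 username.
function perUserDbName(username) {
  const hex = Buffer.from(username, "utf8").toString("hex");
  return `userdb-${hex}`;
}

console.log(perUserDbName("bob")); // userdb-626f62
```

A client app can use this to work out which remote database to sync with once the user has logged in.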
I'm surprised to see so much love for CouchDB in this thread. I don't think it's been widely adopted in corporate America, and it has lost the war to MongoDB, closed source or not.
I joined a company where it's being used backing a mobile app with couch/pouch in production. We can't wait to get off of it. Writes are slow. Reads are worse. Having a DB per user is a scaling and backup nightmare. If you run into any issues, it's a ghost town.
I'm glad the CouchDB Team is forging ahead, but who is really using this database?
I sadly can’t name names, but rest assured the Fortune 500 is heavily involved.
OTOH, publicly known big companies using CouchDB include Apple and IBM.
And I worked on a team that used CouchDB’s offline capability in the 2015 Ebola crisis in West Africa. That work also led to the first Ebola vaccine ever.
CouchDB/PouchDB looks very promising for offline-first apps, but I can’t understand how to restrict bad clients. A client could potentially insert a huge document or execute an expensive query and degrade the experience of other clients on the same server. Is there any way to prevent this?
One, you can implement validation functions [1] on user databases to control what kind of data can be inserted into Couch. These functions can only be changed by database admins, not users, so they can act as a security mechanism controlling what goes in.
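A minimal sketch of such a validation function is below. The `validate_doc_update` name and its `(newDoc, oldDoc, userCtx, secObj)` arguments are CouchDB's; the `note` document type, the `text` field, the 10,000 character cap and the design document id are assumptions made up for this example.

```javascript
// Sketch of a validate_doc_update function. Throwing { forbidden: ... }
// makes CouchDB reject the write.
const validateDocUpdate = function (newDoc, oldDoc, userCtx, secObj) {
  if (newDoc._deleted) return; // allow deletions
  if (newDoc.type !== "note") {
    throw { forbidden: "only documents of type 'note' are accepted" };
  }
  if (typeof newDoc.text !== "string" || newDoc.text.length > 10000) {
    throw { forbidden: "text must be a string of at most 10000 characters" };
  }
};

// The function is stored as a string inside a design document
// ("_design/validation" is a placeholder id).
const designDoc = {
  _id: "_design/validation",
  validate_doc_update: validateDocUpdate.toString(),
};

console.log(Object.keys(designDoc));
```

Because the check runs on every write to that database, including writes arriving through replication from PouchDB clients, an oversized or misshapen document should be refused rather than stored.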
As mentioned by others you can also implement a proxy. This doesn't have to interfere with sync functionality, you just have to make sure you proxy all the endpoints in the replication protocol [2]. Envoy [3] is one such proxy that essentially applies document level permissions to a CouchDB database without interfering with sync.
If the goal is just to limit document size, or throttle clients trying to hammer the API, this doesn't even have to be a custom proxy; any reverse proxy with the needed control knobs (such as NGINX) will do. You can of course combine this with validation functions, using validations to ensure everything that comes in is the right "shape" and using NGINX and its ilk to apply throttling and sane request limits.
At scale there's a decent chance you want a proxy in front of your Couch instance anyway, since Couch is truly multi-master, meaning you probably want to balance your clients across all your nodes anyway.
I once read that the right way to use CouchDB is for every user to have its own database. However, how does this work with BI? Or with public data that should be known by all users? Do I create a single centralized DB just for that kind of data? Maybe aggregate data from all users' DBs? Genuinely curious.
For public data, you can try to partition it in such a way that writes can be merged without any potential conflicts. E.g. a user's posts are in a separate partition.
I have never done this with CouchDB, but the technique is described in Martin Kleppmann's __Designing Data-Intensive Applications__.
I haven't used CouchDB in years. I just downloaded and installed it. Interesting that there are no apparent links to client libraries in different languages. Perhaps most people just use the HTTP API Reference and roll their own.
In Ruby there is a CouchRest gem, which I've used, but to be honest a REST interface that talks JSON is so easy to use that I've often thought we'd be better off without anything specific.
CouchDB is good. Yes. I still dream of the day when the cluster will balance shards automatically and recover better from losing and replacing nodes. :D
Maybe I'm being petty, but it doesn't fill me with confidence when the ssl certificate on their website isn't even configured properly (valid for uberspace.de domain).
To clarify: seems their main site is on apache.org. But, their www.couchdb.org site (hosted on uberspace.de) doesn't have a correct cert.
Their blog is hosted on Wordpress.com which seems to be using Let's Encrypt to generate one certificate for multiple different, unrelated custom domain names.
Maybe you encountered a bug where it served the wrong cert for a different batch of custom domains.
splatcollision|6 years ago
Super fun, quick and flexible to work with!
mikekchar|6 years ago
treis|6 years ago
xwowsersx|6 years ago
[deleted]
newfeatureok|6 years ago
Graphguy|6 years ago
(1) https://www.ibm.com/cloud/blog/announcements/announcing-supp... (2) https://github.com/apache/openwhisk/blob/master/tools/db/REA...
matlin|6 years ago
So shameless plug if you're interested in signing up for the alpha: https://www.aspen.cloud
WorldMaker|6 years ago
moenzuel|6 years ago
pachico|6 years ago
kache_|6 years ago
chasers|6 years ago
smoyer|6 years ago
jimstr|6 years ago
Thanks!
karmelapple|6 years ago
knubie|6 years ago
code-is-code|6 years ago
https://github.com/pubkey/rxdb
jfkebwjsbx|6 years ago
PL/SQL also allowed (and allows) you to create entire apps within a database.
Phillips126|6 years ago
[0] - https://ionicframework.com/
[1] - https://pouchdb.com/
lytefm|6 years ago
[1] https://github.com/LyteFM/superlogin
hajile|6 years ago
For those interested, looks like the guts of CouchDB are going to be swapped out for FoundationDB.
https://blog.couchdb.org/2020/02/26/the-road-to-couchdb-3-0-...
newfeatureok|6 years ago
splatcollision|6 years ago
Updates on huge docs would be painful!
yyyk|6 years ago
lytefm|6 years ago
AtlasBarfed|6 years ago
johnchristopher|6 years ago
Yes ^^ !
Congrats to the team. These people are some of the nicest and most supportive devs I know of in the OSS community (or whatev').
They show a great deal of patience in their slack channel and are always welcoming and answering stupid questions from idiots like me.
janl|6 years ago
tbrock|6 years ago
pritambaral|6 years ago
2. MongoDB's design has historically been terrible; and, from my current experience with clients, is still a source of 'WTF's.
newfeatureok|6 years ago
almery|6 years ago
Quarrelsome|6 years ago
As silly as that sounds as a reason to choose CouchDB it demonstrates where the respective company's priorities lie.
liamdiprose|6 years ago
> The Couch Replication Protocol lets your data flow seamlessly between server clusters to mobile phones and web browsers, enabling a compelling offline-first user-experience
speedgoose|6 years ago
freeqaz|6 years ago
[0]: https://en.wikipedia.org/wiki/CAP_theorem?wprov=sfla1
janl|6 years ago
anonyfox|6 years ago
pawelk|6 years ago
crudbug|6 years ago
Looking forward to CouchDB 4.0/FoundationDB goodies. Do we have any roadmap details on this?
e12e|6 years ago
> Default installations are now secure and locked down.
More good news!
Anyone have recent experience with couchdb?
I see the (quickstart) docs use plain http - should one terminate ssl in front, eg with a recent version of haproxy?
mauflows|6 years ago
janl|6 years ago
CouchDB does SSL natively, but we do recommend HAProxy.
Graphguy|6 years ago
couchdb_ouchdb|6 years ago
janl|6 years ago
That’s why we do CouchDB :)
staticautomatic|6 years ago
yatsyk|6 years ago
Volundr|6 years ago
[1] https://docs.couchdb.org/en/stable/ddocs/ddocs.html#validate...
[2] https://docs.couchdb.org/en/stable/replication/protocol.html...
[3] https://github.com/cloudant-labs/envoy
newfeatureok|6 years ago
For your example specifically I'd use a proxy.
fiatjaf|6 years ago
CouchDB was a simple but very powerful idea (that still needed improvements), but it was coopted into something not very nice nor good nor useful.
See my old rant about it and why it failed: http://web.archive.org/web/20170530122143/http://entulho.fia...
LoSboccacc|6 years ago
I'm wondering how people solve the common search after create pattern when using external indexes
janl|6 years ago
haolez|6 years ago
CameronNemo|6 years ago
janl|6 years ago
We are working on per-document-access-control at the moment, to support this use-case out of the box
gigatexal|6 years ago
This is a great release too!
agumonkey|6 years ago
gtirloni|6 years ago
Volundr|6 years ago
mark_l_watson|6 years ago
james_s_tayler|6 years ago
Some are no longer maintained. Some still work.
mikekchar|6 years ago
seigel|6 years ago
canada_dry|6 years ago
lars_francke|6 years ago
Hedja|6 years ago
canada_dry|6 years ago
wildchild|6 years ago
[deleted]