I was expecting a post describing why MongoDB specifically wasn't a fit for their use case, but the TL;DR version is basically:
"Before you get too excited, the reason for the failure is probably not any of the ones you're imagining. Mainly it's this: adding another kind of production database was a huge waste of time."
The blog title is misleading IMO. It could as well be titled "Why [any other DBMS] Never Worked Out at Etsy" and the conclusion would be the same.
Your proposed title is very confusing and meta. The current title is only misleading if you are going into it with biased expectations about what it's going to teach you. The article is interesting precisely because it is not just another "NoSQL sucks" screed.
Soundcloud wrote a blog post http://backstage.soundcloud.com/2011/04/failing-with-mongodb... about the specifics of how they failed in implementing an analytics platform in MongoDB, then went with Cassandra (and why). I have been at a company where it was on developers to deploy mongo clusters, and setting up logging can suck, but still there are no numbers or even application integration specifics here. As someone pointed out, manual denormalization can suck. There are options like MongoHQ and Heroku though, so this shouldn't resolve to "don't try a new data store, it's hard and possibly buggy".
Genuine question: who is using MongoDB successfully in production, and at scale? I'm not aware of anyone myself. I hear of it being used in hackathons etc. because it's so quick to set up, but I'd be curious to know what people are using it with.
I run two sites. One is the perfect use case for MongoDB: http://www.AUsedCar.com , a used car search engine. We've seen nothing but benefits by switching to it from MS SQL Server. Queries are way faster, etc. It's a great use case because 99.9% of DB interactions are read-only searches.
My other site, http://www.BudgetSimple.com on the other hand is using SQL Server (in the process of porting to MySQL). It would not be a great use-case for Mongo, because there are usually as many update, delete, inserts as there are reads, and instant database integrity and a schema are important.
Anyone that claims a tool is perfect for every problem is probably wrong. You need to figure out the best one for your use case, and load test, security test, performance test, etc... until you have a good guess for the right answer.
I'm using it for the analytics suite at my company (large ecommerce multinational).
It's weird at first, coming from a background of Access then SQLite then MySQL/PHPMyAdmin, but you get used to it. I essentially treat it like a gigantic Python dictionary object.
The sharding is too much of a ball-ache to set up so I've created an optimal way of distributing/mapping files across our cluster to make use of all machines.
Data integration is nice. As long as you resist the temptation to output each integrated line to the terminal, pymongo and its C extensions can integrate a ~500 byte record in ~0.0001 seconds.
Basically the main advantage is not having a schema whatsoever - you can just add random attributes to documents whenever the hell you want. But later you have to be careful with exception handling since documents might not have the attributes you expect.
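To make the "careful with exception handling" part concrete, here is a minimal sketch. It uses plain Python dicts as stand-ins for the documents a driver such as pymongo hands back; the field names (`sku`, `price`, `colour`) and the `describe` helper are made up for illustration and are not from the setup described above.

```python
# Hypothetical documents; plain dicts stand in for what a driver returns.
docs = [
    {"sku": "A1", "price": 19.99, "colour": "red"},
    {"sku": "B2", "price": 5.00},        # no "colour" attribute
    {"sku": "C3", "colour": "blue"},     # no "price" attribute
]


def describe(doc):
    """Build a summary line without assuming optional attributes exist."""
    price = doc.get("price")                  # None when missing, no KeyError
    colour = doc.get("colour", "unknown")     # explicit default value
    price_text = f"{price:.2f}" if price is not None else "n/a"
    # "sku" is treated as required here, so a missing one still raises.
    return f"{doc['sku']}: {price_text} ({colour})"


summaries = [describe(d) for d in docs]
```

Using `.get()` with a default for optional attributes and plain indexing for required ones keeps the "schema" in one visible place in the application code, which is roughly where it ends up once the database stops enforcing it.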
CERN. We got too excited and started using it for EVERYTHING (it started just being part of the LHC data analyzing project) and it didn't work in some cases, but for some projects it fitted in perfectly.
We have a very large MongoDB installation running in production and at scale, and it works pretty well.
That said, 99% of our production issues involve bugs in MongoDB and its inability to effectively use all available resources before it becomes unresponsive. I would say it needs a few more generations to become truly solid.
We're using it for Webpop (http://www.webpop.com) and have generally been very happy with it.
We were very well aware of its characteristics when choosing our DB, and didn't go in expecting any magic Web Scale or somehow getting a HA setup with plenty of durability with just one server.
For a multitenant CMS where you want to store documents with custom schemas, need more than just a key/value store and want some capability to do ad-hoc queries against custom fields, MongoDB is a pretty good fit.
The bottom line is that MongoDB and MySQL are two different persistent data structures. MySQL is a more powerful data structure that can do more things. MongoDB is less powerful, but is more efficient at certain things. Due to premature optimization or shortsightedness, some folks are enamored with the efficiency of a less powerful data structure (MongoDB) and fail to realize that their application really needs the more powerful relational data structure.
These things should be good learning examples for all.
This also makes it sound like whoever intervened to rewrite said feature in sharded mysql had an easy time. Usually this would not be an obvious port. However we don't know the technical nature of the feature or specifically why it failed.
This is a really good point that doesn't bring up any direct slams against a particular tool; +1 to the author for that.
I've found using Mongo as a stop-gap for consuming JSON APIs extremely useful. You could probably s/Mongo/{nosqldb} there since it's nothing earth shattering.
However, as the only tech guy in our startup I'm always looking harder at Redis than Mongo for most of the problems for which a NoSQL solution might be tempting. I've recently had a lot of success with JSON in Postgres and knowing HStore is always there if I need it has firmly cemented my opinion that I don't need a separate NoSQL solution (yet). (Of course I am merely persisting data in JSON format- not querying on it).
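A rough sketch of the "persist JSON, don't query on it" approach, under some loud assumptions: SQLite from the Python standard library stands in for Postgres so the snippet runs anywhere, and the table name `api_responses` and the sample payload are invented for illustration.

```python
import json
import sqlite3

# In-memory SQLite stands in for Postgres; the payload is stored as opaque text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_responses (id INTEGER PRIMARY KEY, payload TEXT)")

# Hypothetical response from some JSON API.
response = {"user": "alice", "tags": ["a", "b"], "meta": {"page": 1}}

# Serialize on the way in; the database never looks inside the document.
conn.execute(
    "INSERT INTO api_responses (payload) VALUES (?)",
    (json.dumps(response),),
)

# Deserialize on the way out.
stored = conn.execute(
    "SELECT payload FROM api_responses WHERE id = 1"
).fetchone()[0]
restored = json.loads(stored)
```

This only covers storing and retrieving whole documents by key. Querying on fields inside the JSON is a different problem and is not attempted here.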
"Why MongoDB Never Worked Out Two Years Ago When We Tried to Run It For Our First Time For One Feature, And Beside Another Database Which We Really Considered Production."
I've seen and used MongoDB on multiple projects, big and small, and it's fine. It's a database that stores data. Use it for that purpose and you will be ok.
You didn't read the article because the point was that the lesson learned was that if you are going to have two data stores the human tendency is for one to be a second class citizen with regards to support by ops, etc.
This is totally reasonable. MongoDB, more than any other "NoSQL" database, directly competes with MySQL/Postgres as a general-purpose application database. I don't see a need to have more than one for most applications - at least as long as there is only one development/support team for that application.
Perhaps one reason is that more hosting providers support MySQL but not PostgreSQL. Amazon AWS, Google Cloud SQL, and numerous others offer hosted MySQL solutions but not PostgreSQL. I'm unsure how much overall usage such service providers account for though; it would be an interesting stat.
I'm surprised to hear this coming from Etsy, a place I thought of as doing deployment right.
All these things should be simple. You already have (or should have) a unified system for dealing with logging/monitoring/graphing/init scripts/backup across multiple services that are far more different from each other than they are from mongodb (Sharding strategy and slow queries are probably an application-level concern). It shouldn't be hard - in fact it should be trivial - to add one more service. At last.fm (disclaimer: my experience was brief and getting on for two years ago) it felt like we were running every database under the sun, but we had a unified system for doing deployment/monitoring/everything, so it was no bother to add one more if an application wanted it.
Misleading article title's summary: We tried to use a technology that was less mature than another technology. We had to figure some stuff out that had already been figured out on the more mature technology. Using two technologies was more complicated than using one.
Except, as others have said, that is not what this article is saying at all. That said, your comment seems to imply that you just read the headline, and (perhaps understandably) didn't actually read the article.
I've learned, especially on HN, that article titles can be extremely misleading.
I think the real issue here is that most people don't understand /how/ to use MongoDB.
The best use case for MongoDB is as a document store. I can essentially cache numerous MySQL requests into a compiled set of useful information. Especially if the information changes somewhat infrequently, then instead of running MySQL requests for every page load I can pull the information from MongoDB. In most cases when I use MongoDB, it's not as a persistent data store, but as a "compiled" data store.
MongoDB also has some useful set operations.
I for one don't believe that MongoDB is /directly/ competing with MySQL, Postgres, etc. but rather enhances these databases.
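The "compiled" data store idea above can be sketched roughly as follows. Everything here is an assumption for the sake of a runnable example: in-memory SQLite stands in for MySQL, a plain dict keyed by `_id` stands in for a MongoDB collection, and the `users`/`orders` tables and helper names are hypothetical.

```python
import sqlite3

# Relational source of truth (SQLite in memory stands in for MySQL).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'alice');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 17.5);
""")

# Stand-in for a document collection keyed by _id.
compiled_profiles = {}


def compile_profile(user_id):
    """Run the relational queries once and store the combined document."""
    name = db.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()[0]
    rows = db.execute(
        "SELECT id, total FROM orders WHERE user_id = ? ORDER BY id", (user_id,)
    ).fetchall()
    doc = {
        "_id": user_id,
        "name": name,
        "orders": [{"id": oid, "total": total} for oid, total in rows],
        "order_total": sum(total for _, total in rows),
    }
    compiled_profiles[user_id] = doc
    return doc


def load_profile(user_id):
    """Page loads read the compiled document; rebuild only on a miss."""
    return compiled_profiles.get(user_id) or compile_profile(user_id)


profile = load_profile(1)
```

The relational database stays the system of record. The compiled document has to be rebuilt (or dropped) whenever the underlying rows change, which is why this works best for information that changes infrequently.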
Whenever I see articles come up like this one mentioning MongoDB, I wonder not why people decided to go with Mongo, but why they didn't go with some of the alternatives out there? For my part, we use Couchbase to great success and it fixes many of the complaints against MongoDB. Then there's Riak and countless others with well established quality installations. To me MongoDB seems the buzzword NoSQL engine that gets used for 'play' projects, but not much in the way of real-world implementations. Thoughts?
I do not see any of those other NoSQL databases as really being equivalent. MongoDB intends to be a general-purpose application database. It has many of the features developers expect from MySQL/Postgres, such as arbitrary numbers of indexed fields, partial record updates, aggregation queries (simpler than Map/Reduce) and many others. Couchbase may be much closer in feature-set but its developers claim they do not really compete with Mongo.
I do not see Riak or Cassandra as competing at all. In fact I would expect most applications that use Riak or Cassandra are also using a general-purpose database as well (such as MySQL or Mongo). You could use some of those databases as a general purpose database but it would be more work for little benefit. It makes more sense to me to use Riak or Cassandra for use-cases that really need high-throughput and unlimited write-scalability and use an app database for things like user accounts and preference management and all the little things that can take up a lot of development time but will never have really demanding runtime requirements (for 99.99% of internet apps).
So, if someone asked you why they should use "Riak and countless others" and not MongoDB, what would you really say? Also, you seemed to imply that Riak was a go-to solution (my wording) while implying that MongoDB was more of a fringe "buzzword" technology ... when, counting features, I think most would acknowledge MongoDB as being more mainstream.
There are a large number of well-established and quality installations of MongoDB. It works really well at both small and large scale and with a bit of tweaking (like any technology), can perform nicely.
I'm so glad to see this post. I remember having a conversation with someone from Etsy at one point and they made an offhand comment about MongoDB having been a terrible idea with a hint that there was a longer story to it than we had time for. I've been curious about the story ever since.
[+] [-] bonobo|13 years ago|reply
[+] [-] gfodor|13 years ago|reply
[+] [-] thefreeman|13 years ago|reply
[+] [-] lincolnbryant|13 years ago|reply
[+] [-] untog|13 years ago|reply
[+] [-] ry0ohki|13 years ago|reply
[+] [-] matthewrnyc|13 years ago|reply
[+] [-] bonobo|13 years ago|reply
[1] https://www.youtube.com/watch?v=GBauy0o-Wzs
[2] https://www.youtube.com/watch?v=RkPmVQNesZA
[+] [-] technoweenie|13 years ago|reply
MongoDB Counters: http://railstips.org/blog/archives/2011/06/28/counters-every... http://railstips.org/blog/archives/2011/07/31/counters-every...
Kestrel: http://railstips.org/blog/archives/2012/03/05/misleading-tit...
Gets some decent traffic and works well.
http://twitter.com/jnunemaker/status/282141258816827393
[+] [-] dindresto|13 years ago|reply
[+] [-] eLobato|13 years ago|reply
http://lanyrd.com/2012/mongonyc/stqzc/
[+] [-] nasalgoat|13 years ago|reply
[+] [-] patrickod|13 years ago|reply
[+] [-] kennystone|13 years ago|reply
[+] [-] bobfunk|13 years ago|reply
[+] [-] jshen|13 years ago|reply
[+] [-] vph|13 years ago|reply
[+] [-] lincolnbryant|13 years ago|reply
[+] [-] leetrout|13 years ago|reply
[+] [-] darrencauthon|13 years ago|reply
[+] [-] gfodor|13 years ago|reply
[+] [-] jeremyjh|13 years ago|reply
[+] [-] monstrado|13 years ago|reply
[+] [-] francesca|13 years ago|reply
Also foursquare runs a very large MongoDB deployment. http://www.10gen.com/presentations/mongodb-foursquare-cloud-...
Craigslist: http://www.10gen.com/customers/craigslist
Shutterfly also has a very large deployment: http://www.10gen.com/customers/shutterfly
[+] [-] stesch|13 years ago|reply
[+] [-] mminer|13 years ago|reply
[+] [-] lmm|13 years ago|reply
[+] [-] mrinterweb|13 years ago|reply
[+] [-] emperorcezar|13 years ago|reply
[+] [-] aroman|13 years ago|reply
[+] [-] nickaknudson|13 years ago|reply
[+] [-] druiid|13 years ago|reply
[+] [-] jeremyjh|13 years ago|reply
[+] [-] jasonmccay|13 years ago|reply
[+] [-] ddorian43|13 years ago|reply
Range sharding (for SaaS, shard by client_id).
No sorting by value on Couchbase indexes? And many other small features.
On the other hand, what I love about Couchbase: no mongos, all servers equal.
[+] [-] darkxanthos|13 years ago|reply
[+] [-] tobyjsullivan|13 years ago|reply
Finally, some closure!