Don't use MongoDB

[+] harryh|14 years ago|reply

Hi,

I run engineering for foursquare. About a year and a half ago my colleagues and I and made the decision to migrate to MongoDB for our primary data store. Currently we have dozens of MongoDB instances across several different data clusters storing over a TB of data and handling 10s of thousands of requests per second (mostly reads but the write load is reasonably high as well).

Have we run into problems with MongoDB along the way? Yes, of course we have. It is a new technology and problems happen.

Have they been problematic enough to seriously threaten our data? No they have not.

Has Eliot and the rest of his staff @ 10Gen been extremely responsive and helpful whenever we run into problems? Yes, absolutely. Their level of support is amazing.

MongoDB is a complicated beast (as are most datastores). It makes tradeoffs that you need to understand when thinking about using it. It's not necessarily for everyone. But it most certainly can be used by serious companies building serious products. Foursquare is proof of that.

I'm happy to answer any questions about our experience that the HN community might have.

-harryh

[+] samstokes|14 years ago|reply

Would you be able to sum up the things you consider Mongo to be extremely good at? Particularly in comparison to things like Riak (which I believe supports a similar data model), or indeed compared to an RDBMS.

All databases perform poorly if you try to use them for use cases they don't fit, but I find with NoSQL databases it can be hard to find concise, objective statements of which use cases each is ideal for.

[+] fedd|14 years ago|reply

have users of foursquare run into problems? were they serious? did someone lose money? let's ask. it would answer whether to use an eventually consistent db.

[+] nmongo|14 years ago|reply

[deleted]

[+] antirez|14 years ago|reply

I appreciate the "public service" intend of this blog post, however:

1) It is wrong to evaluate a system for bugs now fixed (but you can evaluate a software development process this way, however it is not the same as MongoDB itself, since the latter got fixed).

2) A few of the problems claimed are hard to verify, like subsystems crashing, but users can verify or deny this just looking at the mailing list if MongoDB has a mailing list like the Redis one that is ran by an external company (google) and people outside 10 gen have the ability to moderate messages. (For instance in Redis two guys from Citrusbytes can look/moderate messages, so even if I and Pieter would like to remove a message that is bad advertising we can't in a deterministic way).

3) New systems fails, especially if they are developed in the current NoSQL arena that is of course also full of interests about winning users ASAP (in other words to push new features fast is so important that perhaps sometimes stability will suffer). I can see this myself as even if my group at VMware is very focused on telling me to ship Redis as stable as possible as first rule, sometimes I get pressures about releasing new stuff ASAP from the user base itself.

IMHO it is a good idea if programmers learn to test very well the systems they are going to use with simulations for the intended use case. Never listen to the Hype, nor to detractors.

On the other side all this stories keep me motivated in being conservative in the development of Redis and try avoiding bloats and things I think will ultimately suck in the context of Redis (like VM and diskstore, two projects I abandoned).

[+] moe|14 years ago|reply

1) It is wrong to evaluate a system for bugs now fixed

I disagree. A project's errata is a very good indicator for the overall quality of the code and the team. If a database-systems history is littered with deadlock, data-corruption and data-loss bugs up to the present day then that's telling a story.

2) A few of the problems claimed are hard to verify

The particular bugs mentioned in an anonymous pastie may be hard to verify. However, the number of elaborate horror-stories from independent sources adds up.

3) New systems fails, especially if they are developed in the current NoSQL arena

Bullshit. You, personally, are demonstrating the opposite with redis which is about the same age as MongoDB (~2 years).

[+] gfodor|14 years ago|reply

At the end of the post the author notes his concern isn't with the technical bugs per se, but with the deep rooted cultural problems and misplaced priorities the existence of those problems reveal.

[+] rdtsc|14 years ago|reply

> IMHO it is a good idea if programmers learn to test very well the systems they are going to use ...

Great point. It would also help if the company that makes a DB would put flashing banner on their page to explain the trade-offs in their product. Such as "we don't have single server durability built in as a default".

I understand if they are selling dietary supplements and are touting how users will acquire magic properties for trying the product for 5 easy payments of $29.99. In other words I expect shady bogus claims there. But these people are marketing software, not to end users, but to other developers. A little honesty, won't hurt. It is not bad that they had durability turned off. It is just a choice, and it is fine. What is not fine is not making that clear on the front page.

[+] christkv|14 years ago|reply

It's good to see a voice of reason. I think we all win if NoSQL is allowed to survive. Having multiple paths to modeling and designing our applications is an enrichment of our ability to create interesting and valuable applications in our industry. The last 10 years have been about living under the modeling constraints of RDBMS's and the industry is slowly waking up to the realization that it does not need to be like this. Now we got choices. Graph db's, column db's, document db's etc.

I would like to thank you for the great job you have and are doing on Redis. It's an awesome piece of technology and warms my heart as an European :). Are you based in Palermo ?

[+] nmongo|14 years ago|reply

[deleted]

[+] nmongo|14 years ago|reply

[deleted]

[+] foobarbazetc|14 years ago|reply

No shit, nmongo.

Anyone with half a brain can go look at the MongoDB codebase and deduce that it's amateur hour.

It's start up quality code but it's supposed to keep your data safe. That's pretty much the issue here -- "cultural problems" is just another way of saying the same thing.

Compare the code base of something like PostgreSQL to Mongo, and you'll see how a real database should be coded. Even MySQL looks like it's written by the world's best programmers compared to Mongo.

I'm not trying to hate on Mongo or their programmers here, but you've basically paid the price for falling for HN hype.

Most RDBMSes have been around for 10+ years, so it's going to take a long, long time for Mongo to catch up in quality. But it won't, because once you start removing the write lock and all the other easy wins, you're going to hit the same problems that people solved 30 years ago, and your request rates are going to fall to memory/spindle speed.

Nothing's free.

[+] LeafStorm|14 years ago|reply

Just for comparison, CouchDB has had one major bug that could cause the loss of data, detailed here: http://couchdb.apache.org/notice/1.0.1.html

The bug was only triggered when the delayed_commits option was on (holds off on fsyncing when lots of write operations are coming in) and there was both a write conflict and a period of inactivity - when the database was shut down, any writes that happened afterwards would not be saved.

They immediately worked to develop a process that would prevent any data from being lost if you didn't shut down the server, then a week later had released an emergency bugfix version without the bug. Then later they released a tool that could recover any data lost from the bug if the database hadn't been compacted.

That's the kind of attitude database developers need to have towards data integrity.

[+] itaborai83|14 years ago|reply

One of the things that I love about Couch is that the standard way to shutdown the process is simply doing a kill -9 on the server process. No data loss. No Worries. Want to back up your data? rsync it and be done with it.

Couch may have its warts, but it is damn reliable.

[+] nmongo|14 years ago|reply

[deleted]

[+] latch|14 years ago|reply

There's a lot of anonymity going on here. A new HN account, an unknown company and product, and claims with no evidence.

Why are't links to 10gen's Jira provided? Where's the test code that shows the problems they had with the write lock?

This is an extremely shallow analysis.

[+] nomoremongo|14 years ago|reply

Pastebin author here.

Refutations are going to fall into two categories, it seems:

1. Questioning my honesty

2. Questioning my competence

Re #1, I'm not sure what you imagine my incentive to lie might be. I honestly just intended this to benefit the community, nothing more. I'm genuinely troubled that it might cause some problems for 10gen, b/c, again, Eliot & co are nice people.

Re #2, all I can do is attempt to reassure you we're generally smart and capable fellows. For example, these same systems exhibit none of these problems, and we're sleeping quite well through the night, on the new database system they've moved to. I'll omit the name of the database system just so there is no conflict that might undermine my integrity and motives (see #1).

edit:

(also, there are a few comments about "someone unknown/new around here"... trust me, I'm not new or unknown. I'm a regular.)

[+] jonpaul|14 years ago|reply

I've used MongoDB in production since the 1.4 days. It should be noted that my apps are NOT write heavy. But, many of the author's points can be refuted by using version 2.0.

Regarding the point of using getLastError(), the author is completely correct. But the problem is not so much that MongoDB isn't good, it's that developers start using it and expect it to behave like a relational DB. Start thinking in an asynchronous programming paradigm, and you'll have less problems.

I got bit my MongoDB early on. When my server crashed, I learned real quickly what fsync, journaling, and friends can do. The best thing a dev can do before using MongoDB is to RTFM and understand its implications.

The #1 reason that I used MongoDB, was because of the schema-less models. That's it. Early on in an applications life-cycle, the data model changes so frequently that I find migrations painful and unnecessary.

My two cents, hopefully it helps.

[+] CarlHoerberg|14 years ago|reply

Schema-less is imho a overrated feature. ORMs like DataMapper (Ruby) and NHibernate (.NET) can generate the schema on the fly for RMDBS, so no need for migrations pre-production. But when your application is in production you need migrations even with a "schema-less" db! See, rename a field and "all your data" is lost, unless you migrate the data from the old field to the new one..

[+] kfool|14 years ago|reply

> "because of the schema-less models."

> "I find migrations painful and unnecessary."

A schema-less model neither makes a migration less painful nor eliminates it.

In MongoDB, what did you do when the data model changed?

[+] nmongo|14 years ago|reply

[deleted]

[+] electic|14 years ago|reply

We extensively tested this inside Viralheat with a write heavy load of over 30,000 writes per second and basically it failed our test. It is not robust for the analytics world is the conclusion we came to. Though, I hope it gets better one day...it has potential.

[+] jacques_chester|14 years ago|reply

Have you talked to Greenplum? They have a postgres derivative that can cope with Yahoo's clickstream data.

(I am not affiliated with either of them).

[+] latch|14 years ago|reply

What was the problem? Performance? I'm not sure what robust means to other people, but to me it implies crashes and data-issues.

[+] jdagostino|14 years ago|reply

what did you end up going with?

[+] nmongo|14 years ago|reply

[deleted]

[+] nikcub|14 years ago|reply

Links about Foursquare's problems with MongoDB. The site was down for a while when their 1.6 instance crashed:

* http://blog.foursquare.com/2010/10/05/so-that-was-a-bummer/

* http://www.infoq.com/news/2010/10/4square_mongodb_outage

* http://groups.google.com/group/mongodb-user/browse_thread/th...

I like MongoDB, it is easy to setup, work with and to understand. I think it has an opportunity to become the mysql of nosql (in more ways than one)

Foursquare and 10gen (the makers of MongoDB) share USV as an investor.

[+] ehwizard|14 years ago|reply

From CTO of 10gen

First, I tried to find any client of ours with a track record like this and have been unsuccessful. I personally have looked at every single customer case that’s every come in (there are about 1600 of them) and cannot match this story to any of them. I am confused as to the origin here, so answers cannot be complete in some cases.

Some comments below, but the most important thing I wanted to say is if you have an issue with MongoDB please reach out so that we can help. https://groups.google.com/group/mongodb-user is the support forum, or try the IRC channel.

> 1. MongoDB issues writes in unsafe ways by default in order to win benchmarks

The reason for this has absolutely nothing to do with benchmarks, and everything to do with the original API design and what we were trying to do with it. To be fair, the uses of MongoDB have shifted a great deal since then, so perhaps the defaults could change.

The philosophy is to give the driver and the user fine grained control over acknowledgement of write completions. Not all writes are created equal, and it makes sense to be able to check on writes in different ways. For example with replica sets, you can do things like “don’t acknowledge this write until its on nodes in at least 2 data centers.”

> 2. MongoDB can lose data in many startling ways

> 1. They just disappeared sometimes. Cause unknown.

There has never been a case of a record disappearing that we either have not been able to trace to a bug that was fixed immediately, or other environmental issues. If you can link to a case number, we can at least try to understand or explain what happened. Clearly a case like this would be incredibly serious, and if this did happen to you I hope you told us and if you did, we were able to understand and fix immediately.

> 2. Recovery on corrupt database was not successful, pre transaction log.

This is expected, repairing was generally meant for single servers, which itself is not recommended without journaling. If a secondary crashes without journaling, you should resync it from the primary. As an FYI, journaling is the default and almost always used in v2.0.

> 3. Replication between master and slave had gaps in the oplogs, causing slaves to be missing records the master had. Yes, there is no checksum, and yes, the replication status had the slaves current

Do you have the case number? I do not see a case where this happened, but if true would obviously be a critical bug.

> 4. Replication just stops sometimes, without error. Monitor > your replication status!

If you mean that an error condition can occur without issuing errors to a client, then yes, this is possible. If you want verification that replication is working at write time, you can do it with w=2 getLastError parameter.

> 3. MongoDB requires a global write lock to issue any write

> Under a write-heavy load, this will kill you. If you run a blog, you maybe don't care b/c your R:W ratio is so high.

The read/write lock is definitely an issue, but a lot of progress made and more to come. 2.0 introduced better yielding, reducing the scenarios where locks are held through slow IO operations. 2.2 will continue the yielding improvements and introduce finer grained concurrency.

> 4. MongoDB's sharding doesn't work that well under load

> Adding a shard under heavy load is a nightmare. Mongo either moves chunks between shards so quickly it DOSes the production traffic, or refuses to more chunks altogether.

Once a system is at or exceeding its capacity, moving data off is of course going to be hard. I talk about this in every single presentation I’ve ever given about sharding[0]: do no wait too long to add capacity. If you try to add capacity to a system at 100% utilization, it is not going to work.

> 5. mongos is unreliable

> The mongod/config server/mongos architecture is actually pretty reasonable and clever. Unfortunately, mongos is complete garbage. Under load, it crashed anywhere from every few hours to every few days. Restart supervision didn't always help b/c sometimes it would throw some assertion that would bail out a critical thread, but the process would stay running. Double fail.

I know of no such critical thread, can you send more details?

> 6. MongoDB actually once deleted the entire dataset

> MongoDB, 1.6, in replica set configuration, would sometimes determine the wrong node (often an empty node) was the freshest copy of the data available. It would then DELETE ALL THE DATA ON THE REPLICA (which may have been the 700GB of good data)

> They fixed this in 1.8, thank god.

Cannot find any relevant client issue, case nor commit. Can you please send something that we can look at?

> 7. Things were shipped that should have never been shipped

> Things with known, embarrassing bugs that could cause data problems were in "stable" releases--and often we weren't told about these issues until after they bit us, and then only b/c we had a super duper crazy platinum support contract with 10gen.

There is no crazy platinum contract and every issue we every find is put into the public jira. Every fix we make is public. Fixes have cases which are public. Without specifics, this is incredibly hard to discuss. When we do fix bugs we will try to get to users as fast as possible.

> 8. Replication was lackluster on busy servers

This simply sounds like a case of an overloaded server. I mentioned before, but if you want guaranteed replication, use w=2 form of getLastError.

> But, the real problem:

> 1. Don't lose data, be very deterministic with data

> 2. Employ practices to stay available

> 3. Multi-node scalability

> 4. Minimize latency at 99% and 95%

> 5. Raw req/s per resource

> 10gen's order seems to be, #5, then everything else in some order. #1 ain't in the top 3.

This is simply not true. Look at commits, look at what fixes we have made when. We have never shipped a release with a secret bug or anything remotely close to that and then secretly told certain clients. To be honest, if we were focused on raw req/s we would fix some of the code paths that waste a ton of cpu cycles. If we really cared about benchmark performance over anything else we would have dealt with the locking issues earlier so multi-threaded benchmarks would be better. (Even the most naive user benchmarks are usually multi-threaded.)

MongoDB is still a new product, there are definitely rough edges, and a seemingly infinite list of things to do.[1]

If you want to come talk to the MongoDB team, both our offices hold open office hours[2] where you can come and talk to the actual development teams. We try to be incredibly open, so please come and get to know us.

-Eliot

[0] http://www.10gen.com/presentations#speaker__eliot_horowitz [1] http://jira.mongodb.org/ [2] http://www.10gen.com/office-hours

[+] rit|14 years ago|reply

One addendum to Eliot's "both our offices hold open office hours"; we (10gen) also recently opened an office in London.

Although we don't yet have a fixed office hours schedule, we typically hold them every 2 weeks. The exact dates are announced via the local MongoDB Meetup Group°; we always hold the hours at "Look Mum No Hands" on Old Street.

At least one (and often several) of our Engineers make themselves available during this time to answer any questions and assist with MongoDB problems.

° http://www.meetup.com/London-MongoDB-User-Group

[+] lubujackson|14 years ago|reply

Great response. I'll take this over an anonymous, half-informed screed any day.

[+] wedgemartin|14 years ago|reply

We've been using Mongo for almost a year now, and we've not seen any of the major issues such as data loss referred to. We've seen some of the growing pains of a quickly moving, dynamic platform, but nothing outside of the realm of what is reasonable for such a powerful solution. It's true that implementing sharding is no simple task, but with enough planning up front, you'll find yourself able to scale horizontally very quickly. After a couple of weeks of planning, we wound up making a few small changes in our codebase to migrate from master/slave to a sharded environment. Not a huge undertaking by any stretch, provided the current flexibility of our platform. Also, due to the fact that 10gen does make all bug information publicly available, we've managed to get it done with zero surprises.

Wedge Martin CTO Badgeville

[+] tzury|14 years ago|reply

Eliot, thanks for coming online and publishing your perspective.

MongoDB simply gets better in any version and it is indeed a reliable platform, at least as human beings (employees) are.

[+] unknown|14 years ago|reply

[deleted]

[+] skrebbel|14 years ago|reply

> If you want to come talk to the MongoDB team, both our offices hold open office hours[2] where you can come and talk to the actual development teams. We try to be incredibly open, so please come and get to know us.

I envy how all your (potential) customers are from California.

[+] newman314|14 years ago|reply

Given the response, what are some best practices/gotchas for MongoDB then?

It might be helpful for 10gen to put together a short doc on what to watch out for evaluators.

[+] k_bx|14 years ago|reply

I was even gonna write a big blog post and say something similar to what you just said, but (of course) you said it better. Thank you.

[+] ttcbj|14 years ago|reply

[deleted]

[+] nmongo|14 years ago|reply

[deleted]

[+] chx|14 years ago|reply

This rant is completely outdated and it shows: "pre transaction log" "fixed this in 1.8". You realize MongoDB is at 2.0 now and the transaction log was introduced in 1.8, right? Yes, MongoDB had problems but since the transaction log it's pretty good. I have used MongoDB since early 1.3 and I knew what I was doing and we never lost a bit of data. There is a tradeoff -- while MongoDB handled write load easily that a MySQL box with 2-3 times the RAM , I/O capability couldn't at all we understood the bleeding edge of using MongoDB back then. We have, for example, kept a snapshot slave which shot itself down often, took an LVM snapshot then continued replicating. Never needed those.

We have meticulously kept a QA server pair around and the only time when I have ran into a data loss problem was when I have hosed one of those -- but only one and even the QA department could continue (and hosing that server was me not knowing that Redhat 5 had separate e4fsprogs and e2fsprogs, only partially MongoDB fault but now it works without O_DIRECT so even this would not be a problem any more) . Never understood for example how could foursquare get where they got to -- didnt they have a QA copy similarly?

[+] openmosix|14 years ago|reply

Well, I worked in Vodafone (and Nokia) in very large (laaarge) projects, serving ~50 milions users. Years ago, no hope for NoSQL, we used MySQL. We hit at least 10/20 bugs, solved by 'hotpatch' from Sun. So? I think as developers we should get used to bugs and patches. Should I write a post "don't use MySQL?". We also hit several bugs in the generational garbage collector. Stop using Java? I don't feel the drama here.

[+] woodhull|14 years ago|reply

I couldn't agree more with this analysis, with the added addition that the single threaded nature of the JS interpreter can also cause really bad & unexpected performance things to happen.

Most of the people who are excited about mongo, have never used it in a high volume environment, or with a large dataset. We used it for a medium sized app at my last employer, with paid support from 10gen, and everyone on the project walked away wishing we had stayed with a more mature data store.

Of course things work well when traffic is low, everything fits in memory, and there are no shards.

[+] davyjones|14 years ago|reply

I would love to see a thorough approach in which such claims are actually shown and can be reproduced. This helps everyone immensely...from 10gen to people looking to adopt.

[+] bbulkow|14 years ago|reply

Disclosure: I wrote a product called Citrusleaf, which also plays in the NoSQL space.

My focus in starting Citruseaf wasn't features, it was operational dependability. I had worked at companies who had to take their system offline when they had the greatest exposure - like getting massive load from the Yahoo front page (back in the day). Citrusleaf focuses on monitoring, integration with monitoring software, operations. We call ourselves a real-time database because we've focused on predictable performance (and very high performance).

We don't have as many features as mongo. You can't do a javascript/json long running batch job. We'll get around to features - right now we're focused on uptime and operational efficiency. Our customers are in digital advertising, where they have 50,000 transactions per second on terabyte datasets (see us at ad:tech in NYC this coming week).

Here's a performance analysis we did: http://bit.ly/rRlq9V

This theory that "mongo is designed to run on in-memory data sets" is, frankly, terrible --- simply because mongo doesn't give you the control to keep you in memory. You don't know when you're going to spill out of memory. There's no way to "timeout" a page cache IO. There's no asynchronous interface for page IO. For all of these reasons - and our internal testing showing page IO is 5x slower than aio; the reason all professional databases use aio and raw devices - we coded Citrusleaf using normal multithreaded io strategies.

With Citrusleaf, we do it differently, and that difference is huge. We keep our indexes in memory. Our indexes are the most efficient anywhere - more objects, fea. You configure Citrusleaf with the amount of memory you want to use, and apply policies when you start flowing out of memory. Like not taking writes. Like expiring the least-recently-used data.

That's an example of our focus on operations. If your application use pattern changes, you can't have your database go down, or go so slowly as to be nearly unusable.

Again, take my comments with a grain of salt, but with Citrusleaf you'll have great uptime, fewer servers, a far less complex installation. Sure, it's not free, but talk to us and we'll find a way to make it work for your project.

[+] donpark|14 years ago|reply

Burden of proof is on 10gen, not frustrated customers. This post is believable enough for me to avoid using MongoDB for write-heavy apps.

[+] davidw|14 years ago|reply

People seem to be jumping on a lot of the NoSQL stuff for no good reason. You can get a lot of mileage out of something like Postgres or Mysql, and they work pretty well for a lot of things. Ok, if you get huge, you might have to figure out something else, but that's a good problem to have. On the other hand, if you've lost all your data, you're not going to get huge.

I had to use MongoDB recently, and I wasn't very pleased with it. It wasn't really appropriate for the project, which had data that would have fit better in a relational DB.

[+] christkv|14 years ago|reply

A story from a newly created account by a person nobody can verify is real and asking other people to submit his rant (to gain what? credibility to his story?)

nomoremongo 4 hours ago | link I'd appreciate if someone would submit this story for me. http://pastebin.com/raw.php?i=FD3xe6Jt

What's up with the trolling here. Who are you and what company do you work for that has had all those problems you mentioned ?

[+] ericflo|14 years ago|reply

But it does 8,000,000 operations per second! http://www.snailinaturtleneck.com/blog/2010/05/05/with-a-nam...

(Sorry, possibly excessive snark. That said, I think that blog post is a good example of one of this pastebin author's points: at least historically, benchmark numbers have been a big focus for Mongo developers.)

[+] j_baker|14 years ago|reply

According to the link, that's 320k operations per server, which means that it handles 8 million operations per second with 25 servers.

I don't think it's a stretch to say that any database that has 25 servers should be able to handle at least 8 million operations a second.

[+] mtkd|14 years ago|reply

Anyone using Mongo currently has to be aware there are likely to be some teething issues as it is very new technology.

I haven't used it in production (yet), but I would have no fear of using it today. I would run regular consistency monitoring and validation around critical data just like I do with our SQL databases.

I'm willing to take my part of the pain and inconvenience in making technology like this stable.

You could have written this about any adolescent SQL server BITD. All the tools you use today had to go through this process.

For me Mongo is awesome and getting more awesome. Mongo and technology like it is the reason I still get excited about writing new apps.

[+] nmongo|14 years ago|reply

[deleted]

293 comments