A Year with MongoDB

[+] rdtsc|14 years ago|reply

> Safe off by default

I think that is fixed now.

But this the single most appalling design decision they could have made while also claiming their product was a "database". (And this has been discussed here before so just do a search if you will).

This wasn't a bug, it was a deliberate design decision. Ok, that that would have been alright if they put a bright red warning on their front page. "We disabled durability by default (even though we call this product a database). If you run this configuration as a single server your data could be silently corrupted. Proceed at your own risk". But, they didn't. I call that being "shady". Yap, you read the correctly, you'd issue a write and there would be no result coming back acknowledging that at least the data made into the OS buffer. I can only guess their motive was to look good in benchmarks that everyone like to run and post results on the web. But there were (still are) real cases of people's data being silently corrupted, and they noticed only much later, when backups, and backups of backups were already made.

[+] jbellis|14 years ago|reply

According to http://www.mongodb.org/display/DOCS/getLastError+Command it is still unsafe by default.

[+] mrkurt|14 years ago|reply

"Safe" is a driver implementation detail, not really a server default change. The driver basically has to give the DB a command, then ask what just happened for "safe" writes. If the driver doesn't bother listening for the result, the database just does whatever it's going to quietly.

That said, I really wish all the drivers issued the getLastError command after writes by default. It's the first thing we tell customers to set in their apps.

[+] siavosh|14 years ago|reply

I would bet that there are many apps out there that don't do any reconciliation, and the data will forever be lost unknown. Sadly only 1% of the customers will notice something weird, 0.001% of those will call support saying something is off, and then 99% of those calls will be ignored as customer incompetence. Scary indeed.

[+] charlieok|14 years ago|reply

I think they just assumed that people would run the database in clusters, not single instances.

If you've done enough research to choose a relatively off-the-beaten-path DBMS such as MongoDB, the assumption is that you've carefully weighed the tradeoffs made by the various alternatives in the space, and learned the best practices for deploying the one you chose to use.

[+] chx|14 years ago|reply

Quote the manual" the option is --journal, and is on by default in version 1.9.2+ on 64-bit platforms"

[+] Philadelphia|14 years ago|reply

The impression I got after hearing some of the 10gen developers speak at a conference is that MongoDB has the same essential problem as PHP. It was written by people without a lot of formal knowledge who, for whatever reason, aren't interested in researching what's been tried before, what works, and what doesn't. Because of that, they're always trying to reinvent the wheel, and make flawed design decisions that keep causing problems.

[+] sirn|14 years ago|reply

> We changed the structure of our heaviest used models a couple times in the past year, and instead of going back and updating millions of old documents, we simply added a “version” field to the document and the application handled the logic of reading both the old and new version. This flexibility was useful for both application developers and operations engineers.

Ugh, this sounds like a maintenance nightmare. How do you deal with adding extra field to the document? Do you ever feel the need of running on-the-fly migration of old versions? (But when you do, shouldn't running a migration for all documents a better idea?)

I'll admit I'm a non-believer, but every time I see "Schemaless" in MongoDB, I think "oh, so you're implementing schema in your application?"

[+] mitchellh|14 years ago|reply

> Ugh, this sounds like a maintenance nightmare. How do you deal with adding extra field to the document? Do you ever feel the need of running on-the-fly migration of old versions? (But when you do, shouldn't running a migration for all documents a better idea?)

Yes, we did on-the-fly migration as we loaded old data in.

Doing full data migration was not really an option because querying from MongoDB on un-indexed data is so slow, and paging in all that data would purge hot data, exacerbating the problem.

> I'll admit I'm a non-believer, but every time I see "Schemaless" in MongoDB, I think "oh, so you're implementing schema in your application?"

That's exactly what happens.

[+] bitops|14 years ago|reply

> I'll admit I'm a non-believer, but every time I see "Schemaless" in MongoDB, I think "oh, so you're implementing schema in your application?"

I think that is arguably one of the selling points of MongoDB. Yes, you do implement schema in your application, but should be doing that knowingly and embracing both the costs and benefits.

The benefit is that you can very quickly change your "schema" since it's just however you're choosing to represent data in memory (through your objects or whatever). It also allows you to have part of your application running on one "version" of the schema while another part of the application catches up.

The tradeoff is that you have to manage all this yourself. MongoDB does not know about your schema, nor does it want to, nor should it. It affords you a lot of power, but then you have to use it responsibly and understand what safety nets are not present (which you may be used to from a traditional RDBMS).

To your point about migrations, there are a few different strategies. You can do a "big bang" migration where you change all documents at once. (I would argue that if you require all documents to be consistent at all times, you should not be using MongoDB). A more "Mongo" approach is to migrate data as you need to; e.g. you pull in an older document and at that time add any missing fields.

So yes, in MongoDB, the schema effectively lives in your application. But that is by design and to use MongoDB effectively that's something you have to embrace.

[+] radicalbyte|14 years ago|reply

> oh, so you're implementing schema in your application?

Isn't that where the schema belongs? Each document represents a conceptual whole. It doesn't contain fields which have to be NULL simply because they weren't in previous versions of the schema.

I've been an rdbms guy (datawarehousing/ETL) for a long time now, I've seen a lot of large databases which have been in production for considerable time. They get messy. Really messy. They become basically unmaintainable. Apples, oranges and pears all squashed into a schema the shape of a banana.

It's a pretty elegant solution, and is the problem XML/XSD were designed to solve.

The cleanest solution that I've seen in production used a relational database as a blob storage for XML-serialized entities. Each table defined a basic interface for the models, but each model was free to use its own general schema. After 10 years it contained a set of very clean individual entities which were conceptually correct.

As opposed to the usage as a serialization format for remoting, which has been largely replaced with JSON.

[+] fizx|14 years ago|reply

There are many serialization formats (e.g. Apache Thrift) with versionable schemas. You can do a poor man's Thrift with json, mongo, etc.

It's a common thing to throw thrift or protobuf values in giant key-value stores (cassandra, replicated memcache, hbase). You don't need to migrate unless you change the key format. If you want do on the fly migrations (mostly to save space, and have the ability to delete ancient code paths), you can do them a row at a time. And yes, we do occasionally write a script to loop through the database and migrate all the documents.

[+] FuzzyDunlop|14 years ago|reply

> I'll admit I'm a non-believer, but every time I see "Schemaless" in MongoDB, I think "oh, so you're implementing schema in your application?"

I saw what may have well been 'schemaless' in an RBDMS recently, and the application code for it was far from pretty. I couldn't migrate at all; the results were far too inconsistent to pull it off reliably (you know something is wrong when boolean values are replaced with 'Y' and 'N', which of course both evaluate to 'true').

That being said, I tried to implement something else with Node.js and MongoDB, and I found it quite manageable. As long as the application implements a schema well, you should still be able to infer it when looking at the database direct.

To that extent, I'd take that over using an RBDMS as a key/value store for serialised data, because that's typically useless without the application that parses it.

[+] luca_garulli|14 years ago|reply

Hey, by reading all the bad things seems that OrientDB would fit better than MongoDB for them:

- Non-counting B-Trees: OrientDB uses MVRB-Tree that has the counter. size() requires 0ns

- Poor Memory Management: OrientDB uses MMAP too but with many settings to optimize it usage

- Uncompressed field names: the same as OrientDB

- Global write lock: this kills your concurrency! OrientDB handles read/write locks at segment level so it's really multi-thread under the hood

- Safe off by default: the same as OrientDB (turn on synch to stay safe or use good HW/multiple servers)

- Offline table compaction: OrientDB compacts at each update/delete so the underlying segments are always well defragmented

- Secondaries do not keep hot data in RAM: totally different because OrientDB is multi-master

Furthermore you have Transactions, SQL and support for Graphs. Maybe they could avoid to use a RDBMS for some tasks using OrientDB for all.

My 0,02.

[+] mtrn|14 years ago|reply

Thanks for writing OrientDB! - I tried it, but I was pressed for time, so I needed something that more or less worked instantly for my requirements - which in the end was elasticsearch.

TL;

I researched MongoDB and OrientDB for a side-project with a bit heavy data structure (10M+ docs, 800+ fields on two to three levels). MongoDB was blazingly fast, but it segfaulted somewhere in the process (also index creation needs extra time and isn't really ad-hoc). OrientDB wasn't as fast and a little harder to do the initial setup but the inserting speed was ok - for a while (500k docs or so) and then it degraded. I also looked at CouchDB, but I somehow missed the ad-hoc query infrastructure.

My current solution, which works nice for the moment is elasticsearch; it's fast - and it's possible to get a prototype from 0 to 10M docs in about 50 minutes - or less, if you load balance the bulk inserts on a cluster - which is so easy to setup it's scary - and then let a full copy of the data settle on each machine in the background.

Disclaimer - since this is a side project, I did only minimal research on each of the technologies (call it 5 minute test) and ES clearly won the first round over both MongoDB and OrientDB.

[+] amalag|14 years ago|reply

I have always been curious about OrientDB, but from what I saw it was very small and not backed by any commercial entity and it's usage was not widespread. Also Luca, you should in fairness write that you are the maintainer.

[+] unknown|14 years ago|reply

[deleted]

[+] tolitius|14 years ago|reply

great post. direct and to the point, although there are many more flaws that I am sure you could have shared.

we tried MongoDB to consume and analyze market feeds, and it failed miserably. I can add a couple of things to your list:

* if there is a pending write due to an fsync lock, all reads are blocked: https://jira.mongodb.org/browse/SERVER-4243

* data loss + 10gen's white lies: https://jira.mongodb.org/browse/SERVER-3367?focusedCommentId...

* _re_ sharding is hard. shard key is should be chosen once and for all => that alone kills the schemaless advantage

* moving chunks between shards [manually or auto] can take hours / days depending on the dataset (but we talking big data, right?)

* aggregate (if any complex: e.g. not SUM, COUNT, MIN, MAX) over several gigs of data takes hours (many minutes at best). Not everything can be incremental..

Those are just several. MongoDB has an excellent marketing => Meghan Gill is great at what she does. But besides that, the tech is not quite there (yet?).

Nice going with Riak + PostgreSQL. I would also give Redis a try for things that you keep in memory => set theory ftw! :)

[+] armon|14 years ago|reply

I work at Kiip, and I can confirm that our "non-durable purely in-memory solution" is Redis.

[+] taligent|14 years ago|reply

MongoDB is successful because of more than just marketing.

It has great tool support, decent documentation, books and is accessible. Plus the whole transition from MySQL concept makes it easy to grab onto.

[+] lalmalang|14 years ago|reply

We had a very similar situation ~300 writes per second on AWS. but I suspect some of this has to do with the fact that most people address scaling by adding a replica set, rather than the much hairier sharding setup (http://www.mongodb.org/display/DOCS/Sharding+Introduction), this seems natural b/c mongodb's 'scalability' is often touted. In reality though, because of the lock, RS dont really address the problem much, and we encountered many of the problems described by the OP.

Not to denigrate the work the 10gen guys are doing -- they are obviously working on a hard problem, and were very helpful, and the mms dashboard was nice to pinpoint issues.

We decided to switch too though in the end, though i still enjoy using mongo for small stuff here and there

[+] jasonmccay|14 years ago|reply

Again ... this is more an indictment on the poor IO performance of Amazon EBS vs. MongoDB as a solution. MongoDB can scale both vertically and horizontally, but as with anything you scale on Amazon infrastructure, you are going to have to really think through strategies for dealing with the unpredictable performance of EBS. There are blog posts galore addressing this fact.

I often think MongoDB has suffered more as a young technology because of the proliferation of the AWS Cloud and the expectations of EBS performance.

[+] disbelief|14 years ago|reply

Mind sharing what you switched to? Another schemaless data store, or a more traditional RDBMS?

[+] jrussbowman|14 years ago|reply

From the beginning I've understood mongodb to be built with it's approach for scaling, performance, redundancy and backup to be horizontal scaling. They recently added journaling for single server durability, but before that replication was how you made sure you data was safe.

It seems to me when I see complaints about mongodb it's because people don't want to horizontally scale it and instead believe vertical scaling should be more available.

Just seems to me people don't like how mongodb is built, but if used as intended I think mongodb performs as advertised. In most cases I don't think it's the tool, rather the one using it.

[+] mitchellh|14 years ago|reply

[Note: I wrote the blog post]

I'm not at all against horizontally scaling. However, I don't believe that horizontally scaling should be necessary doing a mere 200 updates to per second to a data store that isn't even fsyncing writes to disk.

Think of it in terms of cost per ops. Let's just say 200 update ops per second is the point at which you need to shard (not scientific, but let's just use that as a benchmark since that is what we saw at Kiip). MongoDB likes memory, so let's use high-memory AWS instances as a cost benchmark. I think this is fair since MongoDB advertises itself as a DB built for the cloud. The cheapest high-memory instance is around $330/month.

That gives you a cost per op of 6.37e-5 cents per update operation.

Let's compare this to PostgreSQL, which we've had in production for a couple months at Kiip now. Our PostgreSQL master server has peaked at around 1000 updates per second without issue, and also with the bonus that it doesn't block reads for no reason. The cost per op for PostgreSQL is 1.27e-5 cents.

Therefore, if you're swimming in money, then MongoDB seems like a great way to scale. However, we try to be more efficient with our infrastructure expenditures.

EDIT: Updated numbers, math is hard.

[+] LeafStorm|14 years ago|reply

While I agree that you should definitely use your tools in the best way you're capable of, I think for most people there's a baseline expectation that if you save data in a database, that data will be safe (at the very least, recoverable) unless something happens like the server catching on fire.

Nearly every other major database has this as the default -- MySQL, PostgreSQL, CouchDB, and Berkeley DB to name a few. (Redis doesn't, but it's also very upfront about it, and does provide this kind of durability as an option from early on.) So when MongoDB breaks this expectation, and when asked to support it as an option, just says, "That's what more hardware is for," it's a pretty big turnoff.

[+] saurik|14 years ago|reply

The behavior described in the article, though, is just a fundamental misuse of the hardware resources. If I'm hitting write bottlenecks, I need to shard: that's just a fact of reality. However, write throughput and read throughput should be unrelated (as durable writes are bottlenecks by disk and reads of hot data are bottlenecked by CPU/RAM): if I have saturated five machines with writes I should have five machines worth of read throughput, and that is a /lot/ of throughput... with the behavior in this article, you have no read capacity as you are blocked by writes, even if the writes are to unrelated collections of data (which is downright non-sensical behavior). Even with more machines (I guess, twice as many), the latency of your reads is now being effected horribly by having to wait for the write locks... it is just a bad solution.

[+] cheald|14 years ago|reply

I think the article is a fair criticism, and I think your response is likewise fair.

Mongo was built with horizontal scaling in mind, and to that end, it tends to suffer noticeably when you overload a single node. Things like the global write lock and single-threaded map/reduce are definitely problems, and shouldn't be pooh-pooh'd away as "oh, just scale horizontally". Uncompacted key names are a real problem, and a consequence of it is that Mongo tends to take more disk/RAM for the same data (+indexes) than a comparable RDBMS does. Maintenance is clunky - you end up having to repair or compact each slave individually, then step down your master and repair or compact it, then step it back up (which does work decently, but is time consuming!)

None of these are showstoppers, but they are pain points, especially if you're riding the "Mongo is magical scaling sauce" hype train. It takes a lot of work to really get it humming, and once you do, it's pretty damn good at the things it's good at, but there are gotchas and they will bite you if you aren't looking out for them.

[+] mrkurt|14 years ago|reply

Part of the lesson here is that if you're doing MongoDB on EC2, you should have more than enough RAM for your working set. EBS is pretty bad underlying IO for databases, so you should treat your drives more as a relatively cold storage engine.

This is the primary reason we're moving the bulk of our database ops to real hardware with real arrays (and Fusion IO cards for the cool kids). We have a direct connect to Amazon and actual IO performance... it's great.

[+] mitchellh|14 years ago|reply

> Part of the lesson here is that if you're doing MongoDB on EC2, you should have more than enough RAM for your working set.

We had more than enough RAM for our working set. Unfortunately, due to MongoDB's poor memory managed and non-counting B-trees, even our hot data would sometimes be purged out of memory for cold, unused data, causing serious performance degradation.

[+] latch|14 years ago|reply

I'm not sure why people are using EBS with their databases. If you already have replication properly set up, what does it buy you except for performance problems?

Chris Westin, of 10gen, blogged about this a while ago: https://www.bookofbrilliantthings.com/blog/what-is-amazon-eb...

In fairness though, 10gen's official stance is to use EBS. I think that's a mistake, and I think maybe they do it for extra safety.

[+] axisK|14 years ago|reply

Nice article, the write lock has really been making me think about whether it's really the way to go in our own stack.

[+] nomoremongo|14 years ago|reply

Yyyyyyyup.

http://pastebin.com/raw.php?i=FD3xe6Jt

[+] aschobel|14 years ago|reply

We love MongoDB at Catch, it's been our primary backing store for all user data for over 20 months now.

  > Catch.com
  > Data Size: 50GB
  > Total Documents 27,000,000
  > Operations per second: 450 (Create, reads, updates, etc.)
  > Lock % average 0%
  > CPU load average 0%

Global Lock isn't ideal, but Mongo is so fast it hasn't been an issue for us. You need to keep on slow queries and design your schema and indexes correctly.

We don't want page faults on indexes, we design them to keep them in memory.

I don't get the safety issue, 20 months and we haven't lost any user data. shrug

[+] bretthoerner|14 years ago|reply

> I don't get the safety issue, 20 months and we haven't lost any user data. shrug

Nobody loses any user data until they do.

[+] strlen|14 years ago|reply

> Data Size: 50GB

I am sorry, to sound blunt, but that's an irrelevant data point. With a data set that fits comfortably into RAM (much less SSDs in RAID!), most any data store will work (including MySQL or Postgres).

> Operations per second: 450

Again, not a relevant data point. With a 10 ms seek time on a SATA disk, this is (again) well within the IOPS capacity of a single commodity machine (with RAID, a SAS drive, row cache, and operating system's elevator scheduling).

[+] guywithabike|14 years ago|reply

450 ops/sec is nothing.

What's your breakdown between the operation types, and what kind of hardware are you on?

[+] gatesvp1|14 years ago|reply

Not a personal attack as Catch is a neat product, but these numbers are basically irrelevant.

This type of load can easily be handled by a simple SQL box. We did these types of #s with a single SQL Server box 4 years ago, except that your "total documents" was our daily write load.

[+] jalons|14 years ago|reply

Am I missing something, or did they say they didn't want to scale mongo horizontally via sharding, then comment that they're doing so with riak, but faulting mongodb for requiring it?

[+] gregbair|14 years ago|reply

If you're going to have a (management|engineering|whatever) blog for your company/project, have a link to your company/project home page prominently somewhere on the blog.

[+] pearkes|14 years ago|reply

Good point. For the record, we're at http://kiip.me

[+] unknown|14 years ago|reply

[deleted]

[+] DonnyV|14 years ago|reply

Please upvote collection level locking for MongoDB here. https://jira.mongodb.org/browse/SERVER-1240

[+] unknown|14 years ago|reply

[deleted]

[+] citricsquid|14 years ago|reply

Not related to the article but the site: has anyone else been getting "connection interrupted" errors with tumblr recently? If I load a tumblr blog for the first time in ~24 hours the first and second page loads will result in connection interrupted, the 3rd and beyond will all load fine.

[+] pixelmonkey|14 years ago|reply

This is a pretty epic troll on MongoDB, and some of their points are important -- particularly global write lock and uncompressed field names, both issues that needlessly afflict large MongoDB clusters and will likely be fixed eventually.

However, it's pretty clear from this post that they were not using MongoDB in the best way. For example, in a small part of their criticism of "safe off by default", they write:

"We lost a sizable amount of data at Kiip for some time before realizing what was happening and using safe saves where they made sense (user accounts, billing, etc.)."

You shouldn't be storing user accounts and billing information in MongoDB. Perhaps MongoDB's marketing made you believe you should store everything in MongoDB, but you should know better.

In addition to that data being highly relational, it also requires the transactional semantics present in mature relational databases. When I read "user accounts, billing" here, I cringed.

Things that it makes total sense to use MongoDB for:

- analytics systems: where server write thorughput, client-side async (unsafe) upserts/inserts, and the atomic $inc operator become very valuable tools.

http://blog.mongodb.org/post/171353301/using-mongodb-for-rea...

- content management systems: where schema-free design, avoidance of joins, its query language, and support for arbitrary metadata become an excellent set of tradeoffs vs. tabular storage in an RDBMS.

http://www.mongodb.org/display/DOCS/How+MongoDB+is+Used+in+M...

- document management systems: I have used MongoDB with great sucess as the canonical store of documents which are then indexed in a full-text search engine like Solr. You can do this kind of storage in an RDBMS, but MongoDB has less administrative overhead, a simpler development workflow, and less impedance mismatch with document-based stores like Solr. Further, with GridFS, you can even use MongoDB as a store for actual files, and leverage MongoDB's replica sets for spreading those files across machines.

Is your data relational? Can you benefit from transactional semantics? Can you benefit from on-the-fly data aggregation (SQL aggregates)? Then use a relational database!

Using multiple data stores is a reality of all large-scale technology companies. Pick the right tool for the right job. At my company, we use MongoDB, Postgres, Redis, and Solr -- and we use them each on the part of our stack where we leverage their strengths and avoid their weaknesses.

This article reads to me like someone who decided to store all of their canonical data for an e-commerce site in Solr, and then complains when they realized that re-indexing their documents takes a long time, index corruption occurs upon Solr/Lucene upgrades, or that referential integrity is not supported. Solr gives you excellent full-text search, and makes a lot of architectural trade-offs to achieve this. Such is the reality of technology tools. What, were you expecting Solr to make your coffee, too?

Likewise, MongoDB made a lot of architectural tradeoffs to achieve the goals it set out in its vision, as described here:

http://www.mongodb.org/display/DOCS/Philosophy

It may be a cool technology, but no, it won't make your coffee, too.

In the end, the author writes, "Over the past 6 months, we've scaled MongoDB by moving data off of it. [...] we looked at our data access patterns and chose the right tool for the job. For key-value data, we switched to Riak, which provides predictable read/write latencies and is completely horizontally scalable. For smaller sets of relational data where we wanted a rich query layer, we moved to PostgreSQL."

Excellent! They ended up in the right place.

[+] jeffdavis|14 years ago|reply

So, they made some mistakes, learned from them, and ended up in the right place.

Sounds like a great story for a blog post, that others might learn from as well.

Calling it a troll -- just because their mistakes involved mongo and their solution did not -- seems harsh.

[+] paulsutter|14 years ago|reply

This article is an excellent articulation of the strengths and (fixable) issues with mongoDB.

I like MongoDB a lot, and the improvements suggested would really strengthen the product and could make me more comfortable to use it in more serious applications.

[+] madworld|14 years ago|reply

How is the global write lock "fixable" without a major rewrite of the codebase?

Like the article suggested, it would be one thing if they did it for transaction support. In reality, from looking at the code, it seems like the global write lock came from not wanting to solve the hard problems other people are solving.

[+] radagaisus|14 years ago|reply

Start Up Bloggers listen up! If I click on your blog's logo I want to see your product, not the blog's main page.

141 comments