Why You Should Never Use MongoDB

[+] gmjoe|12 years ago|reply

> Seven-table joins. Ugh.

What? That's what relationship databases are for. And seven is nothing. Properly indexed, that's probably super-super-fast.

This is the equivalent of a C programmer saying "dereferencing a pointer, ugh". Or a PHP programmer saying "associative arrays, ugh".

I think this attitude comes from a similar place as JavaScript-hate. A lot of people have to write JavaScript, but aren't good at JavaScript, so they don't take time to learn the language, and then when it doesn't do what they expect or fit their preconceived notions, they blame it for being a crappy language, when it's really just their own lack of investment.

Likewise, I'm amazed at people who hate relational databases or joins, because they never bothered to learn SQL and how indexes work and how joins work, discover that their badly-written query is slow and CPU-hogging, and then blame relational databases, when it's really just their own lack of experience.

Joins are good, people. They're the whole point of relational databases. But they're like pointers -- very powerful, but you need to use them properly.

(Their only negative is that they don't scale beyond a single database server, but given database server capabilities these days, you'll be very lucky to ever run into this limitation, for most products.)

[+] bowlofpetunias|12 years ago|reply

People hate joins because at some point they get in the way of scaling, and getting past that is a huge pain.

Or at least, that's where the original join-hate comes from.

In reality of course, most of us don't have that problem, never had and never will, and it's just being parroted as an excuse for not bothering to understand RDMS's.

Relational database design is a highly undervalued skill outside the enterprise IT world. Many of the best programmers I've worked with couldn't design a proper database if their lives depended on it.

[+] schrodinger|12 years ago|reply

7 isn't necessarily nothing. Each join is O(log(n)), so I believe you're stuck with O(log(n)^7) as a worst case, although in practice it will probably not be so bad since one of the joins will probably limit the result set significantly.

The other problem is that with 7 joins, that's 7! permutations of possible orders in which the database can perform the join. That's a lot of combinations, and often you can run into the optimizer picking a poor plan. Sometimes it picks a good plan initially, and then as your data set changes it can choose a different, suboptimal plan. This leads to unpredictable performance.

I think that in practice, you're best off sticking with only a few joins...

[+] _pferreir_|12 years ago|reply

> A lot of people have to write JavaScript, but aren't good at JavaScript, [...] they blame it for being a crappy language, when it's really just their own lack of investment.

I think it's pretty much an accepted fact that JS has its problems. Even Brendan Eich has been quoted as admitting it.

(Note: I am a JS developer myself)

[+] achy|12 years ago|reply

I agree with you to a point. Joins are your friend. But trying to pull out all of the information about a graph of 'objects' using a single query with multiple one-to-many and many-to-many joins is just as foolish in SQL as in Mongo.

[+] chao-|12 years ago|reply

Do you have any resources you would recommend to understand or at least give an overview of indexes?

I learned basic SQL once-upon-a-time and understand the relational algebra side of things, but only truly picked up the finer details and specific engines in piecemeal manner, as needed in various projects.

[+] Keyframe|12 years ago|reply

Star, constellation, snowflake, flat.. Developers (not the author) would benefit from database introductory course even if they are not using databases. I think Stanford did one that was open to everyone.

[+] mkoryak|12 years ago|reply

This article ends up agreeing with you at the end, by the way.

[+] pekk|12 years ago|reply

When program errors pass silently, that is a legitimate problem in the toolchain.

[+] dasil003|12 years ago|reply

There is a good reason that relational databases have long been the default data store for new apps: they are fundamentally a hedge on how you query your data. A fully-normalized database is a space-efficient representation of some data model which can be queried reasonably efficiently from any angle with a bit of careful indexing.

Of course relational databases being a hedge, are not optimal for anything. For any particular data access pattern there's probably an easy way to make it faster using x or y nosql data store. However as the article points out, before you decide to go that route you better be pretty certain that you know exactly how you are going to use your data today and for all time. You also should probably have some pretty serious known scalability requirements otherwise it could be premature optimization.

Neither of these things are true for a startup, so I'd say startups should definitely stay away from Mongo unless they really know what they are doing. Being ignorant of SQL and attracted by the "flexibility" of schema-less data stores powered by javascript is definitely the wrong reason to look at Mongo.

[+] just2n|12 years ago|reply

I once tried to insert a screw using a hammer. I'll be writing my article "Why You Should Never Use A Hammer" shortly.

And here's the crux of the problem with this article, and of so many articles like it:

"When you’re picking a data store"

"a", as in singular. There is no rule in building software that says you have to use 1 tool to do everything.

[+] coldtea|12 years ago|reply

>I once tried to insert a screw using a hammer. I'll be writing my article "Why You Should Never Use A Hammer" shortly.

More like: "I once tried to insert a screw with a fish-shaped, peanut butter and jelly covered, see-through tv set".

[+] sbov|12 years ago|reply

I always find comparisons to tools disingenuous because people take simple tools (a hammer) and compare them to complex software tools that if you misunderstand can ruin your company.

Your database isn't a hammer. It's closer to 19th century industrial machine with hundreds of buttons and levers that will cut your hand off if you use it incorrectly.

[+] dbcfd|12 years ago|reply

I think this is the first post on HN I wish I had a downvote button for, just for the reason you list. There is a reason there are different flavors of databases, and MongoDB most definitely would not be my choice for representing graph like relationships.

It's also scary that it has 217 points because it bashes Mongo.

[+] joedevon|12 years ago|reply

Well said sir. I only skimmed the article, but afaict the author still has not discovered graph stores, an appropriate way to store social graphs.

I remember downloading Disapora back in the day. The idea behind it was great. But the code looked quite awful and insecure.

[+] ajmurmann|12 years ago|reply

It's more like using a very good screw driver instead of a swiss army knife that does an OK job at everything.

Yes there is no rule that one tool has to work for everything, but there is a rule in Agile that you should push off making assumptions about the future as far as possible, because you will never know less than right now

[+] lucasnemeth|12 years ago|reply

I actually liked the article, thought it was interesting. But the title is a complete clickbait. It does not even says that "you should never use mongodb", it points some situations where MongoDB is a good match. I know a title "Think well if mongodb applies to your case" is not attractive, but it is less sensationalist.

[+] prof_hobart|12 years ago|reply

I know very little about MongoDB, or NoSQL in general, but I'm very interested in it. Are there any good sites/articles I should start looking at to see where it would be the right tool?

[+] rlpb|12 years ago|reply

The difference is that many people are trying to insert a screw with this particular hammer today.

[+] m_mueller|12 years ago|reply

I don't know much about MongoDB, but I've been using a lot of CouchDB for my current project. Am I correctly assuming that MongoDB has no equivalent for CouchDB views? Because if it had, all these scenarios shouldn't be a problem.

Here's how relational lookups are efficiently solved in CouchDB:

- You create a view that maps all relational keys contained in a document, keyed by the document's id.

- Say you have a bunch of documents to look up, since you need to display a list of them. You first query the relational view with all the ids at once and you get back a list of relational keys. Then you query the '_all' view with those relational keys at once and you get a collection of all related documents - all pretty quickly, since you never need to scan over anything (couchDB basically enforces this by having almost no features that will require a scan).

- If you have multiple levels of relations (analogous to multiple joins in RDBMs), just extract they keys from above document collection and repeat the first two steps, updating the final collection. You therefore need two view lookups per relational level.

All this can be done in RDBMs with less code, but what I like about Couch is how it forces me to create the correct indexes and therefore be fast.

However, if my assumption about MongoDB is correct, I have to ask why so many people seem to be using it - it would obviously only be suitable in edge cases.

[+] rubiquity|12 years ago|reply

Spot on about CouchDB. I haven't used MongoDB for anything of decent scale but I must say I was shocked to read in the OP that they store huge documents like from the Movie example in MongoDB. In CouchDB you can use Views to sort of recursively find all of the other documents that your current doucment has a document ID for. This takes advantage of CouchDB's excellent indexing. I'm not trying to start a CouchDB vs MongoDB war here but again, I just say I'm surprised at the types of documents OP was storing in MongoDB.

[+] rdtsc|12 years ago|reply

Also CouchDB has better safety. Its append-only files allows you to make hot backups and safely pull the plug on your server if need be without worrying corrupting data.

Plus change feeds and peer to peer replication are first class citizens in the CouchDB. Once you start having large number of clients needing realtime updates, having to periodically poll for data updates can get very expensive.

[+] jimbokun|12 years ago|reply

It looks like the CouchDB vs. MongoDB in the document store world is the equivalent of the Postgresql vs. MySQL debate in the relation world.

[+] dbcfd|12 years ago|reply

db.find({"field":"value"},{"field":1,"someotherfield":2})

Finds all documents with field having value, returning only field and someotherfield. That part is similar to the map portion of a CouchDB/Couchbase view. No reduce portion though.

If field is what the index is built off of, it should be similar performance wise to a view. Just like views have to be created beforehand, so do mongo indices.

The difference is the find of a mongo document will happen much more quickly after insertion than the find of a couch value by view. Views require rebuild in couch which is not instantaneous.

[+] scbrg|12 years ago|reply

I must have read a dozen (conservative estimate) articles now all called "Why you should never use MongoDB ever" - or permutation thereof. Each and every one of them ought to have been called "I knew fuckall about MongoDB and started writing my software as if it was a full ACID compliant RDBMS and it bit me."

There are essentially two points that always come up:

1. Oh my God it's not relational!

Well, you could argue that if you move from a type of software that is specifically called RELATIONAL Database Management System to one that isn't, one of the things you may lose is relation handling. Do your homework and deal with it.

2. Oh my God it doesn't have transactions!

This is, arguably, slightly less obvious, and in combination with #1 can cause issues. There are practices to work around it, but it is hardly to be considered a surprise.

I keep stumbling on these stories - but still these are the two major issues that are raised. I'm starting to get a bit puzzled by the fact that these things are still considered surprises.

In either case, I'm happily using MongoDB. It has its fair share of quirks and limitations, but it also has its advantages. Learn about the advantages and disadvantages, and try to avoid locking too large parts of your code to the storage backend and you'll be fine.

FWIW, I think the real benefit of MongoDB is flexibility w/r to schema and datamodel changes. It fits very, very well with a development process which is based on refactoring and minor redesigns when new requirements are defined. I much prefer that over the "guess everything three years in advance" model, and MongoDB has served us well in that respect.

[+] Justsignedup|12 years ago|reply

SQL is actaully a rediculously elegant language at expressing data, and relationships. NOT a good general purpose language. So I tend to always favor sql for relational data, actually most data.

queues, caches, etc = nosql solution. They tend to have much more features around performance to handle the needs of these problems, but not much in terms of relational data.

If you study relational databases and what they do, you will quickly find the insane amount of work done by the optimizer and the data joiner. That work is not trivial to replicate even on a specific problem, and ridiculously hard to generalize.

And so this article's assertion that mongodb is an excellent caching engine, but a poor data store is very accurate in my eyes.

[+] _sh|12 years ago|reply

No. SQL is actually pretty third-rate at expressing data and relationships. My preferred way of expressing data and relationships is the programming language I am writing in.

The problem with SQL is that it is not an API, it's a DSL. Which usually means source-code-in-source-code, string concatenation/injection attacks, and crappy type translations ('I want to store a double, what column type should I use? FLOAT? NUMERIC(16,8)?'). Even as a DSL it's pretty low-brow: just look at how vastly different the syntax is between insert and update, or 'IS NULL'.

For all those who love SQL, consider having to address your filesystem with it. Directories are tables, foreign-keyed to their parent, files are rows. There's a good reason why this nightmare isn't real: APIs are preferred over DSLs for this use case. And so too for databases, because they are the same abstraction.

Don't get me wrong, I love relational algebra and the Codd model, but SQL just aint it. SQL has survived because of its one and only strength: cross-platform. And like all cross-platform technologies, such as Java bytecode and Javascript, its rightful place is a compilation target for saner, richer, more expressive technologies. This is why I always use an ORM and have vowed to never, ever, write a single line of SQL again.

[+] threeseed|12 years ago|reply

What rubbish. NoSQL is only good for queues and caches ? Who on earth uses a database for this ?

NoSQL works well when you are modelling your data in ways that fit their particular use cases. Cassandra is great with CQRS, Riak is for key/value, MongoDB document.

[+] JangoSteve|12 years ago|reply

As others have pointed out, this article can basically be summarized as, "don't use MongoDB for data that is largely relational in nature."

Mongo (or most document stores) are good for data that is naturally nested and silo'd. An example would be an app where each user account is storing and accessing only their own data. E.g. something like a todo list, or a note-taking app, would be examples where Mongo may be beneficial to use.

A distributed social network, I would have assumed, would be the antithesis of the intended use-case for a document store. I would have to imagine a distributed social network's data would be almost entirely relational. This is what relational databases are for.

[+] chaz|12 years ago|reply

Linkbait title aside, it's actually a helpful example for directing a database novice on when to not use a document store. I could have used this post a few times in the past few years.

[+] raverbashing|12 years ago|reply

Really, in some places it hurts

* We stored each show as a document in MongoDB containing all of its nested information, including cast member*

I've seen this in people using MongoDB and the bough the BS that because "it's a document store" there should be no link between documents.

People leave their brain at the door, swallow "best practices" without questioning and when it bites them then suddenly it's the fault of technology.

" or using references and doing joins in your application code (double ugh), when you have links between documents"

1) MongoDB offers MapReduce so you can join things inside the DB. 2) What's the problem to have links between documents? Really? Looks like another case of "best practice BS" to me

[+] lhc-|12 years ago|reply

Links in mongo aren't really links though; its up to the application to handle the "joins", which really means making an extra query for every linked item. It's like SQL joins except without any of the supporting tools or optimizations that exist in RBDMS.

[+] liveoneggs|12 years ago|reply

at mongo training we were told map/reduce did not offer performance and to avoid it for online use. You must use the "aggregation framework".

[+] exclusiv|12 years ago|reply

Do NOT use MongoDB unless you understand it and how your data will be queried. Joins like the author mentions by ID is not a bad thing. If you aren't sure how you are going to query your data, then go with SQL.

With a schemaless store like Mongo, I've found you actually have to think a LOT more about how you will be retrieving your information before you write any code.

SQL can save your ass because it is so flexible. You can have a shitty schema and make it work in the short term until you fix the problem.

I wrote many interactive social apps (fantasy game apps) on Facebook and it worked incredibly well and this was before MongoDB added a lot of things like the aggregation framework.

The speed of development with MongoDB is remarkable. The replica sets are awesome and admin is cake.

It sounds like the author chose it without thinking about their data and querying upfront. I can understand the frustration but it wasn't MongoDB's fault.

This is a big deal for MongoDB: https://jira.mongodb.org/browse/SERVER-142.

Let's say you have comments embedded on a document and you want to query a collection for matches based on a filter. If you do that, you'll get all of the embedded comments back for each match and then have to filter on the client. IMO, when the feature above is added, MongoDB will become more usable for more use cases that web developers see.

[+] kcorbitt|12 years ago|reply

I've seen a fair number of articles over the last couple of years comparing the strengths and weaknesses relational/document-store/graph databases. What I've never seen adequately addressed is why that tradeoff even has to exist. Is there some fundamental axiom like the CAP theorem explaining why a database like MongoDB couldn't implement foreign keys and indexing, or why an SQL couldn't implement document storage to go along with its relational goodness?

In fact, as far as I can tell (never having used it), Postgres's Hstore appears to offer the same advantages as a document store, without sacrificing the ability to add relations when necessary. Where's the downside?

[+] ddebernardy|12 years ago|reply

> why an SQL couldn't implement document storage to go along with its relational goodness? (…) Postgres's Hstore appears to offer the same advantages as a document store, without sacrificing the ability to add relations when necessary. Where's the downside?

PostgreSQL can store arbitrary unstructured documents just fine: hstore, json, … Each come with the possibility to actually index arbitrary fields within the documents using a BTREE index on an expression, and arbitrary documents wholesale using GIST index.

Besides the need to know a thing or two on query optimization, the only downside I can think of is that ORMs are usually broken (Ruby's Sequel is a notable exception). But this isn't a problem with Postgres itself; it's a problem with ORMs (and training, admittedly).

[+] mhluongo|12 years ago|reply

Typically as your data model complexity ("relatedness") increases, it's more difficult to scale. I'm not sure about anything like CAP, but I do know that in graph-database land we have to remind ourselves that general graph partitioning is NP-Hard, and that our solutions will need to be domain-specific.

[+] boomzilla|12 years ago|reply

But it's not web scaled :)

http://www.youtube.com/watch?v=URJeuxI7kHo

[+] hkarthik|12 years ago|reply

>> Some folks say graph databases are more natural, but I’m not going to cover those here, since graph databases are too niche to be put into production.

Is this really true? It sounds like both relational DBs and document DBs are a poor choice for the social network problem posed. I've actually dealt with this exact problem at my last job when we started on Mongo, went to Postgres, and ultimately realized we traded one set of problems for another.

I'd love to see a response blog post from a Graph DB expert that can break down the problem so that laymen like myself can understand the implementation.

[+] edude03|12 years ago|reply

I hate link bait like this.

The real title should be "Why you should never use a tool without investigating it's intended use case"

[+] Robin_Message|12 years ago|reply

But the point is that there is no use case. Relational databases and normalisation didn't arise because a load of neckbeards wanted bad performance and extra complexity.

The point of the article is that the world is relational, and because Mongo isn't, it'll bite you in the ass eventual. Sure, that's a specialisation of what you said, but still a useful one, as it allows you to immediately know you shouldn't use Mongo (unless your data is all truly non-relational, and you know you'll never integrate it with any relational data, which, without a crystal ball, you can't know, so don't use it.)

[+] Axsuul|12 years ago|reply

Can anyone explain what are some actual real-life good uses for MongoDB?

[+] colinbartlett|12 years ago|reply

This is ridiculous linkbait bullshit.

Anyone who dismisses document stores entirely has lost all my respect. It wasn't the right solution for your problem, but it might be the right solution for many others.

[+] y0ghur7_xxx|12 years ago|reply

> but it might be the right solution for many others.

The author made the example of the movie database and explained why it was a good idea when they started, and why it didn't work out. Can you point out an example of data you would store in a document database, which is not purely for caching purposes?

[+] neoveller|12 years ago|reply

Agreed. Also, this looks way more like a case where the author mis-structured his data for his intended use case, and is blaming the tool instead of the skill level used to implement it. Nesting deeper than one level in a document is rarely going to result in sufficient query capability with respect to the nested items. Even MySQL can't nest rows inside other rows, which is what he seems to have wanted. Maybe he chose MongoDB because he wasn't ready to think around the architecting issues that an SQL-based database would require, which happen to be, although not immediately obvious, similar to those in Mongo.

[+] ryanobjc|12 years ago|reply

They say never jump in an argument late... but here goes...

There is a lot of people arguing in the positive for polyglot persistence. The arguments sound pretty appealing - hammer, nail, etc - on the face of it.

But as you dig deeper into reality, you start to realize that polyglot persistence isn't always practically a great idea.

The primary issue to me boils down to safety. This is your data, often the only thing of value in a company (outside the staff/team). Losing it is cannot happen. Taking it from there, we stumble across the deeper issue that all databases are extremely difficult to run. DBAs dont get paid $250k/year for nothing. These systems are complex, have deep and far reaching impacts, and often take years to master.

Given that perspective, I think it then makes the decision to use a single database technology for all primary storage needs totally practical and in fact the only rational choice possible.

[+] electrichead|12 years ago|reply

I'm going to reiterate as others have done - this is an area where a good graph database would blow all the others out of the water. I am currently using neo4j for a web app and find it to be extremely good in terms of performance. There is really only one downside to using a graph database - they are not really scalable horizontally as you might want. They need a fair bit of resources. But in terms of querying, they would be unparalleled in this particular use-case.

They are also not in infancy - they are in use in many places where you wouldn't expect them and which aren't discussed. One big area is network management - at least one major telecom uses a particular graph db to manage nodes in real-time.

[+] 3327|12 years ago|reply

This is a well known and well document "down side" of mongo. Frankly the analysis of your article is jeopardized by your first line stating, "I am not a database designer". Mongo has its downsides that are well known, but there are also very good reasons to use Mongodb too. although its a lengthy article with good examples it states nothing more than an obvious caveat mongo has which is well known and documented.

[+] danso|12 years ago|reply

My advice to the OP: Re-jigger this article and retitle it: "The Data-Design of Social Networks". That would be a worthwhile read and I appreciate the detail that the OP goes into.

One of the subheads should be: "Why we picked the wrong data store and how we recovered from it"

And not to be snarky about it, but an alternative title is: Why Diaspora failed: because a Ruby on Rails programmer read an Etsy blog and thought they understood databases

[+] meritt|12 years ago|reply

OP should have been using a graph database. Ranting about MongoDB because it doesn't support what it's not designed to support is a bit silly. A RDBMS would have been just as poor of a choice here.

337 comments