item 4532922

Why I Migrated Away From MongoDB

203 points | svs | 13 years ago | svs.io

200 comments

[+] gregjor|13 years ago|reply
You were fortunate to recognize that MongoDB was the wrong tool for your job, and lucky to be able to move to Postgres instead of continuing to throw your time and effort away. I see the ad hominem "you're an ignorant idiot" attacks already started, along with advice like using regexes to do case-insensitive searches. Watching the NoSQL "movement" encounter the problems RDBMSs fixed 20 years ago and then hand-wave and kludge them away is frustrating. I wrote about some of this in http://typicalprogrammer.com/?p=14.

Look at the bright side: programmers who are writing NoSQL-backed apps are creating the fossil fuel that will keep programmers who know RDBMSs working into our retirement years. I already have more work than I can do fixing web apps that were built around crap data management tools that failed to scale beyond a few thousand users. Your Postgres expertise will still be a money-making skill long after MongoDB is forgotten.

[+] gaius|13 years ago|reply
The thing with the NoSQL guys is that many of them seem not to be in a position to make an educated comparison. For example, an, uhh, enthusiastic MongoDB advocate recently informed me that MongoDB was superior to Oracle because in Oracle you had to poll a table to see if it changed. Except, no, that isn't actually true: http://docs.oracle.com/cd/B19306_01/appdev.102/b14251/adfns_... - and that document is from 2005. And you could do a trigger and an AQ message/callback 5 years before that (at least). You haven't needed to poll an Oracle database for changes in a loooong time.

Basically, every evangelism point, you have to double-check and cross-reference, because as you say, the NoSQL guys are encountering issues the RDBMS community addressed years ago (7 in my example, but the sharding stuff, 20+ years) - except they think they are discovering it for the first time!

[+] blaines|13 years ago|reply
So do you believe there is a use case for a document-oriented database? I feel like your comment writes off a huge swath of useful technology.

I said this below, but I'll say it again. Data is malleable, and writing apps to fit around any datastore seems wrong. I write applications to fulfill their use case. When the needs of the application change, so can its database (or other dependencies).

http://gigaom.com/cloud/mongodb-or-mysql-why-not-both/

[+] wamatt|13 years ago|reply
> Watching the NoSQL "movement" encounter the problems RDBMSs fixed 20 years ago and then hand-wave and kludge them away is frustrating.

But but but.... NoSQL is cool ;)

[+] rbranson|13 years ago|reply
I'm no fan of MongoDB, but this same advice goes for any NoSQL data store. I am an Apache Cassandra contributor and community MVP, but my advice stays the same: it's best just to start with a SQL database and go from there. Read some books and learn it well: the "SQL Cookbook" from O'Reilly is great, and so is "The Art of SQL." Premature optimization continues to be the root of all evil.
[+] stickfigure|13 years ago|reply
> it's best just to start with a SQL database and go from there.

This is bad advice. It's best to understand your problem domain and use the tools that are most appropriate. You see a lot of two types of posts on HN:

* "I picked a NoSQL database for a problem domain with a better relational fit." Those posts look like this one.

* "I picked an RDBMS for a problem domain with a better NoSQL fit." Those posts are usually titled something like "How I scaled Postgres to XYZ qps" and describe an insane amount of re-engineering and operational hell. Oddly, these posts are usually proud of the accomplishment rather than embarrassed that they picked the wrong tools in the first place.

There are upsides and downsides to RDBMSes. From my experience, you should be leaning towards NoSQL systems (of which there are many, each suited to different use cases) when you have very large scaling needs (in terms of dataset and qps), heavily polymorphic data, or data that has ambiguous structure.

It's been stated many times already: Use the right tool for the right job.

[+] bunderbunder|13 years ago|reply
Don't forget to read a book that's specific to your particular RDBMS. Because SQL databases are only trivially interchangeable for trivial cases.

There have been more than a couple times when I was ready to blame the relational model, but further investigation revealed that the real root of the problem was that the existing schema or query used an approach that was optimized for one DBMS but performed terribly on the one we were actually using.

[+] egiva|13 years ago|reply
Thanks rbranson, this comment was very helpful. I also think the logical fallacy in some of these arguments is that they're absolutist. It's better to follow your advice, because starting with a SQL database and moving to different things as you grow is a very logical progression. Life is a progression, and I think your quote about premature optimization is spot on: it IS the root of all evil, because it's absolutist and too rigid to fit the curve of everything that life (and tech, and work) throws at you!
[+] jamesli|13 years ago|reply
I agree that a relational database is the safer bet when, after studying the business domain and weighing the pros and cons of relational vs. NoSQL, there is no clear winner.

I disagree with the book recommendations, though. SQL is only a query language, not the database itself; it should certainly be part of the consideration, but understanding how database engines work under the hood matters more for performance in high-concurrency, high-load scenarios.

[+] Ingaz|13 years ago|reply
Funny.

The same problem exists with OLAP databases.

People try to do MOLAP without building ROLAP first. They think MS SSAS, Essbase, Cognos or QlikView will do some "magic" that eliminates the need to think carefully about the data.

[+] bunderbunder|13 years ago|reply
> Fourthly, and this one completely blew my mind - somewhere along the stack of mongodb, mongoid and mongoid-map-reduce, somewhere there, type information was being lost. I thought we were scaling hard when one of our customers suddenly had 1111 documents overnight. Imagine my disappointment when I realised it was actually four 1s, added together. They'd become strings along the way.

I've been having a similar problem with an SQLite data store, only the other way around. Strings were getting converted to numbers, with leading zeros that were significant and needed to be maintained being lost along the way.

It sucked all the fun out of dynamic typing for me. At least in combination with automatic type conversions. Having to think about type and when to make transitions across type boundaries when you need to is just a little light busywork. Having to worry about type and transitions across type boundaries being made contrary to your intentions is a downright PITA and, it turns out, a serious quality control issue.

[+] zemo|13 years ago|reply
Mongo accepts the data you give it. If you have a type-conversion error, it's in your application layer. I use Mongo daily and have never seen this problem, because I'm using a statically typed language. This seems like more of a complaint about Ruby than Mongo.

I use Mongo daily on a Go project, and I actually think it's pretty annoying; I'm not trying to be a Mongo apologist, but ... this type conversion argument doesn't seem to be very fair to Mongo.

[+] parfe|13 years ago|reply
"Any column in an SQLite version 3 database, except an INTEGER PRIMARY KEY column, may be used to store a value of any storage class." http://www.sqlite.org/datatype3.html

SQLite, for better or worse, is designed to do what you had an issue with. Pick a different DB if you need strict data types. Check out section 2.0 Type Affinity.
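The affinity behavior parfe points to is easy to demonstrate with Python's built-in sqlite3 module. A minimal sketch (the table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An INTEGER-affinity column coerces numeric-looking text to a number;
# a TEXT-affinity column stores it verbatim (see "Type Affinity" in the
# SQLite datatype docs).
conn.execute("CREATE TABLE t (as_int INTEGER, as_text TEXT)")
conn.execute("INSERT INTO t VALUES ('007', '007')")
as_int, as_text = conn.execute("SELECT as_int, as_text FROM t").fetchone()
print(as_int, repr(as_text))  # 7 '007' -- the leading zeros survive only in TEXT
```

So the "strings became numbers" surprise is by design: if the leading zeros matter, the column needs TEXT affinity.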

[+] dccoolgai|13 years ago|reply
"To be honest, the decision to use MongoDb was an ill-thought out one. Lesson learned - thoroughly research any new technology you introduce into your stack, know well the strengths and weaknesses thereof and evaluate honestly whether it fits your needs or not - no matter how much hype there is surrounding said technology."

I think you are not alone in learning this lesson with this particular technology. Fortunately it's one I learned by proxy from working adjacent to a team that decided to introduce Mongo into their stack...but I still wake up at night and hear the screams of "You have to put the whole dataset on RAM?"...you weren't there, man...we lost a lot of good guys...

You have to draw a clean line between "stuff it is really fun and enlightening to play with" and "stuff you introduce into your stack".

[+] mgkimsal|13 years ago|reply
One of the issues is that if you actually do this - evaluate, look at all sides, and decide 'shiny new tech' IS NOT right for your situation/project, you're branded as something not good. "Not a team player", "stick in the mud", "not able to keep up with the times", etc.

I'm not suggesting everyone should be sticking with 1966 COBOL - times change, new tech comes up which makes sense to adopt. But not jumping on the shiny new tech bandwagon can have social consequences you need to be aware of.

[+] trafficlight|13 years ago|reply
>> "You have to put the whole dataset on RAM?"

I'm pretty new to the whole database thing, but how is MongoDB different from Postgres or MySQL in this respect? In a traditional database, the data is pulled directly from the hard drive. Why does Mongo suffer a performance hit and MySQL doesn't?

[+] stonemetal|13 years ago|reply
> You have to draw a clean line between "stuff it is really fun and enlightening to play with" and "stuff you introduce into your stack".

Even if you have a clean line, when do you promote something across that line? It is easy for little bits of weirdness to escape detection (mostly thinking about Cassandra write failures here) in a fun-to-play-with environment, especially if they only come out when you are running a cluster.

[+] lttlrck|13 years ago|reply
"You have to put the whole dataset on RAM?"

This is hardly a hidden feature...

[+] jamesli|13 years ago|reply
I am both a database guy and a software engineer. As a software engineer, I kind of understand the hype behind NoSQL. As a database guy who has spent years studying how database engines work under the hood, many NoSQL implementations make me wonder how powerful marketing can be.

In general, I love the ideas behind NoSQL. I can still feel the excitement of reading the BigTable and MapReduce papers. HBase, Hadoop, Redis, etc. are awesome products, and I use some of them in my work. But some other NoSQL products? Being engineers, we must understand the implementation and be fully aware of its limitations, instead of believing their marketing materials. Well, if all you want is to test a toy product, to build a prototype, or your product has low concurrency and a small data size and you have no concerns about operations, it certainly looks like they make your development easier. But in those scenarios, any good relational database won't add significant burden either.

[+] taligent|13 years ago|reply
> Being engineers, we must understand the implementation and be fully aware of its limitations, instead of believing their marketing materials.

And as engineers we must understand that most other engineers do take their role seriously and evaluate products on their merits.

Implying that they are falling for "marketing" just because you don't agree with their choice and then lecturing them for their choice doesn't make you come across well.

[+] jaimebuelta|13 years ago|reply
Mmm, not sure about some of the complaints...

- You can do case-insensitive searches in the DB using regexes (http://www.mongodb.org/display/DOCS/Advanced+Queries#Advance...). A simple case-insensitive regex is tolerable performance-wise on small collections, but it can't use an index, so in general case-insensitive searches should be avoided for search purposes (you can normalize everything to lower case, or use an equivalent trick).

- The proper way of doing an audit (and search later) is to make an independent collection with a reference to the other(s) document in a different collection. Then you can index by user, date, or any other field and leave the main collection alone. The described embedded access collection doesn't look very scalable.

- Making map-reduce queries is tricky (at least for me). I think the guys at 10gen realize that, and the new aggregation framework is a way of dealing with it. Anyway, this kind of thing is the main advantage of SQL: rich query capabilities. Even if MongoDB allows some compared with other NoSQL DBs, if there is a lot of work in defining new queries, a SQL DB is probably the best fit, as that is where SQL excels.

I don't truly believe in "you should research everything before starting" (I mean, I believe in research, but the "you should do your homework" argument is overused; sometimes you make a decision based on data that is incomplete or changes later), as there are a lot of situations where you find problems as you go, not in the first steps. But from the description it looks like PostgreSQL is a better match and the transition hasn't been too painful, so we can classify this as a "bump in the road"/"success story". Also, the DB schema is probably much better known now than it was in the beginning, which simplifies the use of a relational DB.
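The lower-casing trick mentioned above can be sketched in Python. These are hypothetical helpers, not MongoDB's API - a plain list stands in for the collection:

```python
def insert_with_shadow(collection, doc):
    """Store a lowercased shadow field so lookups can be exact-match
    (and indexable) instead of case-insensitive regex scans."""
    doc = dict(doc, name_lower=doc["name"].lower())
    collection.append(doc)  # stand-in for collection.insert_one(doc)

def find_by_name(collection, name):
    # Lowercase the input and query the normalized field directly.
    return [d for d in collection if d["name_lower"] == name.lower()]

docs = []
insert_with_shadow(docs, {"name": "Alice"})
print(find_by_name(docs, "aLiCe")[0]["name"])  # Alice
```

In a real deployment you'd index the shadow field, so the lookup stays an equality match rather than a regex scan.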

[+] datasage|13 years ago|reply
> The proper way of doing an audit (and search later) is to make an independent collection with a reference to the other(s) document in a different collection. Then you can index by user, date, or any other field and leave the main collection alone. The described embedded access collection doesn't look very scalable.

I think this point is very important on the RDBMS side too. Even with relational datastores, there are cases that would perform better if the dataset were built to fit the query.

The difficulty comes into play when you are trying to keep the denormalized data up to date based on changes within the base dataset.
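On the RDBMS side, one conventional way to keep a denormalized rollup in step with the base data is a trigger. A minimal sketch with SQLite; the `orders`/`user_totals` tables are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (user_id INTEGER, amount REAL);
-- Denormalized rollup, built to fit the query.
CREATE TABLE user_totals (user_id INTEGER PRIMARY KEY, total REAL NOT NULL);
-- The trigger keeps the rollup in step with every base-table insert.
CREATE TRIGGER orders_ai AFTER INSERT ON orders BEGIN
    INSERT OR IGNORE INTO user_totals (user_id, total) VALUES (NEW.user_id, 0);
    UPDATE user_totals SET total = total + NEW.amount
        WHERE user_id = NEW.user_id;
END;
""")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (1, 5.0)])
total = conn.execute("SELECT total FROM user_totals WHERE user_id = 1").fetchone()[0]
print(total)  # 15.0
```

The difficulty the parent describes is exactly what the trigger absorbs here; without that kind of hook, the application has to remember to update the rollup on every write path.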

[+] daveman|13 years ago|reply
As an analytics professional who was pressured into a MongoDB environment, I feel the OP's pain. If you want to do gymnastics with your data (aggregations of aggregations, joining result sets back onto data), SQL expressions are a thousand times easier than Mongo constructs (e.g. map-reduces). We usually ended up scraping data out of Mongo and dumping records into a SQL database before doing our transformations.

All that said, our developers loved the ease of simple retrieval and insertion, and of course the scalability. So I guess you ultimately need to base your decisions on your priorities.

I don't fault the OP though, since it's hard to know just how limiting NoSQL will be until you try to do all the things you used to assume were database table stakes (no pun intended).

[+] dkhenry|13 years ago|reply
Completely aside from the article: the level of vitriolic discourse in this topic is astounding. I am amazed that, as a community, discussions of database engines can draw out such mean-spirited anger. I have never downvoted as many comments on HN in a single thread as I have on this topic. I don't care which side of the debate you come down on; there is no excuse for belittling and insulting others in a technical forum. That's right, I am looking at you

    gregjor, gaius, and zemo
In this case it appears to mostly be those arguing for Postgre, but I wouldn't care if you were arguing for sunshine and unicorns; there is a way to behave civilly and you're not doing it.
[+] gregjor|13 years ago|reply
You can find this level of discourse in plenty of topics every day. Programmers draw blood over indenting with tabs or spaces. It's geek entertainment.

I have never been upvoted so much on HN. I admit to strong opinions and light sarcasm but you'll have to show me where I've been uncivil, belittling or insulting, except perhaps in response to people who insulted me.

[+] einhverfr|13 years ago|reply
> In this case it appears to mostly be those arguing for Postgre

Curious. I have never found a database named "Postgre" to be used by anybody. Perhaps you can direct me to the download site.

I think the bigger issue is that there isn't a lot of discussion from the NoSQL crowd about what you give up when you go to a NoSQL solution. I think that sort of disclosure would help people weigh the options a lot better.

From the comments of some people here you'd almost think they would build an ERP app in Mongo....

[+] jaequery|13 years ago|reply
I've run into similar issues to the ones you describe. Things that can be done simply and quickly in SQL were bewilderingly difficult to do in Mongo.

The schema-less database approach also seems attractive at first but updating your data whenever your "app schema" changes starts to become a pain real quick.

Now I can't really live w/o having a schema first; it actually saves you a lot more time in the long run (even the short run). Being schema-less means you can't really do anything too fancy w/ your data (generate reports, advanced search, etc...)

[+] lmm|13 years ago|reply
You still need to update your data when doing schema changes under SQL, and you have a lot less control over the process.

And you can do anything to your data without a schema, you just need to build your app as a service that provides access to it.

IME SQL schemas do more harm than good; usually you end up with a schema that's subtly weaker than what's actually valid for your application, and the difference between the two models will trip you up at the worst possible time. Have a small, distinct set of classes that you store, enforce that you store only those (and don't access the storage layer any other way), enforce that they remain backwards compatible, and enforce that you can't create invalid instances. But application code is the best place to do all these things.
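The enforce-it-in-application-code approach can be sketched with a validating constructor in Python (a hypothetical `User` class, not something from the thread):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    """The only shape of user record the storage layer ever sees;
    invalid instances simply cannot be constructed."""
    name: str
    age: int

    def __post_init__(self):
        # Validation lives in the application, not in a DB schema.
        if not self.name:
            raise ValueError("name must be non-empty")
        if not (0 <= self.age < 150):
            raise ValueError("age out of range")

User("alice", 30)    # fine
# User("", 30)       # would raise ValueError before it ever reaches storage
```

Funneling all storage access through a small set of classes like this is what makes "no schema in the database" workable; skip that discipline and you get the type-drift problems described elsewhere in the thread.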

[+] dkarl|13 years ago|reply
> You have to load every document in the database and extract the audit trail from it, then filter it in your app for the user you're looking for. Just the thought of what that would do to my hardware was enough to turn me off the whole idea.

Naive question from somebody who has done a little reading on and dabbling with key/document-with-MapReduce style datastores, but who hasn't tackled a real production problem: I thought running queries over the entire dataset was one of the assumptions of horizontally scalable document stores? In terms of avoiding computation, you can only limit queries by document key, which even if you're clever/lucky doesn't always encode the parameters you're querying on, or doesn't encode them in the right order, so you should be prepared to run queries over your entire dataset. Hopefully the queries you run often are optimized (e.g., using indexes or clever use of key ranges), but in the general case, you have to be prepared to scan the whole shebang, and that's supposed to be okay because of horizontal scalability, right?

[+] adambard|13 years ago|reply
As a relative idiot when it comes to this sort of thing, I'd like to insert the following supplementary question: what is the sort of application/dataset for which Mongo is particularly suited?

I've used it on small projects, and have enjoyed it. Perhaps my data has just been simple/loosely-coupled enough to never run into these problems?

I read a lot of posts like this on HN before ever trying Mongo, so I've at least been convinced to always implement a schema at the application layer. Others seem to keep learning that lesson in harder ways.

[+] karterk|13 years ago|reply
The biggest lure of Mongo is that it gives you a nice SQL-like query API, so it's fairly easy to get started with compared to other NoSQL alternatives. I primarily use it for small-to-medium size apps - when I know upfront that I will never need to scale beyond a certain number of users in the short-to-medium term.

It's not as bad as it's made out to be. It's only if you're really looking to scale out that you'd probably be better off picking something else.

[+] stingraycharles|13 years ago|reply
"what is the sort of application/dataset for which Mongo is particularly suited"

The majority of the NoSQL databases are based on Amazon's Dynamo: loosely coupled replication. MongoDB is one of the few (next to HBase and a few others) that adopts Google BigTable's architecture: data is divided into ranges, and each mongod node serves multiple ranges.

This means MongoDB is able to provide atomicity where it's harder with other NoSQL databases. In particular, we need to be able to do some sort of "compare and swap" operation that is guaranteed to be atomic/consistent, while still being able to have our mongod nodes distributed over multiple datacenters.

In Dynamo-based architectures, to provide the same degree of atomicity, you always end up writing to at least half your replicated nodes plus one. This is more awkward, and reduces the flexibility of the whole system (the atomicity guarantee Mongo provides also works for stored JavaScript procedures, for example).
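The compare-and-swap operation described here, sketched generically in Python. A plain dict stands in for the document store; in MongoDB itself the primitive would be a single-document update whose filter matches the expected current value:

```python
def compare_and_swap(store, key, expected, new_value):
    """Update key only if its current value is what we expect.
    In Mongo the same effect comes from one update whose filter
    includes the expected value, applied atomically to one document."""
    if store.get(key) == expected:
        store[key] = new_value
        return True
    return False

state = {"lease_owner": "node-a"}
print(compare_and_swap(state, "lease_owner", "node-a", "node-b"))  # True
print(compare_and_swap(state, "lease_owner", "node-a", "node-c"))  # False: already taken
```

In a Dynamo-style store, getting this check-then-set to be atomic is what forces the quorum writes (half the replicas plus one) described above.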

Having said that, we've been using MongoDB in production for about 3 years at this point, and we're far from happy about the availability it provides (issues like MongoDB not detecting that a node has gone down, failing to fail over, etc). We run an HA service, and to date all of our failures in uptime have been either the fault of our hosting provider or MongoDB not failing over when it should. As such, we're always looking for a better alternative to move to, but at the moment MongoDB is about as good as it gets.

[+] redler|13 years ago|reply
> digiDoc is all about converting paper documents like receipts and business cards into searchable database, and so a document database seemed like a logical fit(!).

It looks like this single initial assumption is where things started going wrong: conflating the pieces of paper that happen to be called "documents" in the real world with the concept of a "document" in the context of a system like MongoDB.

[+] tonynero|13 years ago|reply
The guy is getting such hate in the comments on his site, yet his opening line is that his choice was ill thought out. Let him express his issues, right?

I chose MongoDB for my last side project, and while working schema-less was awesome and developing the client-facing part of the project was certainly quicker, I feel pretty lost on the analytics/BI side of it, and couldn't say it better than him: "Not having JOINs makes your data an intractable lump of mud"

So coming from a relational/SQL background I found MongoDB awesome up front, but frustrating later on... and yes, I'm off to learn http://docs.mongodb.org/manual/applications/aggregation/

[+] stevencorona|13 years ago|reply
The downside, or challenge, with NoSQL (generally speaking) is that you need to handle your aggregations ahead of time - you need to know what queries you'll want to run in the future when you store your data. If you have some new aggregation you want to keep, you'll need to re-process the data (with Hadoop or something else).

It's the trade-off of being able to scale reads and writes horizontally. And unless you need it, an RDBMS makes sense given the flexibility.

Maybe, instead of looking at NoSQL as a full-on replacement for RDBMS, we can look at it as a better solution to sharding.

[+] aneth4|13 years ago|reply
This is the opposite of "agile." It is difficult to know where your product will be in 2 months, let alone 12, so the advice to use SQL first seems sound - unless you enjoy long distractions hand-rolling simple JOINs.
[+] dkhenry|13 years ago|reply
This is also true of SQL databases. It just depends on _what_ you're aggregating. In this case, if you need to aggregate the count of something, then in a DB like MySQL you can lean on the index to get a quick count; but if you need to SUM or AVG, you will be doing just as much CPU-level work as a NoSQL solution. The difference is that in MySQL it's a simple query with the AVG operator, while in MongoDB it's a map-reduce query, which is much harder to write.

In general you need to know what you're doing under the hood and how either solution affects your problem domain. Where I work we need to aggregate billions of data points on demand. This can't be done in real time without pre-aggregating the results (and even then it takes tons of I/O just to process the aggregated data set).
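For comparison, the single-operator SQL version really is one line. A sketch using SQLite in place of MySQL (the table and values are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (value REAL)")
conn.executemany("INSERT INTO points VALUES (?)", [(2.0,), (4.0,), (6.0,)])
# One declarative aggregate operator; the 2012-era MongoDB equivalent
# was a hand-written map-reduce job in JavaScript.
avg = conn.execute("SELECT AVG(value) FROM points").fetchone()[0]
print(avg)  # 4.0
```

The CPU work of scanning the rows is the same either way; what differs is how much code you write to ask for it.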

[+] se85|13 years ago|reply
The guy just jumped on the bandwagon without having a clue.

Just reading this blog, it's clear that MongoDB was not a good fit for him; if he had bothered to do some research, he would have found this out on day one.

That's the real lesson he should be taking away from this and blogging about. Yet somehow MongoDB is an elaborate troll, and it's all their fault because of a lack of features, and they have bypassed 40 years of computer science, and blah blah blah. Excuses, excuses, excuses.

edit: removed a few pointless sentences :-)

[+] leothekim|13 years ago|reply
"I can only come to the conclusion that mongodb is a well-funded and elaborate troll."

It's possible the reasoning he used to use mongodb is the same as the one he used to abandon it.

[+] chaostheory|13 years ago|reply
For me, what killed my enthusiasm for mongodb is the write locks. Yes they have been greatly improved in the 2.x release but it's still not good enough (for me).
[+] programminggeek|13 years ago|reply
Look, there are some places where document DBs solve problems more easily or better than SQL; in other places they kind of suck. For example, plain old object mapping is easier with a document DB. Relational DBs tend to make your code look/feel/act more relational and less object-oriented; your object model tends to look just like your table structure. That can be good or bad depending on your viewpoint.

There are some approaches to solve some of the author's problems that end up making the Mongo system look and feel a lot more like a SQL system because sometimes data is actually related.

The author could have also taken a different approach to his data schema that would have fit more of a non-relational worldview.

Software development and architecture is about making choices and working with and around the limitations of your tools. It doesn't matter if PostgreSQL or MongoDB are "better". It's about solving a problem using a set of tools you are comfortable with.

[+] mrinterweb|13 years ago|reply
I find this article to be more a reflection of a NoSQL newbie's failed foray with a document database - a developer who later realized the grass is not as green as originally perceived, doesn't like map-reduce, and misses having joins. I don't see how this person's failed experience with MongoDB is a reflection on MongoDB.

I think the recent popularity of MongoDB bashing is maybe a testament to MongoDB's popularity. I'd guess that because MongoDB, with its ad hoc queries, is probably the closest NoSQL database to an RDBMS, it is attracting many newcomers.