Things I wish I knew about MongoDB a year ago

[+] dia80|13 years ago|reply

Genuine question:

In what use cases does mongo kick mysql's ass?

I've used it a couple of times in hobby projects and enjoyed not maintaining a schema. But I read so many of these 'gotcha' style articles; for example, one commenter here wants a manual "recently dirty" flag to combat the master/slave lag mentioned in the article. I know it's faster (tm), but once you have to take into account all this low-level stuff yourself, wouldn't it just be better to rent/buy another rack of mysql servers and not worry about it?

Look forward to learning something...

[+] thibaut_barrere|13 years ago|reply

MongoDB kicks ass in the following situations (real projects I did as a freelancer):

- dealing with semi-structured input (forms with some variability) and storing as a document, all while being able to query across the data

- used as a store to provide very flexible ETL jobs (with ability to upsert, filter/query, geonear etc)

For those situations, I would definitely use MongoDB again. As an RDBMS replacement, I wouldn't use it today.

[+] alexro|13 years ago|reply

IMO there is no rational explanation for this phenomenon other than: people are different. Some get bored with stored procs and want the same hassle but in another form.

[+] elchief|13 years ago|reply

When you have so many writes that sharding isn't enough.

When you make changes so fast you need a liquid schema.

When you want to make your boss learn map-reduce so he can query the data.

When the application can take care of integration and not the database itself.

[+] diminoten|13 years ago|reply

Mongo isn't so important to this question as ODB vs RDBMS. Here's some light reading:

http://en.wikipedia.org/wiki/Object-relational_impedance_mis...

MongoDB is just ODB, and MySQL is just RDB.

Besides, postgres is the real future!

[+] jeremyjh|13 years ago|reply

One issue with MySQL in large databases is that schema changes are extremely expensive, so much so that you'll be making design decisions around it (e.g. how do we implement this feature without executing our two-day alter table statement). Not all RDBMS have this problem to such a degree but none can escape it entirely.

A lot of the gotchas that he notes are related to design trade-offs with different default behavior than an RDBMS typically would have. For example as your system gets large enough in MySQL you may find you have to do asynchronous replication as well, and then you will have similar problems with dirty reads.

[+] mrkurt|13 years ago|reply

Eventually consistent replication is not unique to MongoDB; most DBs have an async replication option. Using "not Mongo" won't really solve it.

[+] untog|13 years ago|reply

I think there are legitimate reasons to use a "NoSQL" solution rather than MySQL. I'm more interested to know in what use cases Mongo kicks its competitors' asses. What are its competitors, even? I'll admit that the NoSQL world is a slightly blurry mess to me, with different products seemingly optimised for different cases.

[+] mattparlane|13 years ago|reply

My number one reason for choosing MongoDB is replication that just works out of the box and doesn't require either a read lock or shutting down the master to set up a new slave.

[+] mrinterweb|13 years ago|reply

Map reduce across sharded servers comes to mind as an advantage. For that matter, horizontal scalability in general is a big advantage that many of the NoSQL data stores have over RDBMS databases.

[+] tomschlick|13 years ago|reply

I'm so glad this wasn't another case of someone just ranting about using mongo for the wrong purpose and being mad about it a year later.

[+] trafficlight|13 years ago|reply

I also appreciate how he pointed out positive things that he just wasn't aware of initially.

[+] nickzoic|13 years ago|reply

The count({condition}) one is a worry. I'm guessing it is slow when it has to page the index in to count it. I wonder if it is still a problem when the index is used a lot anyway. A fix in MongoDB would seem a much better solution than having everyone implement their own hacky count-caching solution.
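For illustration, the kind of count-caching workaround meant here could look roughly like the sketch below. It is an in-memory stand-in, not MongoDB code: `CountedCollection`, `key_field` and the method names are hypothetical, and a real version would have to keep the counter and the stored documents in step across failures.

```python
from collections import Counter


class CountedCollection:
    """Toy collection that keeps a running count per value of one field.

    Hypothetical stand-in for a real collection: the point is only that
    count(value) becomes a lookup instead of an index scan.
    """

    def __init__(self, key_field):
        self.key_field = key_field
        self.docs = []
        self.counts = Counter()

    def insert(self, doc):
        # Update the cached count in the same step as the write.
        self.docs.append(doc)
        self.counts[doc[self.key_field]] += 1

    def remove(self, doc):
        # list.remove raises ValueError if the document is absent,
        # so the counter is only decremented for a real delete.
        self.docs.remove(doc)
        self.counts[doc[self.key_field]] -= 1

    def count(self, value):
        # Counter returns 0 for values it has never seen.
        return self.counts[value]
```

The cost is that every write path has to remember to touch the counter, which is why a fix in the database itself would be the nicer outcome.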

EDIT: Actually, looking at the bug reports, sounds like maybe lock contention on the index?

The master/slave replication problem seems bad but I think it can be worked around (for my particular project) with a flag on the user session ... if they've performed a write in the last 30 seconds, set slaveOkay = false. Users who are just browsing may experience a slight delay in seeing new documents but users who are editing stuff will see their edits immediately.
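A minimal sketch of that session flag, assuming a 30-second window as above. `Session`, `record_write` and `slave_ok` are hypothetical names, not part of any driver; the returned boolean would be passed to whatever read-preference setting the driver in use exposes.

```python
import time

STICKY_SECONDS = 30  # window from the comment; should exceed real replication lag


class Session:
    """Hypothetical per-user session that remembers its last write."""

    def __init__(self):
        self.last_write_at = None


def record_write(session, now=None):
    # Call this whenever the session performs a write against the master.
    session.last_write_at = time.monotonic() if now is None else now


def slave_ok(session, now=None):
    # True means the read may go to a slave; False pins it to the master.
    now = time.monotonic() if now is None else now
    if session.last_write_at is None:
        return True  # pure browsing: slightly stale reads are acceptable
    return (now - session.last_write_at) >= STICKY_SECONDS
```

The `now` parameter exists only so the timing can be exercised without sleeping; in normal use it is left out.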

[+] lars512|13 years ago|reply

Inconsistent reads in replica sets are something we've come across with MySQL read slaves as well. I think it's a gotcha of that whole model of replication, rather than a MongoDB-specific issue.

[+] mgummelt|13 years ago|reply

I'm not aware of any database that solves this problem. Is there one? As far as I know, mysql reads must be distributed to the slaves at the application level, which has no knowledge of master/slave inconsistency. I suppose the time delta between master and slave can be queried, but that still doesn't protect from race conditions/inconsistent reads. This is actually why we chose to only utilize slaves for data redundancy rather than read throughput at my last company. Inconsistent reads weren't tolerable.

[+] jbert|13 years ago|reply

One way to resolve it is to mark that user or session (or even just request) "sticky to the master" for long enough to cover your normal replication delay.

When we saw it before, ensuring that a given request which issued a write also read from the master was sufficient. (sub-second replication delay).

[+] nevinera|13 years ago|reply

>Range queries are indexed differently

If I'm reading your description right, this is hardly mongo-specific. Try it in mysql, for example:

(index is [:last, :first])

  select first from names 
  where last in ('gordon','holmes','watson')
  order by first;

An index is an ordering by which a search may be performed - to illustrate, the index for my small table looks pretty much like this:

  gordon, jeff
  holmes, mycroft
  holmes, sherlock
  watson, john

Unless the first key is restricted to a single value, it can't order by the second key without performing at least a merge-sort. They aren't in that order in the index.
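The same point can be shown with a rough Python model of that index, assuming the index is nothing more than a list of (last, first) pairs kept in sorted order. The function names are made up for the illustration, and this is not how any particular database implements it.

```python
import heapq
from itertools import groupby

# The compound index on (last, first), stored in index order.
index = [
    ("gordon", "jeff"),
    ("holmes", "mycroft"),
    ("holmes", "sherlock"),
    ("watson", "john"),
]


def first_names_single(last):
    # One value for the first key: the matching slice of the index
    # is already ordered by the second key, so no sort is needed.
    return [first for l, first in index if l == last]


def first_names_multi(lasts):
    # Several values for the first key: each value contributes its own
    # sorted run, and the runs still have to be merged to order by first.
    wanted = set(lasts)
    runs = [
        [first for _, first in group]
        for last, group in groupby(index, key=lambda row: row[0])
        if last in wanted
    ]
    return list(heapq.merge(*runs))
```

Reading the index straight through for the three last names gives jeff, mycroft, sherlock, john, which is not ordered by first name; `first_names_multi` has to do the extra merge step to get jeff, john, mycroft, sherlock.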

[+] foobar2k|13 years ago|reply

He never said it was mongo specific

[+] jameswyse|13 years ago|reply

One thing I love MongoDB for is its geospatial indexing abilities: http://www.mongodb.org/display/DOCS/Geospatial+Indexing

Was a really nice surprise when I was building a location based web app.

[+] jsemrau|13 years ago|reply

That was our use-case as well. And it works fine for this but just in the application layer. We are not using Mongo for data storage (at least we are not trusting it to hold it for long)

[+] chris123|13 years ago|reply

Is MongoDB more marketing hype than quality product? I've heard it before and this article seems to point in that direction as well.

[+] kokey|13 years ago|reply

I think it's generally full of gotchas similar to that of SQL databases like MySQL and Oracle. In fact, most of the issues mentioned in this article, like delayed replication, indexed queries and using 'explain' are issues I've had to deal with in MySQL and Oracle. Most of these databases are fine out of the box for small scale use, but when you scale up you have to deal with these 'gotchas' like indexing, partitioning, bulk loading, and having to profile everything etc.

[+] lmm|13 years ago|reply

Yes, but only because it has ~infinite marketing hype.

It isn't and shouldn't be a general replacement for an RDBMS; it makes some interesting sacrifices for performance that you have to understand before using it. But it is very much a quality product; it makes some easy things very easy and some very hard things possible.
