MongoDB queries don’t always return all matching documents

[+] im_down_w_otp|9 years ago|reply

Said it before, will say it again... "MongoDB is the core piece of architectural rot in every single teetering and broken data platform I've worked with."

The fundamental problem is that MongoDB provides almost no stable semantics to build something deterministic and reliable on top of it.

That said. It is really, really easy to use.

[+] eloff|9 years ago|reply

As a guy who works on ACID database internals, I'm appalled that people use MongoDB. You want a document store? Use Postgres. Why on earth would you use a database that makes so little in the way of guarantees about what results you get from it? I think most people have really low load and concurrency, so things seem to work. When things get busier you're in for a world of pain. Look I get that's it's easy to use and easy to get started with, but you're going to pay for all of that later.

[+] joeblau|9 years ago|reply

I worked at a Data Analytics start up in Palo Alto back in 2011 and we had 8 or 9 databases in our arsenal for storing different types of data. MongoDB was by far the worst and most unstable database we had. It was so bad that for the presidential debate, I had to stay up and flip servers all night because even though the shards were perfectly distributed, the database would crash and fail over to two other machines which couldn't handle our entire social media stream. We ended up calling some guys from MongoDB in to help us troubleshoot the issue and the guy basically said "Yeah we know that's a limitation; you should probably buy more machines to distribute the load." I like the concept of Mongo, but there are other more robust NoSQL databases to choose from.

[+] zeemonkee3|9 years ago|reply

NodeJS + MongoDB is this generation's Laurel and Hardy stack ("look at this mess you got me into"). Last generation's was PHP + MySQL.

[+] partycoder|9 years ago|reply

Not to mention that many MongoDB drivers are worse than MongoDB itself, adding insult to an already unprovoked vicious injury.

The official Java driver is the easiest way to waste otherwise useful CPU time due to its blocking nature and wasteful threading model.

[+] Idontreddit|9 years ago|reply

Yes it is easy to use. But too bad it also fails "transactions" silently so that you don't even know if your changes were "committed" or not. Don't worry, it only happens every once it a while so it's not a big deal...

Unless you are coinbase or an organization that deals with money/bitcoins/etc and you need ACID compliant transactions so that "debits/credits" don't just magically disappear.

When the bitcoin craze was going crazy, coinbase had all kinds of problems due to their mongodb backend.

[+] JustSomeNobody|9 years ago|reply

I remember during peak mongo that if you weren't working on a project that used mongo, other devs looked down their noses at you.

[+] badthingfactory|9 years ago|reply

It's pretty easy to use... until you have to normalize data and query across one or two joins. I've been forced to build with mongo for the past few months (still not sure why) and I can't think of a single valid use-case for this rubbish.

If you need denormalized/distributed caching, Redis does a good job. If you need to store some unstructured json blobs, postgres and now sql server 2016 can do that. If you need reliable syncing for offline capable apps, you probably want CouchDB. If you need real time, use Rethink Obviously, relational data belongs in a relational database.

I think the problem is that all of these databases do one or two things really well. Mongo tries to do all of these things, and does so very poorly.

[+] draw_down|9 years ago|reply

All I ever hear is terrible things about it. I'm not a "gotta hear both sides" kind of person so that means something to me.

[+] lossolo|9 years ago|reply

I've just migrated one project from mongo to postgresql and i advise you to do the same. It was my mistake to use mongo, after I've found memory leak in cursors first day I've used the db which I've reported and they fixed it. It was 2015.. If you have a lot of relations in your data don't use mongo, it's just hype. You will end up with collections without relations and then do joins in your code instead of having db do it for you.

[+] sotojuan|9 years ago|reply

Even if you want NoSQL I'd use RethinkDB over Mongo any day. Way better query language, real-time support, and relational/regular SQL-like stuff.

https://rethinkdb.com/docs/comparison-tables/

[+] Joeboy|9 years ago|reply

> don't use mongo, it's just hype

I'm kind of curious as to where this hype is. I've almost never heard anybody say anything positive about mongodb. All I ever see is people saying it's terrible / hilarious for various reasons.

[+] mikegioia|9 years ago|reply

    If you have a lot of relations in your data don't
    use mongo

Why would you use Mongo if you have lots of relational data? Why would you not start with a relational database for that?

I know Mongo has issues but it's never going to beat an RDBMS on relational queries.

[+] Zikes|9 years ago|reply

PostgreSQL and jsonb might make it a more seamless migration that some may think:

https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_...

JSONB values are indexable and queryable, so there aren't very many downsides.

[+] smanuel|9 years ago|reply

> If you have a lot of relations in your data don't use mongo, it's just hype. You will end up with collections without relations and then do joins in your code instead of having db do it for you.

So... you are not against MongoDB but against NoSQL in general? I've used MongoDB and I've never ended up with lots of joins in my code. But I guess it all depends on the use case and how you've structured your data. Document databases are not a silver bullet.

[+] rtnyftxx|9 years ago|reply

ArangoDB in contrast to MongoDB is a ACID document store also with graphs.

[+] rco8786|9 years ago|reply

> If you have a lot of relations in your data don't use mongo

Perhaps you should have been using a relational database from the get go.

Sounds more like your issue, not mongo's.

[+] hinkley|9 years ago|reply

I still want to know how anybody is making money off of data that doesn't have a bunch of relationships in it.

[+] hardwaresofton|9 years ago|reply

If you're currently using MongoDB in your stack and are finding yourselves outgrowing it or worried that an issue like this might pop up, you owe it to yourself to check out RethinkDB:

https://rethinkdb.com/

It's quite possibly the best document store out right now. Many others in this thread have said good things about it, but give it a try and you'll see.

Here's a technical comparison of RethinkDB and Mongo: https://rethinkdb.com/docs/comparison-tables/

Here's the aphyr review of RethinkDB (based on 2.2.3): https://aphyr.com/posts/330-jepsen-rethinkdb-2-2-3-reconfigu...

[+] lath|9 years ago|reply

A lot of Mongo DB bashing on HA. We use it and I love it. Of course we have a dataset suited perfectly for Mongo - large documents with little relational data. We paid $0 and quickly and easily configured a 3 node HA cluster that is easy to maintain and performs great.

Remember, not all software needs to scale to millions of users so something affordable and easy to install, use, and maintain makes a lot of sense. Long story short, use the best tool for the job.

[+] ahi|9 years ago|reply

This has also been my experience. Millions of large documents on a single (beefy) node with a single user it's been fine. Although, the sysadmins had previously left me with flat file xml on shared storage so the bar was pretty low.

[+] danbmil99|9 years ago|reply

Ha ha, had to scroll all the way down to find a positive comment. It's actually a great paradigm; not every problem fits into a relational box.

[+] danbmil99|9 years ago|reply

Oh, the fud of it.

The behavior is well documented here https://jira.mongodb.org/browse/SERVER-14766

and in the linked issues. Seasoned users of mongodb know to structure their queries to avoid depending on a cursor if the collection may be concurrently updated by another process.

The usual pattern is to re-query the db in cases where your cursor may have gone stale. This tends to be habit due to the 10-minute cursor timeout default.

MongoDB may not be perfect, but like any tool, if you know its limitations it can be extremely useful, and it certainly is way more approachable for programmers who do not have the luxury of learning all the voodoo and lore that surrounds SQL-based relational DB's.

Look for some rational discussion at the bottom of this mongo hatefest!

[+] ahachete|9 years ago|reply

Strongly biased comment here, but hope its useful.

Have you tried ToroDB (https://github.com/torodb/torodb)? It still has a lot of room for improvement, but it basically gives you what MongoDB does (even the same API at the wire level) while transforming data into a relational form. Completely automatically, no need to design the schema. It uses Postgres, but it is far better than JSONB alone, as it maps data to relational tables and offers a MongoDB-compatible API.

Needless to say, queries and cursors run under REPEATABLE READ isolation mode, which means that the problem stated by OP will never happen here. Problem solved.

Please give it a try and contribute to its development, even just with providing feedback.

P.S. ToroDB developer here :)

[+] nimrody|9 years ago|reply

How does ToroDB handles sharding across multiple instances?

[+] cachemiss|9 years ago|reply

My general feeling is that MongoDb was designed by people who hadn't designed a database before, and marketed to people who didn't know how to use one.

Its marketing was pretty silly about all the various things it would do, when it didn't even have a reliable storage engine.

Its defaults at launch would consider a write stored when it was buffered for send on the client, which is nuts. There's lots of ways to solve the problems that people use MongoDB for, without all of the issues it brings.

[+] vegabook|9 years ago|reply

I have moved from Mongo to Cassandra in a financial time series context, and it's what I should have done straight from the getgo. I don't see Cassandra as that much more difficult to setup than Mongo, certainly no harder than Postgres IMHO, even in a cluster, and what you get leaves everything else in the dust if you can wrap your mind around its key-key-value store engine. It brings enormous benefits to a huge class of queries that are common in timeseries, logs, chats etc, and with it, no-single-point-of-failure robustness, and real-deal scalability. I literally saw a 20x performance improvement on range queries. Cannot recommend it more (and no, I have no affiliation to Datastax).

[+] jsemrau|9 years ago|reply

Weird to see that Mongo is still around. We started to use them on a project ~4 years ago. Easy install, but that's where the problems started. Overall terrible experience. Low performance, Syntax a mess, unreadable documentation.

They seem to still have this outstanding marketing team.

[+] paradox95|9 years ago|reply

Should an infrastructure company be advertising the fact that it didn't research the technology it chose to use to build its own infrastructure?

All these people saying Mongo is garbage are all likely neckbeards sysadmins. Unless you're hiring database admin and sysadmins, Postgres (unless managed - then you have a different set of scaling problems) or any other tradition SQL store is not a viable alternative. This author uses Bigtable as a point of comparison. Stay tuned for his next blog post comparing IIS to Cloudflare.

Almost every blog post titled "why we're moving from Mongo to X" or "Top 10 reason to avoid Mongo" could have been prevented with a little bit of research. People have spent their entire life working with the SQL world so throw something new at them and they reject it like the plague. Postgres is only good now because they had to do some of the features in order to compete with Mongo. Postgres been around since 1996 and you're only now using it? Tell me more about how awesome it is.

[+] glasser|9 years ago|reply

My goal in writing this post was not to convince people to use or not use MongoDB, but to document an edge case that may affect people who happen to use it for whatever reason, which as far as I could tell was inadequately documented elsewhere.

[+] ruw1090|9 years ago|reply

While I love to hate on MongoDB as much as the next guy, this behavior is consistent with read-committed isolation. You'd have to be using Serializable isolation in an RDBMS to avoid this anomaly.

[+] teraflop|9 years ago|reply

I think this is incorrect, but it's not as simple as the other replies are making it out to be.

Under read-committed isolation, within a single operation, you must not be able to see inconsistent data. So if you do "SELECT <star>" on a table while rows are being updated, you're guaranteed to always see either the old value or the new value. But if you do two separate statements, "SELECT <star> WHERE value='new'" and "SELECT <star> WHERE value='old'" in the same transaction, you may not see the row because its value could have changed. Serializable isolation prevents this case, typically by holding locks until the transaction commits.

It gets messy because the ANSI SQL isolation levels are of course defined in terms of SQL statements, which don't map perfectly to the operations that a MongoDB client can do. Mongo apparently treats an "index scan" as a sequence of many individual operations, not as a single read. So you could argue that it technically obeys read-committed isolation, but it definitely violates the spirit.

[+] prodigal_erik|9 years ago|reply

This is worse than read-committed because you're not even seeing the old state of the document. If an update moves a document around within the results, and it ends up in the portion you've already read, you just don't see it at all.

[+] ht85|9 years ago|reply

The article suggests that tuples being moved to different storage locations can cause them to not show up in a table scan.

No such thing can happen in a sane RDBMS, no matter the transaction isolation level.

[+] anarazel|9 years ago|reply

In postgres (and a fair number of other databases) you'll not see that anomaly, even with read committed. Usually you'll want to have stricter semantics for an individual query, than for the whole transaction.

[+] the_mitsuhiko|9 years ago|reply

With read-committed you see the old state.

[+] pgaddict|9 years ago|reply

Quoting from the very first paragraph of the blog post:

> Specifically, if a document is updated while the query is running, MongoDB may not return it from the query — even if it matches both before and after the update!

How's that compatible with READ COMMITTED isolation level?

[+] twunde|9 years ago|reply

The real problem with Mongo is that it's so enjoyable to start a project with that it's easy to look for ways to continue using it even when Mongo's problems start surfacing. I'll never forget how many problems my team ended up facing with Mongo. Missing inserts, slow queries with only a few hundred records, document size limits. All while Mongo was paraded as web scale in talks.

[+] wzy|9 years ago|reply

Does Meteor support a proper database system yet, a la. MySQL or Postgres?

[+] aavotins|9 years ago|reply

MongoDB reminds me of an old saying that if you have a problem and you use a regex to solve it, you end up with two problems.

I have personally used MongoDB in production two times for fairly busy and loaded projects, and both times I ended up to be the person that encouraged migrating away from MongoDB to a SQL based storage solution. Even at my current job there's still evidence that MongoDB was used for our product, but eventually got migrated to PostgreSQL.

Most of the times I've thought that I chose the wrong tool for the right job, which may be true, but still leaves a lot of thought about the correct application. Right now I have a MongoDB anxiety - as soon as I start thinking about maybe using it(with an emphasis on maybe), I remember all the troubles I went through and just forget it.

It is certainly not a bad product, but it's a niche product in my opinion. Maybe I just haven't found the niche.

[+] MoOmer|9 years ago|reply

I literally brought up the regex joke in a meeting yesterday. A data warehouse was built on top of Mongo, and I get to help clean up the mess.

[+] jtchang|9 years ago|reply

This single issue would make me not want to use MongoDB. I'm sure there are design considerations around it but I rather use something that has sane semantics around these edge cases.

[+] Animats|9 years ago|reply

Not when they're changing rapidly, anyway. Well, that's relaxed consistency for you.

Does this guy have so many containers running that the status info can't be kept in RAM? I have a status table in MySQL that's kept by the MEMORY engine; it's thus in RAM. It doesn't have to survive reboots.

[+] fiatjaf|9 years ago|reply

CouchDB is simple and reliable. You can understand it from day one. I can't imagine why it isn't being used.

[+] avital|9 years ago|reply

I believe this is solved by Mongo's "snapshot" method on cursors: https://docs.mongodb.com/v3.0/faq/developers/#faq-developers...

[+] glasser|9 years ago|reply

If I understand correctly, this method says "only scan the built in _id index, not any other index". Which means that you will not hit this index-specific bad behavior, but also that you won't get the performance characteristics of using an index.

[+] rjurney|9 years ago|reply

Mongo is hilarious. Ease of use is so important, we just don't much give a shit that it has all these gaping holes and flaws in it.

[+] shruubi|9 years ago|reply

Seriously, who looks at MongoDB and thinks "this is a sane way of doing things"?

To be fair, I've never been much of a fan of the whole NoSQL solution, so I may be biased, but what real benefits do you gain from using NoSQL over anything else?

[+] cortesoft|9 years ago|reply

Web scale!

[+] jeltz|9 years ago|reply

Benefits of using MongoDB? Nothing. There are on the other hand other NoSQL systems which offer real benefits. Like Cassandra which gives reliable distributed database with not too much effort.

397 comments