Jepsen Disputes MongoDB's Data Consistency Claims

[+] madhadron|5 years ago|reply

In the circles I run in, MongoDB is regarded as a joke and the company behind it as basically duplicitous. For example, they still list Facebook as their first user of MongoDB on their website, but there is no MongoDB use at Facebook and hasn't been for years (it came in only via a startup acquisition).

I had the misfortune to use MongoDB at a previous job. The replication protocol wasn't atomic. You would find partial records that were never fixed in replicas. They claimed they fixed that in several releases, but never did. The right answer turned out to be to abandon MongoDB.

[+] kyllo|5 years ago|reply

I was floored by this comment yesterday from one of their Developer Relations people:

> Did any of you actually read the article? We are passing the Jepsen test suite and it was back in 2017 already. So, no, MongoDB is not losing anything if you know what you are doing.

https://twitter.com/MBeugnet/status/1253622755049734150?s=20

Can you imagine saying the phrase "if you know what you are doing," in public, to your users, as a DevRel person? Unbelievable.

[+] macintux|5 years ago|reply

The joke I learned early on: "Migrating away from Mongo is trivial: wait long enough, and all your data will be gone anyway."

I imagine things are better now.

[+] chx|5 years ago|reply

There was a time when I advocated for MongoDB with the usual caveats. The ability to easily store and index complex data was of great value. And then in 2015 October, within a week of each other, SQLite and MySQL both learned how to index on expressions and store JSON (SQLite 3.9 2015-10-14, MySQL 5.7 2015-10-21). PostgreSQL added jsonb the year prior in 9.4. At that moment the value of MongoDB for me diminished greatly.
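For anyone who hasn't tried the SQLite side of this, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is 3.9 or later with the JSON functions compiled in; the table, index, and field names (docs, docs_email, user.email) are made up for illustration.

```python
import json
import sqlite3

# In-memory database; JSON documents are stored as plain TEXT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

# Index on an expression: the email extracted from each JSON document.
conn.execute(
    "CREATE INDEX docs_email ON docs (json_extract(body, '$.user.email'))"
)

# Hypothetical sample documents with slightly different shapes.
sample_docs = [
    {"user": {"email": "a@example.com", "name": "Ann"}, "tags": ["x", "y"]},
    {"user": {"email": "b@example.com"}, "plan": "free"},
]
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [(json.dumps(d),) for d in sample_docs],
)


def find_by_email(email):
    """Return every document whose user.email equals the given value."""
    cur = conn.execute(
        "SELECT body FROM docs WHERE json_extract(body, '$.user.email') = ?",
        (email,),
    )
    return [json.loads(row[0]) for row in cur]
```

The WHERE clause repeats the indexed expression exactly, which is what allows the query planner to consider the expression index.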

[+] ashtonkem|5 years ago|reply

Every single time I've had to work on top of a Mongo cluster, it has gone into "three stooges" mode, where each node insists that one of the others is master.

I pretty much refuse to deploy a new instance of it now, I've been burned too often.

[+] kipply|5 years ago|reply

Tangentially related: their sales strategy is questionable.

As an intern at Shopify, I got an email from MongoDB asking us to switch. Shopify was 10 years old at the time. Plus, several coworkers would also receive similar emails two years later (and some in between, of course).

I have a shirt from MemSQL that says "Friends don't let friends NoSQL" and I wear it proudly.

[+] icedchai|5 years ago|reply

Isn't it amazing MongoDB is a 12 billion dollar company? Someone is using it and actually paying for it, even though it's not any of the developers you or I know.

[+] sneak|5 years ago|reply

I run in two circles: the one you mention, but also the other: I have gotten pushback from people (usually devs at clients of mine) for saying it’s lunacy to run a real, actual business on Mongo. (This has always happened from orgs with <10TB of data in the database.)

You’d be astounded how common it is at so-called “enterprise” startups. It blew my mind.

A lot of people simply never went through the LAMP stack days and have little/no experience with real databases like Postgres (or even MySQL). It’s disheartening.

[+] mathattack|5 years ago|reply

I have found their salespeople to be the most sleazy and unethical of any that I’ve worked with. Much worse than all the other database vendors combined.

[+] tehlike|5 years ago|reply

I got my PM friend to prototype his idea on the MEAN stack, but when we got more serious, we immediately transitioned to Postgres and started using Sequelize as the ORM. Pretty good decision so far. I don't think they will have cases that won't scale with an ORM for the foreseeable future.

[+] OOPMan|5 years ago|reply

As a skateboarder I've always found the name itself rather amusing as the term mongo has relatively negative connotations in skating.

[+] polote|5 years ago|reply

Nobody seems to like it. Does anyone have any idea why the company's revenue is still increasing?

[+] dehrmann|5 years ago|reply

But their marketing team early on was amazing.

[+] naked-ferret|5 years ago|reply

From the jepsen report:

"""

Curiously, MongoDB omitted any mention of these findings in their MongoDB and Jepsen page. Instead, that page discusses only passing results, makes no mention of read or write concern, buries the actual report in a footnote, and goes on to claim:

> MongoDB offers among the strongest data consistency, correctness, and safety guarantees of any database available today.

We encourage MongoDB to report Jepsen findings in context: while MongoDB did appear to offer per-document linearizability and causal consistency with the strongest settings, it also failed to offer those properties in most configurations.

"""

This is a really professional way to tell someone to stop their nonsense.

[+] mathattack|5 years ago|reply

Amazing that anyone can trust Mongo after this BS.

[+] Thaxll|5 years ago|reply

MySQL and PG are not truly consistent by default; they don't fsync every write.

MongoDB explains that pretty well: https://www.mongodb.com/faq and https://docs.mongodb.com/manual/core/causal-consistency-read...

[+] foobarian|5 years ago|reply

From top of linked article:

>>> I have to admit raising an eyebrow when I saw that web page. In that report, MongoDB lost data and violated causal by default. Somehow that became "among the strongest data consistency, correctness, and safety guarantees of any database available today"! <<<

It's not wrong, just misleading. Seems overblown given that most practitioners know how to read this kind of marketing speak.

[+] thomascgalvin|5 years ago|reply

You can tell a lot about a developer by their preferred database.

* Mongo: I like things easy, even if easy is dangerous. I probably write Javascript exclusively

* MySQL: I don't like to rock the boat, and MySQL is available everywhere

* PostgreSQL: I'm not afraid of the command line

* H2: My company can't afford a database admin, so I embedded the database in our application (I have actually done this)

* SQLite: I'm either using SQLite as my app's file format, writing a smartphone app, or about to realize the difference between load-in-test and load-in-production

* RabbitMQ: I don't know what a database is

* Redis: I got tired of optimizing SQL queries

* Oracle: I'm being paid to sell you Oracle

[+] dang|5 years ago|reply

All: we've changed the submitted URL from https://www.infoq.com/news/2020/05/Jepsen-MongoDB-4-2-6 to the work it is reporting on. You might want to read both, since the infoq.com article does give a bit of background.

Edit: never mind, I think the other URL - http://jepsen.io/analyses/mongodb-4.2.6 - deserves a more technical thread, so will invite aphyr to repost it instead. It had a thread already (https://news.ycombinator.com/item?id=23191439) but despite getting a lot of upvotes, failed to make the front page (http://hnrankings.info/23191439/). I have no idea why—there were no moderation or other penalties on it. Sometimes HN's software produces weird effects as the firehose of content tries to make it through the tiny aperture of the frontpage.

[+] VonGuard|5 years ago|reply

Lying about your test results from Jepsen is like going onto a reality show with Chef Ramsay, being thrown off for incompetence, then putting his name on your restaurant's ads: "Chef Ramsay ate here!"

I'd pay to watch Kyle screaming at people in the MongoDB offices, not that he screams or anything. Just a spectacular mental image: "IT'S NOT ATOMIC! IT COULDN'T SERIALIZE A DOG'S DINNER!"

[+] jagannathtech|5 years ago|reply

I would watch a tech version of Ramsay's show... oh boy!

[+] ncmncm|5 years ago|reply

MongoDB's big problem is that their present user base does not want the problems fixed, particularly at default settings, because it would mean going slower. Their users are self-selected as not caring much about integrity and durability. There are lots of applications where those qualities are just not very important, but speed is. People with such applications do need help with data management, and have money to spend on it.

The stock market wants to see the product as a competitor with Oracle, so demands all the certifications that say so. MongoDB marketing wants to be able to collect money as if the product were competitive. Many of the customers have management that would be embarrassed to spend that kind of money on a database that is not. And, ultimately, many of the applications do have durability requirements for some of the data.

So, MongoDB's engineers are pulled in one direction by actual (paying) users, and the opposite direction by the money people. It's not a good place to be. They have very competent engineers, but they have set themselves a problem that might not be solvable under their constraints, and that they might not be able to prove they have solved, if they did. Time spent on it does not address what most customers want to see progress on.

[+] gwbas1c|5 years ago|reply

Translation: They were trying to be everything for everybody.

The syntax is very nice. I honestly think a lot of its early success came from ease of use.

[+] threeseed|5 years ago|reply

If they only cared about performance then they would've left the write concern defaults to not acknowledge writes either locally or within a replica set. Or just read from the nearest replica and don't worry about potential consistency issues.

Also, this isn't 2011. MongoDB is not a competitor to Oracle, and never really has been for people who knew that a document database is not usable as a SQL one. It's other SQL databases, e.g. Snowflake and Redshift, that are the real competitors.

[+] jedberg|5 years ago|reply

MongoDB started life as a database designed for speed and ease of use over durability. That's not a good look for a database.

People have told me that they have since changed, but the evidence is overwhelmingly and repeatedly against them.

They seem to have been successful on marketing alone. Or people care more about speed and ease of use than durability, and my assumptions about what people want in a database are just wrong.

[+] otterley|5 years ago|reply

> MongoDB started life as a database designed for speed and ease of use over durability. That's not a good look for a database.

I think it depends. One could say the same about Redis, but it's wildly successful and people love it.

The difference is how they are advertised. Redis makes no claims to be anything other than what it is: a fast in-memory database that has some persistence capability but isn't meant to be a long-term data store. MongoDB, on the other hand, made (and continues to make) claims about being comparable in atomicity and durability to traditional SQL databases (but magically much faster!) that haven't withstood scrutiny.

Keep in mind, too, that most data ain't worth much. It's one thing to entrust data of low value in MongoDB; another to store mission-critical data in it. I would look askew at leadership who didn't ask hard questions about storing data worth millions or billions of dollars in MongoDB without frequent snapshots -- and even then, the value mustn't be contingent on the 100% accuracy of said data.

[+] Jare|5 years ago|reply

Reading past marketing blurbs and using products for the things they are designed for is part of any engineer's job. I was irritated by MongoDB's claims and defaults, but that didn't stop us from putting it in production. We used it from 2012 to 2016 (their most infamous years?), and for our use cases, scale, size+expertise, and feature set, it was a perfect match. In our case, durability was a smaller concern by design (lots of write-only data, lots of ephemeral data), but we still configured it carefully and never ran into any data loss whatsoever; snapshots worked, migrations worked, etc.

If the service had lasted longer, scaled bigger, and the business it supported had been more successful, we might have ended up with a now-classic MongoDB to pg migration. That was always an acceptable outcome, and it would have not invalidated going with Mongo at the start.

[+] kiwicopple|5 years ago|reply

We[1] have done 50+ conversations with developers this year (mostly indie and small startups). You’re right about the ease of use. The top reasons are

  - they don’t know why, it was just the one they learned/heard about first
  - there is a lot of tooling for it

A lot of them even knew about the limitations of MongoDB but they still choose it.

We concluded that other databases need to start prioritising usability; something few developer tools usually care about.

[1] https://supabase.io

[+] collyw|5 years ago|reply

Maybe it's just because I know SQL reasonably well but I don't even find Mongo particularly easy to use. Not for complex queries anyway.

[+] thomascgalvin|5 years ago|reply

> Or people care more about speed and ease of use than durability

I think 90% of the Mongo installs I've been exposed to were set up by people that were tired of fighting with Hibernate configurations and schema migrations.

It's also popular among people whose definition of "legacy software" is "that app I stopped working on after three months because I have something shiny and new."

[+] cpuguy83|5 years ago|reply

I used it effectively to denormalize and combine some data from other services... sort of like a 2nd level, queryable cache. Worked very well for my needs. This was 7-8 yrs ago.

[+] gwbas1c|5 years ago|reply

I find with the MongoDB style of database, it's easy to prototype without needing to do the heavy schema management of SQL.

But if you need a traditional ACID database, the flexibility comes with punch-in-the-groin technical debt.

[+] speedgoose|5 years ago|reply

The Jepsen analysis : https://jepsen.io/analyses/mongodb-4.2.6

[+] erulabs|5 years ago|reply

I wonder if I'm the only sysadmin in the world who doesn't hate MongoDB. Yes, I wouldn't use it for new projects, and yes, I wish RethinkDB had taken its place, but it's not as horrible as people seem to think. Default configuration... If it weren't for RDS doing PG-bouncer-style connection management, 95% of production postgres instances would probably fail. If innodb_buffer_pool_size wasn't set properly, plenty of data-centers would light on fire. If no one set up a firewall or AOF for redis, it's data-loss and data-exposure waiting to happen. If no one adds auth to an HTTP route, it's open to the world, etc etc etc. If tech-stacks were legos, software engineers would earn a heck of a lot less.

I absolutely agree it's been used by people who just don't want to write SQL queries, or is being used as a text search engine in place of something more appropriate like ElasticSearch, but to mock successful projects that were based on it seems silly. It reminds me of interviewing candidates at a startup that primarily used PHP/MySQL. Most of them openly laughed and called it all horrible. I voted "no" on them, and sometimes injected a somewhat toxic "ah, you're right - we should close up shop. Someone call Facebook - tell them their tech stack is horrible - shut it all down!".

You can learn a lot about a developer by asking "What do you think about Mongo, JavaScript, or PHP", and if their response isn't a shrug, they're probably more concerned with what editor is correct than if the product they're building is useful. It's an exceptional filter to reject zealots and find pragmatists.

All that said, MariaDB with MyRocks is _awesome_, but certainly not with the default settings :)

[+] Ice_cream_suit|5 years ago|reply

There is much amusement to be obtained from reading Jepsen's report:

"MongoDB’s default level of write concern was (and remains) acknowledgement by a single node, which means MongoDB may lose data by default.

...Similarly, MongoDB’s default level of read concern allows aborted reads: readers can observe state that is not fully committed, and could be discarded in the future. As the read isolation consistency docs note, “Read uncommitted is the default isolation level”.

We found that due to these weak defaults, MongoDB’s causal sessions did not preserve causal consistency by default: users needed to specify both write and read concern majority (or higher) to actually get causal consistency. MongoDB closed the issue, saying it was working as designed"

http://jepsen.io/analyses/mongodb-4.2.6
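To make the "acknowledgement by a single node" point concrete, here is a toy model in plain Python. It is not MongoDB's replication protocol and uses no MongoDB driver; the class and function names (ToyReplicaSet, survives_failover) are invented for illustration, and the failover rule is deliberately simplistic. The only thing it tries to show is why an acknowledged write can vanish when the acknowledgement came from the primary alone.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    log: list = field(default_factory=list)


class ToyReplicaSet:
    """Toy model: a write is acknowledged once `w` nodes hold it."""

    def __init__(self, size=3):
        self.nodes = [Node(f"n{i}") for i in range(size)]
        self.primary = self.nodes[0]

    def write(self, value, w):
        # "majority" means more than half of the nodes; otherwise w is a count.
        needed = len(self.nodes) // 2 + 1 if w == "majority" else int(w)
        self.primary.log.append(value)
        holders = 1
        for node in self.nodes:
            if holders >= needed:
                break  # enough copies exist, so acknowledge now
            if node is not self.primary:
                node.log.append(value)
                holders += 1
        return True  # acknowledged to the client

    def fail_primary(self):
        # Assumption: the primary dies before replicating anything further,
        # and the survivor with the longest log takes over.
        survivors = [n for n in self.nodes if n is not self.primary]
        self.nodes = survivors
        self.primary = max(survivors, key=lambda n: len(n.log))


def survives_failover(w):
    """Write once with the given write concern, kill the primary, check the data."""
    rs = ToyReplicaSet(3)
    rs.write("order-42", w=w)
    rs.fail_primary()
    return "order-42" in rs.primary.log
```

In this model survives_failover(1) is False and survives_failover("majority") is True, which is the shape of the trade-off the report describes: a weaker write concern gets a faster acknowledgement at the cost of possibly losing the write.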

[+] crazybit|5 years ago|reply

MongoDB is horrible, I get it.

What do I use in this situation:

1) I need to store 100,000,000+ json files in a database

2) query the data in these json files

3) json files come from thousands upon thousands of different sources, each with their own drastically different "schema"

4) constantly adding more json files from constantly new sources

5) no time to figure out the schema prior to adding into the database

6) don't care if a json file is lost once in a while

7) only 1 table, no relational tables needed

8) easy replication and sharding across servers sought after

9) don't actually require json, so long as data can be easily mapped from json to database format and back

10) can self host, no cloud only lock-in

Recommendations?

[+] NelsonMinar|5 years ago|reply

I think it's remarkable this report has been out for a week now and no one at MongoDB has commented on it. At least, not that I have seen.

[+] pengaru|5 years ago|reply

Maybe they're too busy spending their MDB money.

https://www.google.com/search?q=NASDAQ:+MDB

[+] seemslegit|5 years ago|reply

"We found that due to these weak defaults, MongoDB’s causal sessions did not preserve causal consistency by default: users needed to specify both write and read concern majority (or higher) to actually get causal consistency. MongoDB closed the issue, saying it was working as designed, and updated their isolation documentation to note that even though MongoDB offers “causal consistency in client sessions”, that guarantee does not hold unless users take care to use both read and write concern majority. A detailed table now shows the properties offered by weaker read and write concerns."

That sounds like a valid redress, or am I missing something ?

[+] arpa|5 years ago|reply

Oh, Jepsen and MongoDB again? Somebody get the popcorn!

[+] balfirevic|5 years ago|reply

Unfortunately, not an entertaining showdown - too one-sided.

[+] sacks2k|5 years ago|reply

I still remember when MongoDB was the new kid on the block and it was lauded as the only thing you should be using here on HN.

I'm glad my gut instinct was correct and that it really wasn't worth the hype. It reminds me of Ruby on Rails.

[+] nexuist|5 years ago|reply

I've never used RoR but I know people that still swear by it. It's outdated by today's "standards," but ActiveRecord was and is still a gem (heh) and a lot of RoR's foundational principles have been adopted by the existing major frameworks.

Regardless of technical acumen, I believe RoR doesn't deserve to be compared to Mongo for one reason: the RoR developers never tried to gaslight their users into thinking they're the reason everything broke; they never said only "if you know what you're doing" can you avoid these hidden pitfalls.

[+] veritas3241|5 years ago|reply

Every time I see a post about Mongo it makes me wonder what could have been if RethinkDB was managed differently.

[+] winrid|5 years ago|reply

I worked at one company where the network traffic just on the MongoDB master was around 2gb/s. We had machines with terabytes of memory, and Mongo worked fine - until we had some replica set nightmares. Mongo support is amazing, but when replication breaks it's very hard to diagnose (usually it was our fault, but it felt very fragile).

[+] holoduke|5 years ago|reply

I used MongoDB for 1 year for a multi-million-user app. I abandoned it. The reliability and stability are just not good. I wanted it to be good, but it turned out to be a different

[+] Too|5 years ago|reply

Ok, so defaults suck, marketing is misleading, documentation and error messages are not exactly obvious. Assuming you are already stuck in the soup, putting those issues aside and getting practical instead of throwing more fire on the discussion:

If you set w: majority and r: linearizable/snapshot, both on collection, client and on transactions. Plus assuming you accept snapshot over Isolation. How bad are those remaining cases in reality and how do these issues compare to other databases? The final "read your future writes" error looks quite scary and does not seem to be caused by configuration error, same with "duplicate effects".

[+] eternalban|5 years ago|reply

"Informally, I would summarize the CAP theorem as: If the network is broken, your database won’t work."

- Dwight Merriman, former CEO, and "one of the original authors of MongoDB" [1]

A word to the wise suffices. Sometimes the word in question is implied by other words.

For those who get this oblique post, note that throwing the above bon mot in an interview session for a "distributed systems engineer" and asking for an opinion is an excellent way to differentiate between Peter Principle and Principal Engineer.

[1]: https://web.archive.org/web/20100903213540/http://blog.mongo...

[+] twoodfin|5 years ago|reply

Discussed previously:

https://news.ycombinator.com/item?id=23191439

[+] dang|5 years ago|reply

Surprisingly, it seems not to have made the front page: http://hnrankings.info/23191439/. There's clearly community appetite to discuss this, so we won't treat the current submission as a dupe.

399 comments