What MongoDB got right

144 points| reqres | 10 years ago |blog.nelhage.com | reply

110 comments

[+] s_kilk|10 years ago|reply
> Let's start with the simplest one. Making the developer interface to the database a structured format instead of a textual query language was a clear win.

I think this is the most significant factor, by far. With Mongo it's turtles (or at least Maps/Hashes) all the way down, without a strange pseudo-english layer near the bottom that forces you to translate back and forth. For some devs that's a big deal.

For the last while I've been experimenting with bringing the same feature to PostgreSQL (http://bedquiltdb.github.io). It turns out to be very doable, but I don't have enough time to make it as featureful as it needs to be.

[+] dbattaglia|10 years ago|reply
Maybe I'm weird and just like SQL but the query-as-data aspect of Mongo is actually my least favorite aspect of using it. Ad-hoc queries become torture, as do complicated queries. I might just be lucky to have used SQL Server and C# for so long which eases a lot of the pains you described.
[+] collyw|10 years ago|reply
SQL is still one of the most readable languages in my opinion. It's the one language where I find it easier to read queries than write them.
[+] anton_gogolev|10 years ago|reply
> Making the developer interface to the database a structured format instead of a textual query language was a clear win.

No, it was not. It is an abomination, just like SharePoint CAML Query or any other "express-your-query-as-an-AST" approach. Sure, it's a paradise for Buzzword-Compliant Fluent Interface Enterprise Query Builder-type libraries.

> Querying SQL data involves constructing strings...

No. Case in point: LINQ. How it is implemented is irrelevant to the statement above.

> ...and counting arguments very carefully

No. Named parameters to the rescue.
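
A minimal sketch of the named-parameter point, using Python's built-in sqlite3 module. The table, columns, and values are made up for illustration, and other drivers use a different placeholder syntax (such as the `%(name)s` style shown elsewhere in this thread):

```python
import sqlite3

# Hypothetical table and column names, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")

# Named placeholders (:name, :age) are bound from a dict, so the order of
# the arguments never matters and nothing is spliced into the SQL string.
row = {"age": 36, "name": "ada"}
conn.execute("INSERT INTO users (name, age) VALUES (:name, :age)", row)

found = conn.execute(
    "SELECT name, age FROM users WHERE name = :name", {"name": "ada"}
).fetchall()
print(found)
```

Note that the dict lists `age` before `name`, yet each value still lands in the right column because binding is by name rather than position.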

[+] gedrap|10 years ago|reply
If you've been programming and using SQL for a few years, it's not a problem. It might take a bit of time to get the feeling of SQL, but it's ok later. And that's fine. rails-15min-blog things are cool and all, but some things have a learning curve and that's ok.

The vast, vast majority of SQL in the wild is pretty straightforward and sucks only when the schema sucks (can't blame the language for that). Most of the time when people hate SQL, they do so because they are working with a crap schema (whether someone else created it or they themselves didn't think it through enough). A sane schema plus a bit of effort to write readable SQL can go a long, long way.

[+] yoklov|10 years ago|reply
Not a fan of mongo's query format/language, but I have to agree. It's always bugged me that so much work has gone into relational databases, and yet the only way anybody interfaces with them is by using an extremely quirky programming language from the 70's (or whenever).
[+] xlm1717|10 years ago|reply
For this dev, it's a big deal. I love writing a query in MongoDB way more than I love writing a query in MySQL.

Granted, I think a big part of it has to do with joins being the annoying part of SQL...

[+] ThePhysicist|10 years ago|reply
I actually built a MongoDB query interface for SQL via Python+SQLAlchemy. It allows you to query your relational database just like you would with MongoDB, and in addition use some things (e.g. Joins in queries or "deep querying") that are not possible using MongoDB.

Here's the repo:

https://github.com/adewes/blitzdb

The implementation is stable but I'm still working on finishing Python3 support and documentation.

[+] bryanlarsen|10 years ago|reply
"So while MongoDB today may not be a great database, I think there's a good chance that the MongoDB of 5 or 10 years from now truly will be."

Either MongoDB will be, or other databases that have learned the lessons, both good and bad, of MongoDB.

RethinkDB appears to have captured the "MongoDB done right" mindshare, and PostgreSQL has gained JSON and is gaining better replication in order to cover the same niches.

[+] infomofo|10 years ago|reply
I agree. I was a huge fan of MongoDB when it came out because of the unique data structures it enabled easily. However, when it came time to select a database for my new project, I found that the JSON support PostgreSQL had added gave me all the flexibility I needed while still in a somewhat relational form. Additionally, it is dead simple to spin up a Postgres RDS instance in AWS, and it's a pain to use Mongo there.
[+] tracker1|10 years ago|reply
Agreed, and now that RethinkDB supports automagic failover, it's pretty much a no brainer... and while I like Mongo's query interface slightly more for most queries, RethinkDB avoids some of the weirdness when you have more interesting queries. And server-side collation/joins is a really nice feature in a document-centric database.

PostgreSQL really needs a MUCH better replication/sharding/failover story... While I would use PostgreSQL in a situation where all I need/want is a single server, where multiple servers are needed for HA/failover, I'd probably just defer to MS-SQL, only because pg is so convoluted in that regard.

As to MySQL/Maria... I haven't touched it in years, and every time I have, some weird behavior drives me nuts. I find it funny that people can love MySQL and bash on JS.

I'd also like to acknowledge ElasticSearch and Cassandra... ES is wonderful to work with for what it does best, search, and C* is a champ when you need a really good distributed table/kv store, though I think that RethinkDB is a better option today, if you don't need more than 10-20 nodes (which is a LOT).

[+] Thaxll|10 years ago|reply
PostgreSQL is nowhere near Clustering / HA / sharding features. Afaik it's only a master / slave architecture by default.
[+] threeseed|10 years ago|reply
> RethinkDB appears to have captured the "MongoDB done right" mindshare

Mindshare is irrelevant. MongoDB is killing it in the enterprise right now. They have integration with Oracle, Teradata, Hadoop and countless partnerships with other vendors. You can guarantee MongoDB will still be around in 20 years the way it is positioning itself. Can't say the same about RethinkDB (as great as it is).

> PostgreSQL has gained JSON and is gaining better replication in order to cover the same niches

The PostgreSQL replication story is pretty pathetic given how old/mature it is. And I've seen nothing to suggest that anything is really improving in this area. There is a range of add-ons, none of which are supported or built in. Basic replication is confusing, the documentation is nonexistent in parts, and good luck getting any support.

You compare it to MongoDB (or really any of the newer NoSQL databases) and it's like night and day. It takes minutes to set up a replica set, and there is plenty of documentation and official support for any issues.

[+] yummyfajitas|10 years ago|reply
Counting arguments very carefully? Nearly every SQL library does this for you.

    cur.execute("INSERT INTO a (b,c) VALUES (%(a)s, %(b)s);",
        { 'a' : a, 'b' : b })
Also, SQL is typed, so even if you did fail to count arguments there is a good chance you'd just detect it the first time you ran it.

The article acts as if treating the DB like native structures is somehow innovative and new - it's not. https://en.wikipedia.org/wiki/Object_database

We mostly abandoned object databases because they sucked. SQL was a huge improvement over them. SQL is a great way to organize and preserve the integrity of a lot of business data.

It's also a fantastic way to avoid repeated trips to the DB:

    SELECT * FROM employees AS e
        WHERE e.department_id = (SELECT id FROM departments WHERE name = 'engineering');
In Mongo, I'm pretty sure you need to first lookup engineering, then lookup the employees in engineering. That could be O(# employees in engineering) queries rather than 1.
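
To make the round-trip comparison concrete, here is a rough sketch using Python's built-in sqlite3 as a stand-in for a real database server. The rows are made up, and the second function only imitates the "look up the department, then look up its employees" shape in SQL; it is not MongoDB code:

```python
import sqlite3

# In-memory stand-in database; table contents are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'engineering'), (2, 'sales');
    INSERT INTO employees VALUES (1, 'ann', 1), (2, 'bob', 2), (3, 'cy', 1);
""")

def one_round_trip(dept):
    # The subquery resolves the department id inside the database,
    # so the application sends a single statement.
    sql = """SELECT e.name FROM employees AS e
             WHERE e.department_id = (SELECT id FROM departments WHERE name = ?)
             ORDER BY e.id"""
    return [r[0] for r in conn.execute(sql, (dept,))]

def two_round_trips(dept):
    # First statement: find the department id in application code.
    dept_row = conn.execute(
        "SELECT id FROM departments WHERE name = ?", (dept,)
    ).fetchone()
    if dept_row is None:
        return []
    # Second statement: fetch the employees for that id.
    rows = conn.execute(
        "SELECT name FROM employees WHERE department_id = ? ORDER BY id", dept_row
    )
    return [r[0] for r in rows]

print(one_round_trip("engineering"))
print(two_round_trips("engineering"))
```

Both functions return the same names; the difference is only in how many statements the application has to issue.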
[+] acjohnson55|10 years ago|reply
> In Mongo, I'm pretty sure you need to first lookup engineering, then lookup the employees in engineering. That could be O(# employees in engineering) queries rather than 1.

Or, you could denormalize, and give yourself all sorts of future headaches maintaining data integrity.

[+] lloyd-christmas|10 years ago|reply
> In Mongo, I'm pretty sure you need to first lookup engineering, then lookup the employees in engineering. That could be O(# employees in engineering) queries rather than 1.

The problem with that summary boils down to bad architecture. The point of document storage is storage with purpose; the intent being to make querying EASIER. This could easily be structured to be a single query. You can structure a document countless ways to represent that query, all of them would likely be different based on the purpose of the app.
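
One possible reading of this, sketched with plain Python dicts standing in for documents (no MongoDB involved, and the field names are hypothetical): embed the department on each employee document so that a single filter answers the question. The cost is the duplicated department data that the denormalization comment above warns about.

```python
# Plain dicts standing in for stored documents; all names are made up.
employees = [
    {"name": "ann", "department": {"id": 1, "name": "engineering"}},
    {"name": "bob", "department": {"id": 2, "name": "sales"}},
    {"name": "cy", "department": {"id": 1, "name": "engineering"}},
]

def in_department(docs, dept_name):
    # One pass over one collection: the department name travels with
    # each employee document, so no second lookup is needed.
    return [d["name"] for d in docs if d["department"]["name"] == dept_name]

print(in_department(employees, "engineering"))
```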

[+] krisdol|10 years ago|reply
I don't understand the recent backlash against NoSQL here.

First off, almost all of the complaints would have been valid years ago. Secondly, there is so much more choice out there today if mongodb wasn't the right answer for your project, and so many NoSQL stores have had time to mature and get polished APIs and docs.

We use various data stores for different purpose across microservices, mostly ES, couchbase, and datomic, and "use the right tool for the job" and "do one thing and do it well" feels like the right approach to take. For most applications, a SQL DB feels like a really big hammer that is put to a lot of things that don't look like nails.

[+] Cshelton|10 years ago|reply
Basically, NoSQL became the big trending thing, and everyone was pushing it hard simply because other people were as well. Because of this, many people used a NoSQL database when either a.) they were using the wrong technology for the problem they needed to solve, or b.) they had very little/no experience with databases and working with data in general, and they really screwed themselves up. Then they took to the forums and went on NoSQL crusades.

Nothing is wrong with NoSQL, used correctly and for the right purpose, it is AMAZING.

[+] jpgvm|10 years ago|reply
"For most applications" is very very misleading.

Most applications believe it or not are business modelling problems, which are overwhelmingly relational. SQL was invented to solve these, so no surprise it is actually the best tool for the job by far.

[+] gedrap|10 years ago|reply
> "use the right tool for the job" and "do one thing and do it well" feels like the right approach to take

Absolutely. However, a database is something of an extreme example. A lot in modern software (especially on the Web) relies on the database, and migrating to a completely different one (because requirements change and it might not be the right tool anymore) is often a huge task. So you want to use something flexible enough.

Also, you want to hire people, people leave the jobs, people change teams, etc. If you use some exotic, less common DB, it adds a lot of overhead. And if you apply the "right tool" to an extreme and have a few completely different DBs flying around, your maintenance cost increases a lot.

See, SQL might not be the perfect, most elegant choice, but most often it is just good enough. A lot of people have used it, and a lot of people have scaled it. If you run into an issue, often enough other people did too and blogged about it, etc. Hiring / getting help will be much easier than with $insertNoSQLDBName.

And, let's be realistic, relatively few companies have hundreds of gigabytes or terabytes of data that typical relational DBs can't handle.

My rule of thumb is that if you're in doubt, use SQL/relational store (I realize that they are different things but often used as synonyms and mean MySQL/PostgreSQL/etc).

[+] yummyfajitas|10 years ago|reply
The main problem MongoDB solves is "I don't want to learn SQL". The backlash is against this use case.

(This article certainly seems to be appealing to this use case, cf. "counting arguments very carefully".)

[+] jeffdavis|10 years ago|reply
"Do one thing and do it well" is problematic for things that store a lot of data. Especially for things that are supposed to be an authoritative source.

Getting storage right is very hard. Either it's too low-level, and it's hard for applications to coordinate complex operations without corrupting data; or you end up putting a lot of features in and end up with a SQL dbms; or everything does its own storage and you have a mess.

[+] rwmj|10 years ago|reply
Just a note that in PG'OCaml (an OCaml interface to PostgreSQL), you can write:

    "insert into foo (col1,col2,col3) values ($a, $b, $c)"
and it creates the safe prepared statement with ? placeholders. At compile time. Type-checked against the database to make sure your program types match your column types.

http://pgocaml.forge.ocamlcore.org/

[+] annnnd|10 years ago|reply
I would be very careful with such SQL statements. I am guessing it relies on some intrinsic order of the fields? That could change anytime. The order of fields shouldn't have any impact on your app, but I think in your case it does.
[+] ngrilly|10 years ago|reply
I agree that the three areas outlined in the article are things that MongoDB got right: a structured query language (instead of a textual query language), replica sets, and the oplog.

But the lack of transactions over multiple documents (in the same shard at least) and the lack of joins over multiple collections are a big showstopper for the kind of applications I develop.

I note that solutions like YouTube's Vitess provide something similar to MongoDB's replica sets.

I also note that PostgreSQL's logical decoding provides the same functionality as MongoDB's oplog tailing.

[+] s_kilk|10 years ago|reply
> a structured query language (instead of a textual query language)

Oh crowning irony of ironies, SQL literally means "Structured Query Language". :)

[+] progx|10 years ago|reply
I always wonder what kind of simple apps most people must write if they don't need joins.

I would be happy if I got such simple tasks :)

[+] bsg75|10 years ago|reply
> You can argue, and I would largely agree, that this is actually part of MongoDB's brilliant marketing strategy, of sacrificing engineering quality in order to get to market faster and build a hype machine, with the idea that the engineering will follow later.

The author nearly lost me here with this logic. Placing marketing ahead of quality in something that is supposed to store a very valuable asset (data) is near insanity.

I get the mindset of "break fast", "release often", etc. in terms of customer-facing features, but in something that is supposed to be a core part of your foundation, stability is of utmost importance. Otherwise nothing else works, and you lose customers, business, and opportunities because you can't look them up later.

It's not "brilliant marketing", it's just marketing.

[+] smacktoward|10 years ago|reply
This is all true, but the success of MySQL shows pretty clearly that just because something is insane doesn't mean it's not good business.
[+] emilburzo|10 years ago|reply
I have to agree with the author, especially since the points he raises are the ones that helped me greatly on my first "serious" personal project[1].

Coming from postgresql land I would have never thought you can have such great replication with automatic failover. I've had literally 100% uptime for the past year.

And that's on commodity servers (one of them being in a room in my apartment, the other two in a proper datacenter) going through the usual upgrades, downtime, reboots, going from mongo 2 to mongo 3 and such.

Speaking of which, the migration from mongo2 to mongo3 was another pleasant surprise: they've made it backwards compatible. So I could do the upgrade on the servers, one by one, checking everything was ok and after that I could focus on updating the drivers and rewriting the deprecated queries, no need to have everything ready at once.

The accessible oplog was another gem that fit my project really well. Gone was the need to poll the database, I could just "watch" the oplog. That, coupled with long polling on the browser side meant I'd have very little chatter between the db/server/web client when idle. Websockets would have been nice, but adoption wasn't high enough that I'd be comfortable going forward with it.

And all this considering MongoDB was my first NoSQL experience.

I agree it doesn't fit every project, but when it does, it's a really nice experience.

[1] https://graticule.link/

[+] _yy|10 years ago|reply
RethinkDB took all the good parts of MongoDB and added proper engineering.

https://www.rethinkdb.com/

[+] ngrilly|10 years ago|reply
But still no transactions over multiple documents (at least in the same shard)?
[+] angelbob|10 years ago|reply
I love the point about the Oplog.

There are a few equivalents for common SQL DBs (see LinkedIn's Databus for Oracle and MySQL), but in general, getting access to the write log is really hard. Even though it's sitting there!

It would be wonderful if there were some kind of established API or library that would let you parse the MySQL write log without doing hideous, fragile operations that change from version to version. Sure, change the format, but at least version and document it!

[+] sriku|10 years ago|reply
When we chose MongoDB for a project, a dominant criterion was out of the box geo queries. It helped that the storage and query approach had good impedance match with NodeJS. From a query perspective, we wouldn't have benefited much from SQL anyway, since much of the reading is free text or social graph or location based search which we moved to Solr.