In my opinion, there is but one feature that a database really must have: whatever data I write into it, if I don't delete it, I want to read it back, unaltered (preferably without needing at least three machines, but I'm willing to compromise).
Software that cannot provide this single feature just isn't something I would call a database.
Whether it's unsafe default configurations or just bugs, I don't care.
Between these two articles over the weekend and some from earlier, I personally don't trust MongoDB to still have that feature, and as such it will take much more than one article with a strongly worded title to convince me otherwise.
The author would have us believe that it's unfair to pick on any piece of software because it "all sucks." They'd also have us believe that complaining about your data disappearing in MongoDB is an unfair criticism, and then they take the logical leap that judging the destruction of data by buggy software somehow has something to do with your own ability to create backups. Generally speaking, the people who have been burned by MongoDB have survived because they had backups. That has nothing to do with the fact that their database nuked their data, which is unacceptable when it happens due to careless engineering or poor defaults.
Edit: To be fair, if MongoDB were advertised as a "fault-intolerant, ephemeral database that is fast as heck but subject to failure and data loss, so do not put mission-critical information in it," then all bets would be off. But we know that's never going to happen.
> Generally speaking the people who have been burned by MongoDB have survived by the fact that they had backups.
I'm not one of those people: MongoDB silently corrupted half my data a week ago without me noticing, so the backups were naturally missing half the data as well.
The difference between MongoDB and many of the other popular persistent data stores (relational or not) is one of degree, not of kind.
MongoDB isn't a fundamentally flawed system. It's just that the distance between what 10gen (and many of its defenders) claim and what it delivers is much greater than for most other data storage systems. This is a subtle thing.
Many people have attempted to use MongoDB for serious, production applications. The first few times they encounter problems, they assume it's their fault and go RTFM, ask for help, and exercise their support contract if they're lucky enough to have one. Eventually it dawns on them that they shouldn't have to be jumping through these hoops, and that somewhere along the way they have been misled.
So it's not like anyone is misinterpreting the purpose and/or problem domain of MongoDB. It's more that they are exploring the available options, reading what's out there about MongoDB, and thinking, "Gosh, that sounds awfully cool. It fits what I'm trying to build, and it doesn't seem to have many obvious drawbacks. I think I'll give that a try." And then they get burned miles further down the road.
If MongoDB were presented as more of an experimental direction in rearranging the priorities for a persistent data store, then that would be fine. That's what it is, and that's great! We should have more of those. But when it's marketed by 10gen (and others) as a one-size-fits-all, this-should-be-the-new-default-for-everything drop-in replacement for relational databases, then it's going to fall short. Far short.
I hate to break it to the poster (and I would directly, had they not chickened out and had they actually put their name on their post), but software has bugs.
This is not a valid excuse. This is like running a red light, smashing into someone, and then telling them "hey, you should have looked before entering the intersection... you should know that people sometimes run red lights".
Yes, you should have backups. No, that doesn't make data-loss bugs any more excusable.
The anon poster claimed to have deployed an early version of Mongo, at a "high profile" company with tens of millions of users, and yet seemed surprised by basic RTFM facts like 'must use getLastError after calls if you need to ensure your write was taken', even well into a production deploy. That should raise huge alarm bells for anyone who is considering taking the guy seriously.
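For readers who haven't hit this: in that era, the MongoDB wire protocol treated writes as fire-and-forget by default, and the client only learned of a failure by explicitly asking for it afterwards, via getLastError. Below is a toy model of those semantics in plain Python, with illustrative names; it is not the actual driver API, just a sketch of why an unchecked write can vanish silently.

```python
class FireAndForgetStore:
    """Toy model of fire-and-forget write semantics (illustrative only,
    not the real MongoDB driver). Writes return immediately with no
    status; errors are only visible if the caller explicitly asks."""

    def __init__(self):
        self.docs = {}
        self._last_error = None

    def insert(self, doc_id, doc, fail=False):
        # Returns immediately with no acknowledgement -- the caller
        # cannot tell whether the write landed unless it checks.
        if fail:
            self._last_error = "write failed for %s" % doc_id
            return
        self.docs[doc_id] = doc
        self._last_error = None

    def get_last_error(self):
        # The analogue of issuing getLastError after each write.
        return self._last_error


store = FireAndForgetStore()
store.insert("a", {"x": 1})
store.insert("b", {"x": 2}, fail=True)  # silently dropped

# Without the explicit check, the app believes both writes landed:
assert "b" not in store.docs
# Only checking after the write surfaces the failure:
assert store.get_last_error() is not None
```

The point of the sketch is the shape of the API, not the mechanics: if you never issue the check, the failure is invisible, which is exactly the RTFM fact the anon poster apparently missed.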
It's just not clear that there were bona-fide 'data-loss bugs' in play here. Seems at least as likely that misuse and misunderstanding of Mongo led to data-loss that could have been avoided.
So, I'd revise your simile. This is more like ignoring a lot of perfectly safe roads which lead to where you're trying to go, instead choosing to chance a more exciting looking shortcut filled with lava pits and dinosaurs. And putting on a blindfold before driving on to it.
Look, NoSQL is wild and woolly and full of tradeoffs; that's a truism by now. If you use such tech without thoroughly understanding it, and consequently run your company's data off a cliff, absolutely it's on you. Mongo does not have a responsibility to put on training wheels and save naive users from themselves, because there should not be naive users. These are data stores, the center of gravity for an application or a business. People involved in choosing and deploying them should not be whinging about default settings being dangerous, or about not getting write confirmations when they didn't ask for write confirmations. There's just no excuse for relying blindly upon default settings; reading the manual on such tech is not optional.
Those who don't, and run into problems, would be well-advised to chalk it up as a learning experience and do better next time. Posting "ZOMG X SUCKS BECAUSE I BURNED MYSELF WITH IT" is just silly, reactionary stuff, and it depresses me that HN falls for it and upvotes it like it's worth a damn, every freaking time.
Mongo is fine until it's not. It's been fine for us for many months, but once you hit its limitations, it's pretty horrible. We're in this situation right now and we're seriously considering moving back to MySQL or Postgres.
Basically, "it doesn't scale" unless you throw tons of machines/shards at it.
Once they fix a few of their main issues, such as the global write lock, and fix many of the bugs, it could become an outstanding piece of software. Until then, I consider it not ready for production use in a write-intensive application. Knowing what I know now, I certainly would not have switched our data to MongoDB.
Have you considered Riak? (I ask mostly because I've been looking at both, having a little MongoDB experience but having heard great things about Riak.)
My experience is that this is true of every database system (relational or non-). The thing is that they all break in different ways at different points, and so the smart thing to do is make choices based on that information.
The stupid thing to do is write blog posts about how Software Package X sucks and nobody should use it for anything.
I'm not sure how hosting a video for Gawker builds a lot of credibility for having field tested a database; perhaps there are more details he can provide about how that is a particularly interesting trial for a database. Among other things, that seems like "lots and lots of reads, very few writes" and a "very consistent access pattern regardless" kind of situation.
Ha! Fair point. I thought it was an interesting trial in that all updates to our user data wound up being published into MongoDB. All the other tools we'd tried for this purpose (CouchDB, MySQL with both MyISAM and InnoDB, and even "thousands of .js files in a hashed directory structure") didn't perform as well. It allowed us to shift the load from our MySQL database to "something else," since we were getting killed during spikes. It was a read-heavy workload in that case.
The thing that struck me about the original post was how it seemed some of the complaints were just normal things that people learn when dealing with clusters under load. "Adding a shard under heavy load is a nightmare." Well, I mean, duh. If you add a shard and the cluster has to rebalance, you're adding load. It's like how you're more likely to get a disk failure during a RAID rebuild. The correct time to add a shard is during the off hours.
Can somebody recommend a database with an API like Mongo's, but performance and durability more like PostgreSQL's or Oracle's?
What I want to do is throw semi-structured JSON data into a database and define indexes on a few columns that I'd like to do equality and range queries on. Mongo seems ideal for this, but I don't need its performance, and I want durability and the ability to run the odd query that covers more data than fits into RAM, without completely falling over.
Right now, the alternative is to do something like the following in Postgres, and have the application code extract a few things from the JSON and duplicate them into database columns when I insert data.
CREATE TABLE collected_data (
    source_node_id TEXT NOT NULL,
    timestamp      INTEGER NOT NULL,
    json_data      TEXT
);

CREATE INDEX collected_data_idx ON collected_data (source_node_id, timestamp);
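The application-side half of that workaround is only a few lines: parse the JSON, copy out the fields you want indexed, and pass everything as insert parameters. A minimal sketch in Python, stdlib only; the JSON field names mirroring the column names are an assumption, and the actual execute call through a driver such as psycopg2 is left as a comment:

```python
import json

# Parameterized insert matching the collected_data table above.
INSERT_SQL = (
    "INSERT INTO collected_data (source_node_id, timestamp, json_data) "
    "VALUES (%s, %s, %s)"
)

def extract_row(raw_json):
    """Pull the indexed fields out of the JSON payload and return the
    parameter tuple for INSERT_SQL. A missing field raises KeyError,
    so malformed payloads fail before they reach the database."""
    doc = json.loads(raw_json)
    return (doc["source_node_id"], int(doc["timestamp"]), raw_json)

row = extract_row(
    '{"source_node_id": "node-7", "timestamp": 1320000000, "temp": 21.5}'
)
# cursor.execute(INSERT_SQL, row)  # with a psycopg2 cursor, for example
```

Storing the original JSON string verbatim alongside the extracted columns means the indexed fields are duplicated, but nothing in the payload is ever lost.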
CouchDB fits the bill. It's all about documents, persistence and defining indexes for range-queries.
Keep in mind two things:
1. It's all on the disk. So, while its throughput is excellent (thousands or more requests per second), each individual request has a latency of ~10ms
2. You define your indexes beforehand (called 'views' in couch terminology), and then you can only make simple queries on them - like by a specific key or by a range of keys. It takes some learning.
If you and your app are ok with both, go for Couch.
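That "materialize the index up front, then only exact-key and range lookups" model is easy to picture as a sorted key list built ahead of time. A toy sketch in Python using the stdlib bisect module; this models the access pattern Couch gives you, not its actual view engine:

```python
import bisect

class View:
    """Toy model of a predefined index ('view'): keys are computed and
    sorted up front, and the only supported queries are exact-key and
    key-range lookups -- no ad-hoc filtering at query time."""

    def __init__(self, docs, key_fn):
        # (key, doc) pairs sorted by key, materialized ahead of time.
        self.rows = sorted(((key_fn(d), d) for d in docs),
                           key=lambda kv: kv[0])
        self.keys = [k for k, _ in self.rows]

    def by_key(self, key):
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return [d for _, d in self.rows[lo:hi]]

    def by_range(self, startkey, endkey):
        # Inclusive range, like startkey/endkey in a view query.
        lo = bisect.bisect_left(self.keys, startkey)
        hi = bisect.bisect_right(self.keys, endkey)
        return [d for _, d in self.rows[lo:hi]]


docs = [{"id": 1, "ts": 10}, {"id": 2, "ts": 30}, {"id": 3, "ts": 20}]
view = View(docs, key_fn=lambda d: d["ts"])
assert [d["id"] for d in view.by_range(10, 20)] == [1, 3]
assert view.by_key(30)[0]["id"] == 2
```

Anything not expressible as a key or range lookup against a predefined view means defining another view, which is the learning curve mentioned above.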
First, I would take some of the discussion about MongoDB losing data with a grain of salt, especially since the really harsh critique is coming from an unknown source; for all we know, it could be a sinister competitor spreading BS. MongoDB makes the up-front choice of performance and scalability over consistency, and they have never pretended otherwise. That does not mean that MongoDB loses data left and right; it means that in the choice between bringing a production app to a halt and losing data, MongoDB will opt to keep your app running.
Second, and this is a shameless plug, I really believe that the DB I work on (Neo4j) is a good answer to your question. Neo4j makes the same consistency-versus-uptime decision that classic RDBMSes do: in the choice between bringing a production app to a halt and losing data, Neo4j will opt for saving the data.
So, to answer your question: Neo4j lets you store semi-structured documents in a manner somewhat similar to MongoDB, with the added comfort of full ACID compliance. It also lets you specify indexes to do exactly what you describe.
A PostgreSQL-specific alternative might be to write triggers in one of the provided procedural languages to turn your JSON into something indexed or materialized elsewhere.
Do either of those work for you?
Also, purely out of curiosity, do you have a design reason for only wanting to store schema-less JSON, or have you just been burned by slow database migrations in the past?
There seems to be a big community of people who really want to reject schema and use JSON for everything, and I'm really curious if they (a) don't understand relational databases, (b) are getting some surprising productivity gains somehow, (c) have been burned by slow database migrations in the past, or (d) some other reason.
Anecdotes like the one from the article that ended with "The one thing that didn't flinch was MongoDB" don't convince me one bit. When something else between the end user and the database is the bottleneck, it would be silly to assume that is the only problem in the entire system. Who is to say that if the load balancers had been configured differently, or spec'd higher, their MongoDB wouldn't have become a smoldering crater?
While anecdotal evidence is always suspect, remember that the case in the article is MongoDB's optimal use case: extremely read-heavy (there is no indication that they did more than one write that day).
"First of all, with any piece of technology, you should, y’know, RTFM. EVERY company out there selling software solutions is going to exaggerate how awesome it is."
Ah, but it isn't the company that's exaggerating the wonders of MongoDB...
This reminds me of the arguments over which programming language is the best. There is no best technology. Different technologies are designed for different engineering problems. You can't really blame the technology when you choose the wrong tool for your problem.
If you're going to choose one of these high-performance NoSQL DBs, you are trading ACID for that performance. How hard is this to understand, guys? If that doesn't suit your purposes, don't use it.
1) No one said we are trading ALL of ACID. D should never be traded, period, except for transient data or cache.
2) We don't even get the performance guarantee. See the pastebin post about how the write lock affects performance and how synchronization with a slave can go awry.
http://www.youtube.com/watch?v=b2F-DItXtZs
http://news.ycombinator.com/item?id=3205573
If you don't have such redundancy, your data is not safe, no matter what database engine you are running.
6th grade all over again.
http://news.ycombinator.com/item?id=496946
We should never forget that, too. Even in carpentry, there are badly made hammers.