Seriously, another case of using Mongo incorrectly? I want to believe all the Mongo hate, but I can't because I always find out that the actual problem was one or more of:
* didn't read the manual
* poor schema
* didn't maintain the database (compactions, etc.)
In this case, they hit several:
" Its volume on disk is growing 3-4 times faster than the real volume of data it store;"
They should be doing compactions and are not. Using PostgreSQL does not avoid administration; it simply changes the administration to be done.
"it eats up all the memory without the possibility to limit this"
That's the idea -- that memory isn't actually "used" though; it's just memory-mapping the file. The OS will evict those pages for something else that needs the space, unless you are actively using all the data, in which case you really are using all your memory. Which is why you should put it on its own server...
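To make the point above concrete, here's a minimal sketch (my own illustration, not MongoDB code) of what memory-mapping a data file looks like, the way mongod's mmap-based storage engine does it: the mapping reserves address space for the whole file, but pages only become resident once they are touched, and the kernel can reclaim them under pressure.

```python
# Sketch: map a file into memory the way an mmap-backed database does.
# Mapping 64 MiB reserves address space; it does not consume 64 MiB of RAM.
import mmap
import os
import tempfile

# A 64 MiB sparse file stands in for a database file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(64 * 1024 * 1024)
    path = f.name

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)  # map the entire file
    print(len(mm))                 # 67108864 -- address space, not resident RAM
    mm[0:5] = b"hello"             # touching a page faults it in
    print(mm[0:5])                 # b'hello'
    mm.close()

os.unlink(path)
```

This is why tools like `top` show a huge virtual size for mongod while the resident set tracks only the pages actually in use.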
"it begins to slow down the application because of frequent disk access"
"Finally we sleep quietly, and don’t fear that mongodb will drive out redis to swap once again."
You should be running Mongo on a server by itself. At the very least, if you're having disk contention issues, don't run it on the same server as your other database.
I'm not sure you always need to read the manual for everything, but for your production database, it's probably worth it.
> Seriously, another case of using Mongo incorrectly?
If a large proportion of MongoDB users are using it incorrectly, then I'd argue that it is a MongoDB problem, if only a documentation and messaging one. Clarity on what is and is not an appropriate use should be prominent.
> * didn't read the manual
> * poor schema
> * didn't maintain the database (compactions, etc.)
The real world dictates that this happens more often than not. You know why I like Postgres? When I don't read the manual, create a crappy schema, and forget to maintain the database, it STILL seems to work okay.
In all fairness, compaction is a major pain in Mongo. I get a little worked up about this because I can't think of another database that handles compaction this poorly, but feel free to correct me if I'm wrong.
> Seriously, another case of using Mongo incorrectly? I want to believe all the Mongo hate, but I can't because I always find out that the actual problem was one or more of:
> * poor schema
You're right. If people read the awesome mongodb docs before using it, they'd figure out that mongodb's ideal, performance-friendly schema has limitations that don't fit a lot of projects. Of course this may have changed, since mongodb evolves pretty quickly.
> "Finally we sleep quietly, and don’t fear that mongodb will drive out redis to swap once again."
MongoDB and Redis on the same box? Two data stores that need their working set, or all of their data, to reside in RAM for performance? That is a recipe for failure.
Everyone seems to learn about physical separation the hard way.
For what it is worth, I would think people actually try different things in the existing setup before they decide on doing a switch like this. It is not easy to pull off at all. My guess would be that if you have way more Postgres knowledge in the house, then it is more sensible to run Postgres.
This also drives the amount of administrative overhead needed.
The current stable node driver silently throws away exceptions. Seriously, MongoDB Inc. acknowledges it. Is this also a case of not using mongo correctly?
Mongo's disk format is extremely wasteful, the database files are gigantic. That is a real problem and there is no way to compact this to anywhere near the size something like Postgres would have for the same data.
Mongo is very bad at managing used memory. In fact it doesn't actually manage memory since it just mmaps its database file.
It also touches disk much more often than would be reasonable, especially for how much memory it uses.
It's a terrible database and it is perfectly legitimate to be annoyed at it being this terrible.
Maybe I am just incredibly lucky, but mongodb has worked fine for ridewithgps.com - we are sitting at 670gb of data in mongo (actual DB size, indexes included) and haven't had a problem. Replica sets have been fantastic, I wish there was another DB out there that did auto-failover as cleanly/easily as mongo does. We've had a few server crashes of our primary, and aside from 1-2 seconds or so of errors as requests come in before the secondary is promoted, it's transparent.
With that being said, we are using it to store our JSON geo track data, most everything else is in a mysql database. As a result we haven't run into limitations around the storage/query model that some other people might be experiencing.
Additionally, we have some serious DB servers so haven't felt the pain of performance when exceeding working memory. 192gb of ram with 8 RAID10 512gb SSDs probably masks performance issues that other people are feeling.
Final note: I'll probably be walking away from mongo, due to the natural evolution of our stack. We'll store high fidelity track data as gzipped flat files of JSON, and a reduced track inside of postgis.
tl;dr - using mongo as a very simple key/value store for data that isn't updated frequently, which could easily be replaced by flat file storage, is painless. YMMV with other use cases.
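The "gzipped flat files of JSON" plan described above can be sketched in a few lines. This is purely illustrative: the class name, layout, and API are my own invention, not ridewithgps.com's actual code.

```python
# Sketch: a tiny put/get key-value store backed by gzipped JSON flat files,
# the kind of thing that can replace a document store for rarely-updated data.
import gzip
import json
import os

class FlatFileStore:
    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.root, key + ".json.gz")

    def put(self, key, doc):
        # Text-mode gzip handles encoding; JSON keeps the payload portable.
        with gzip.open(self._path(key), "wt", encoding="utf-8") as f:
            json.dump(doc, f)

    def get(self, key):
        with gzip.open(self._path(key), "rt", encoding="utf-8") as f:
            return json.load(f)

# Usage: store a (simplified) GPS track
store = FlatFileStore("/tmp/tracks")
store.put("ride-42", {"points": [[45.52, -122.68], [45.53, -122.67]]})
print(store.get("ride-42")["points"][0])  # [45.52, -122.68]
```

For write-once data like recorded tracks, this gets you compression and trivial backups for free; what you give up is querying inside the documents, which is why the reduced track goes into PostGIS.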
670 gigabytes is a puny database size. You should be able to push a lot of power through a disk system like the one you have. I would seriously consider a Postgres setup for a data set of that size. Additionally, I would probably just store the JSON data directly inside postgresql.
> using mongo as a very simple key/value store for data that isn't updated frequently, which could easily be replaced by flat file storage, is painless. YMMV with other use cases.
This is our use case as well and MongoDB has been fine. We had some initial pain as we learned the product but it's great for this use case. Currently sitting around 1TB of data.
The title is a bit misleading. This is basically an announcement of a fork of Errbit that has Postgres support. Additionally, the fork was announced as an issue on errbit, with no prior discussion and no official pull request.
I would not consider this good etiquette. If you fork your project (especially without discussing the intention first), adding a bug to the original project isn't a very nice thing to do.
An official pull request would be nicer or, even better, don't bother the original project, but just announce your fork over other channels.
Even better would be to at least discuss the issue with the original project - maybe they agree and you can work together.
>even better, don't bother the original project, but just announce your fork over other channels
This is a rather bizarre interpretation of nice behavior: Make a very cool modification to a project, but don't even bother to tell the original maintainers/authors?
Github Issues is a perfectly reasonable place for this. Maybe the mailing list would be better, but, shrug. Issues != Bugs, by the way. There's a reason it's called Issues. And it's basically the only way to have a discussion on github about anything whether it's an issue or not.
Also, some maintainers get mad if you send a pull request without doing an issue first, so there's no right way.
I disagree, I think that opening an issue on github is a good way to start a discussion about a feature. Many projects accept feature requests this way and if anyone did the same for one of my projects, this would be the way I would prefer them to handle it.
If you go back and look at it now, you'll see that this is a non-issue:
> @realmyst Will you put up a Pull Request?
> realmyst commented 19 minutes ago: @21croissants yes, I will.
It sounds like MongoDB has no future indeed: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
Let's just say that PostgreSQL answered the criticisms of relational databases that led to NoSQL. The complaints all boiled down to the RDBMS forcing you to do things one way, and that being cumbersome. PostgreSQL evolved and fixed the most annoying gaps, adding things like JSON support and schemaless key-value storage. That's the way open source is supposed to work. Now folks are learning that throwing out the baby with the bathwater leads to more complexity than just learning how to use a relational database. The pendulum has swung back.
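The "schemaless documents inside a relational table" pattern credited to PostgreSQL's JSON support is easy to demonstrate. Postgres isn't available inside a snippet, so this sketch makes the same point with SQLite's JSON1 functions (present in modern SQLite builds); in Postgres you'd use a `json`/`jsonb` column and the `->>` operators instead.

```python
# Sketch: store schemaless documents in one column of a relational table,
# yet still query into them with SQL.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE errors (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute(
    "INSERT INTO errors (doc) VALUES (?)",
    (json.dumps({"app": "errbit", "message": "boom", "count": 3}),),
)

# Reach into the document without declaring a schema for its fields.
row = conn.execute(
    "SELECT json_extract(doc, '$.app'), json_extract(doc, '$.count') FROM errors"
).fetchone()
print(row)  # ('errbit', 3)
```

You keep transactions, joins against ordinary columns, and backups, while the document fields stay flexible.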
This story has played out before. Last time, it was Object Oriented databases. What happens each time is that the traditional RDBMS's pick up a few of the features, and then we keep using them until the next contender comes along.
This is not true at all. The actual realization of the past few years is that strictly enforced relationality (is that a word?) and transactions are constructs that are not always needed, and in many cases rarely needed. Eventual consistency, schemaless data modelling and so on picked up steam, and for good reasons. Every technology that survives the "Oh, new toy!" stage has a place, or it wouldn't still exist. It is up to developers to choose the appropriate technology for them and their projects. That isn't to say that a lot of persistence problems can't be solved equally well by an RDBMS, a k/v store, or a document store; in those cases, just base your decision on other drivers (comfort level, cost, and so on).
Right. Just like MySQL/Oracle was put out to pasture.
And if you think MongoDB is only popular because it is a JSON store, then it shows just how little you know about the database landscape and about how developers actually use databases.
All the philosophical issues and /(No)?SQL/ discussions aside, as a heavy user of Postgres and a user of Errbit, this is very good news to me. I don't have much experience running Mongo, but I have a ton of experience running Postgres.
Even better: The application I'm using Errbit the most for is already running in front of a nicely replicated and immensely powerful postgres install.
Being able to put the Errbit data there is amazing.
If you want to use MongoDB in a project and you don't intend to rely heavily on the aggregation framework, then consider TokuMX (http://www.tokutek.com/products/tokumx-for-mongodb/), as it alleviates many of the shortcomings of MongoDB (data compression, document-level locking for writes, ...) and adds transactions.
It's a drop-in replacement, so it will work with current drivers. (If you have a running mongo cluster, however, expect quite some work to migrate.)
(I have no affiliation with TokuTek whatsoever except that I use their product)
I'm no MongoDB expert, but I recently started to look into this db. Can anyone tell me (from experience, not from promo materials) which use cases MongoDB is a good fit for, and which ones it isn't? It's clear that it can't fit everyone, so it would be good to know in advance what it's most likely to fit and what it's most likely not to.
> Finally we sleep quietly, and don’t fear that mongodb will drive out redis to swap once again.
Well duh, Mongo was designed to live on its own server as it tries to claim all of the free memory available. Putting it on the same server with Redis makes no sense.
The case that caused you sleepless nights does not apply to 99% of projects out there.
In case you missed it, this submission is not about PostgreSQL vs MongoDB. It's about the crazy GIF parade in the comments interleaved with thumbs up emojis. You don't see such stuff often on github :)
This just makes me wonder why they chose Mongo in the first place. It sounds like they didn't really consider their needs when initially choosing databases. Mongo has some benefits that when properly implemented far outweigh the negatives. At the same time, it's still relatively young, and doesn't have the "maturity of process" that makes older SQL engines so easy to manage/implement. Eventually, I'm sure, Mongo will solve these issues and be a great database for those who need to utilize its many virtues.
Does anyone have a recommendation for an authoritative guide to either Postgres or Mongodb? One that does more than show you where the levers are, that is.
> If a large proportion of MongoDB users are using it incorrectly, then I'd argue that it is a MongoDB problem, if only a documentation and messaging one.
So, what is this proportion?
https://jira.mongodb.org/browse/SERVER-11763
It looks like compaction is an offline process. That really puts the user between a rock and a hard place.
If everyone uses Mongo incorrectly, the problem is not the users; it is Mongo. It is like one person insisting that everyone else in the world is crazy.
How often did you update your data, then? In my current project I can already see locking issues coming my way soon...
I would hope so.
As I recall, automatic sharding was on that list, and pg doesn't attempt to tackle that afaik.
It is still cumbersome to use, hard to shard, even harder to cluster, and incredibly complex to manage compared to databases like Cassandra.
"Mongodb" already nearly exists as a single column type, 9.4 will complete it.
This is some of the best news I've read today :-)
"Its volume on disk is growing 3-4 times faster than the real volume of data it stores"
Are they saying that it has a high constant overhead to the data, or are they saying the storage grows in a super-linear fashion?