This is wonderful work, and it is the foundation for yet more wonderful work in the future. However, as much as I am a huge PostgreSQL fanboy and a dedicated MongoDB peanut-thrower, I worry that the caption applied to this post is misleading.
Once this work is released, PostgreSQL will be faster than current versions of MongoDB at carrying out queries for documents which look for a given value at a given path (or perhaps more generally, values satisfying a given predicate at paths satisfying a given predicate).
But that has never been one of MongoDB's important strengths, has it? The operations for which MongoDB is optimised are inserting documents, retrieving documents by ID, and conducting map-reduces across the whole database. Yes, it has the ability to query by values in leaves, and it can use indices to make that faster, but that's a bit of a second-tier, bag-on-the-side feature, isn't it? If that was the main thing you needed to do, you wouldn't have chosen MongoDB, right?
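For concreteness, the kind of query being benchmarked — a given value at a given path — looks roughly like this with hstore's containment operator, which is what the GIN index accelerates (table and key names are made up):

```sql
-- Hypothetical schema: documents stored as hstore, indexed with GIN.
CREATE TABLE docs (id serial PRIMARY KEY, body hstore);
CREATE INDEX docs_body_idx ON docs USING gin (body);

-- "Given value at a given path": find docs where status = 'active'.
SELECT id FROM docs WHERE body @> 'status=>active';
```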
You mischaracterize one of MongoDB's main draws. It's not just indexing, but indexing over a schemaless dataset. Mongo isn't unique in providing this, but combined with other features it is compelling for certain scenarios.
That said, if Postgres matches/exceeds enough Mongo features, it could certainly outshine Mongo for those scenarios.
AFAIK, map-reduce is relatively expensive on Mongo... typically recommended only for background execution, nothing synchronous.
I've harped on this many times in the past. I love PostgreSQL, but where is the feature that matches MongoDB for ease of replication? Please answer "you must not be keeping up with PostgreSQL lately" and I'll eat my words!
IMHO this seems like building a better apple by making it taste like an orange. This could definitely be useful for storing JSON from JavaScript frameworks like Backbone/Angular/Ember, but I don't see how this is going to improve the lives of PostgreSQL users in general. Hopefully this leads to some weird, cool, emergent use cases that make PostgreSQL way better than other databases, but I'm not sure at this point that it will. Does anybody have more context on the long game for this approach? I'm not sure what the goal is for adding this feature.
This is impressive but not really surprising given the quality of the PostgreSQL database. Quote:
"We added performance comparison with MongoDB. MongoDB is very slow on loading data (slide 59) - 8 minutes vs 76s, seqscan speed is the same - about 1s, index scan is very fast - 1ms vs 17ms with GIN fast-scan patch. But we managed to create new opclass (slides 61-62) for hstore using hashing of full-paths concatenated with values and got 0.6ms, which is faster than mongodb!"
But note it's at the expense of a very large index file - 800MB (vs Mongo's 100MB). On the other hand, Pg's index covers the entire JSON structure, whereas Mongo's index only covers the leaves being searched, so Pg's will optimize a larger variety of queries.
Great work and research; horrible submission headline.
The news here is that, in addition to the huge market for traditional DBs, postgres is going to compete in a serious way on MongoDB's home turf. As that becomes more apparent, it will validate postgres's flexibility/adaptability and cast doubt over special-purpose database systems and NoSQL.
MongoDB still has a story around clustering, of course. But that story is somewhat mixed (as all DB clustering stories are); and postgres is not standing still on that front, either.
(Disclaimer: I'm a member of the postgres community.)
Hey Jeff, been a postgresql fan for a long time, but clustering is my number one issue. What is coming down the pipe for clustering/replication for PostgreSQL?
SELECT json_field FROM data WHERE (json_field->>'age')::int > 15
Part of the performance increase for hstore is improvements for GIN indexes, and according to the author can be applied to json. So yes you can use indexes on your hstore or json documents.
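Even before those GIN improvements land for json, a btree expression index already covers the scalar case in current releases; a sketch against a hypothetical table:

```sql
-- Expression index over one extracted field. Note ->> (which returns
-- text, unlike ->, which returns json) and the cast for numeric comparison.
CREATE INDEX data_age_idx ON data (((json_field->>'age')::int));

-- This query can now use the index, since it repeats the indexed expression:
SELECT json_field FROM data WHERE (json_field->>'age')::int > 15;
```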
Yes, for a project I'm working on, I had a JSON array of objects that each had a key called name.
I have fast indexed queries that return any rows that have an object with a given name in it, and for full text search on that name along with a few other things.
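The full-text part can be sketched like this, assuming for simplicity that the name lives at a fixed key rather than inside the array (hypothetical table and column names — my real case needed more unpacking):

```sql
-- GIN full-text index over a name field pulled out of the JSON document.
-- The explicit 'english' config keeps to_tsvector immutable, as an
-- expression index requires.
CREATE INDEX items_name_fts ON items
    USING gin (to_tsvector('english', doc->>'name'));

-- The query must repeat the indexed expression for the index to be used.
SELECT * FROM items
WHERE to_tsvector('english', doc->>'name') @@ plainto_tsquery('english', 'alice');
```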
Which is not surprising, as this is a 'real' database. I read a review of MongoDB internals and it didn't sound very positive. While you can create a NoSQL store quickly, doing it well takes time with any db - time which PostgreSQL has definitely had.
Just to be clear, MongoDB is fine db for certain scenarios and I am using it in production.
If you want more of the detail and background around this, which came from a talk by the author of the blog post, check out his slides at http://www.sai.msu.su/~megera/postgres/talks/hstore-dublin-2.... One of the key things not in the blog post but addressed in the slides is that there will be a new index type that pushes the performance gains even further.
Implementing our doc store on Postgres a while ago in lieu of switching to MongoDB suddenly seems like a very good call.
We had nothing against Mongo (frankly haven't gone into deep analysis of how it would turn out). It was simply the db we already had at hand and we knew it well and trusted it.
It would be great if they could run these benchmarks with MongoDB using TokuMX.
I think Tokutek's fractal trees would make an exciting data store for PostgreSQL, but benchmarks are difficult to find... Who is perceived to be faster these days, generally: MySQL+InnoDB, MySQL+TokuDB, or PostgreSQL?
rpedela | 12 years ago
Map/reduce? Isn't that slow in general? Can't PostgreSQL JOIN and aggregate functions do most of what map/reduce is for but faster?
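For the aggregation-style jobs map/reduce is commonly used for, plain SQL does cover it; for example, a word-count style map/reduce is just an ordinary aggregate (hypothetical table and column names):

```sql
-- "Map" = extract the key, "reduce" = count per key,
-- all in one declarative query the planner can optimize.
SELECT word, count(*) AS n
FROM events
GROUP BY word
ORDER BY n DESC;
```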
craigching | 12 years ago
Easy replication ... that's why I use MongoDB.
rdtsc | 12 years ago
Well, mostly. Except for when it doesn't, especially when it comes to partitions.
http://aphyr.com/posts/284-call-me-maybe-mongodb
gibybo | 12 years ago
I.e. can I do something like SELECT json_field FROM data WHERE json_field.age > 15 ?
itodd | 12 years ago
http://people.planetpostgresql.org/andrew/index.php?/archive...
For 9.3, see more info at http://michael.otacoo.com/postgresql-2/postgres-9-3-feature-...
chrisfarms | 12 years ago
http://www.postgresql.org/docs/9.3/static/functions-json.htm...
fennecfoxen | 12 years ago
"ZFS has a heckuva caching strategy, and you know when it accepts a write."
:P
tete | 12 years ago
I really wish PostgreSQL wasn't such an enormous challenge to scale horizontally.
wmil | 12 years ago
There's always a fight between sticking to what your ORM supports and trying to use every feature of the database.
Either can simplify your code under certain circumstances.
roncohen | 12 years ago
No mention of arrays in the post, but in the slides: you can use {1, 2} syntax for arrays and hstore now eats it \o/