This is wonderful work, and it is the foundation for yet more wonderful work in the future. However, as much as I am a huge PostgreSQL fanboy and a dedicated MongoDB peanut-thrower, I worry that the caption applied to this post is misleading.
Once this work is released, PostgreSQL will be faster than current versions of MongoDB at carrying out queries for documents which look for a given value at a given path (or perhaps more generally, values satisfying a given predicate at paths satisfying a given predicate).
But that has never been one of MongoDB's important strengths, has it? The operations for which MongoDB is optimised are inserting documents, retrieving documents by ID, and conducting map-reduces across the whole database. Yes, it has the ability to query by values in leaves, and it can use indices to make that faster, but that's a bit of a second-tier, bag-on-the-side feature, isn't it? If that was the main thing you needed to do, you wouldn't have chosen MongoDB, right?
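For concreteness, the kind of query being benchmarked — a given value at a given path — looks roughly like this with hstore's containment operator, which is what the GIN index accelerates (table and key names are made up):

```sql
-- Hypothetical schema: documents stored as hstore, indexed with GIN.
CREATE TABLE docs (id serial PRIMARY KEY, body hstore);
CREATE INDEX docs_body_idx ON docs USING gin (body);

-- "Given value at a given path": find docs where status = 'active'.
SELECT id FROM docs WHERE body @> 'status=>active';
```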
You mischaracterize one of MongoDB's main draws. It's not just indexing, but indexing over a schemaless dataset. Mongo isn't unique in providing this, but combined with other features it is compelling for certain scenarios.
That said, if Postgres matches/exceeds enough Mongo features, it could certainly outshine Mongo for those scenarios.
AFAIK, map-reduce is relatively expensive on Mongo... typically recommended only for background execution, nothing synchronous.
I've harped on this many times in the past. I love PostgreSQL, but where is the feature that matches MongoDB for ease of replication? Please answer "you must not be keeping up with PostgreSQL lately" and I'll eat my words!
IMHO this seems like building a better apple by making it taste like an orange. This could definitely be useful for storing JSON from JavaScript frameworks like Backbone/Angular/Ember, but I don't see how this is going to improve the lives of PostgreSQL users in general. Hopefully this leads to some weird, cool, emergent use cases that make PostgreSQL way better than other databases, but I'm not sure at this point that it will. Does anybody have more context on the long game for this approach? I'm not sure what the goal is for adding this feature.
This is impressive but not really surprising given the quality of the PostgreSQL database. Quote:
"We added performance comparison with MongoDB. MongoDB is very slow on loading data (slide 59) - 8 minutes vs 76s, seqscan speed is the same - about 1s, index scan is very fast - 1ms vs 17ms with GIN fast-scan patch. But we managed to create new opclass (slides 61-62) for hstore using hashing of full-paths concatenated with values and got 0.6ms, which is faster than mongodb!"
But note it's at the expense of a very large index file - 800MB (vs Mongo's 100MB). On the other hand, Pg's index covers the entire JSON structure, whereas Mongo's index only covers the leaves being searched, so Pg's will optimize a larger variety of queries.
Great work and research; horrible submission headline.
The news here is that, in addition to the huge market for traditional DBs, postgres is going to compete in a serious way on MongoDB's home turf. As that becomes more apparent, it will validate postgres's flexibility/adaptability and cast doubt over special-purpose database systems and NoSQL.
MongoDB still has a story around clustering, of course. But that story is somewhat mixed (as all DB clustering stories are); and postgres is not standing still on that front, either.
(Disclaimer: I'm a member of the postgres community.)
Hey Jeff, been a postgresql fan for a long time, but clustering is my number one issue. What is coming down the pipe for clustering/replication for PostgreSQL?
SELECT json_field FROM data WHERE (json_field->>'age')::int > 15
Part of the performance increase for hstore is improvements for GIN indexes, and according to the author can be applied to json. So yes you can use indexes on your hstore or json documents.
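Even before those GIN improvements land for json, a btree expression index already covers the scalar case in current releases; a sketch against a hypothetical table:

```sql
-- Expression index over one extracted field. Note ->> (which returns
-- text, unlike ->, which returns json) and the cast for numeric comparison.
CREATE INDEX data_age_idx ON data (((json_field->>'age')::int));

-- This query can now use the index, since it repeats the indexed expression:
SELECT json_field FROM data WHERE (json_field->>'age')::int > 15;
```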
Yes, for a project I'm working on, I had a JSON array of objects that each had a key called name.
I have fast indexed queries that return any rows that have an object with a given name in it, and for full text search on that name along with a few other things.
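The full-text part can be sketched like this, assuming for simplicity that the name lives at a fixed key rather than inside the array (hypothetical table and column names — my real case needed more unpacking):

```sql
-- GIN full-text index over a name field pulled out of the JSON document.
-- The explicit 'english' config keeps to_tsvector immutable, as an
-- expression index requires.
CREATE INDEX items_name_fts ON items
    USING gin (to_tsvector('english', doc->>'name'));

-- The query must repeat the indexed expression for the index to be used.
SELECT * FROM items
WHERE to_tsvector('english', doc->>'name') @@ plainto_tsquery('english', 'alice');
```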
Which is not surprising, as this is a 'real' database. I read a review of MongoDB internals and it didn't sound very positive. While you can create a NoSQL store quickly, doing it well takes time with any db - time which PostgreSQL has definitely had.
Just to be clear, MongoDB is fine db for certain scenarios and I am using it in production.
If you want more of the detail and background around this, which came from a talk by the author of the blog post, check out his slides at http://www.sai.msu.su/~megera/postgres/talks/hstore-dublin-2.... One of the key things not in the blog post but addressed in the slides is that there will be a new index type that pushes the performance gains even further.
Implementing our doc store on Postgres a while ago in lieu of switching to MongoDB suddenly seems like a very good call.
We had nothing against Mongo (frankly haven't gone into deep analysis of how it would turn out). It was simply the db we already had at hand and we knew it well and trusted it.
It would be great if they could run these benchmarks with MongoDB using TokuMX.
I think Tokutek's fractal trees would make an exciting data store for PostgreSQL, but benchmarks are difficult to find... Who is perceived to be faster these days, generally: MySQL+InnoDB, MySQL+TokuDB, or PostgreSQL?
rpedela | 12 years ago
Map/reduce? Isn't that slow in general? Can't PostgreSQL JOIN and aggregate functions do most of what map/reduce is for but faster?
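For the aggregation-style jobs map/reduce is commonly used for, plain SQL does cover it; for example, a word-count style map/reduce is just an ordinary aggregate (hypothetical table and column names):

```sql
-- "Map" = extract the key, "reduce" = count per key,
-- all in one declarative query the planner can optimize.
SELECT word, count(*) AS n
FROM events
GROUP BY word
ORDER BY n DESC;
```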
craigching | 12 years ago
Easy replication ... that's why I use MongoDB.
rdtsc | 12 years ago
Well, mostly. Except for when it doesn't, especially when it comes to partitions.
http://aphyr.com/posts/284-call-me-maybe-mongodb
gibybo | 12 years ago
I.e. can I do something like SELECT json_field FROM data WHERE json_field.age > 15 ?
itodd | 12 years ago
http://people.planetpostgresql.org/andrew/index.php?/archive...
For 9.3, see more info at http://michael.otacoo.com/postgresql-2/postgres-9-3-feature-...
chrisfarms | 12 years ago
http://www.postgresql.org/docs/9.3/static/functions-json.htm...
fennecfoxen | 12 years ago
"ZFS has a heckuva caching strategy, and you know when it accepts a write."
:P
tete | 12 years ago
I really wish PostgreSQL wasn't such an enormous challenge to scale horizontally.
wmil | 12 years ago
There's always a fight between sticking to what your ORM supports and trying to use every feature of the database.
Either can simplify your code under certain circumstances.
roncohen | 12 years ago
No mention of arrays in the post, but in the slides: you can use {1, 2} syntax for arrays and hstore now eats it \o/