top | item 7457307

(no title)

fvt | 12 years ago

PostgreSQL was already able to compete with Oracle's RDBMS and Microsoft's SQL Server but could soon supplant Mongo for most jobs.

It's great to know that the only required storage components nowadays could be PG and ElasticSearch (as PG's full-text search can't compete with ES), and that the former is a no-brainer to setup (on top of AWS, Rackspace, etc.) or cheap to acquire (with Heroku Postgres for example).

Good job !

discuss

order

arethuza|12 years ago

Out of interest, what would you say the weaknesses are of the full-text search in PostgreSQL?

NB I've been using PostgreSQL for a few months on a side project and I've been hugely impressed. I wanted to add full text searching at some point and rather than using Lucene or Solr (or similar) I thought I would use PostgreSQL's own search capabilities - which certainly makes some things a lot simpler than using a separate search engine.

pilif|12 years ago

I can't speak for the original poster, but I have three issues with the postgres offering:

1) it has suboptimal support for handling compound words (like finding the "wurst" in "bratwurst"). If the body you're searching is in a language that uses compounds (like german), then you have to use ispell dictionaries which have rudimentary support for compounds and which aren't maintained any more in many cases because ispell has been more or less replaced by hunspell which has far superior compound support which in turn is not supported by postgres.

2) If you use a dictionary for FTS (which you have to if you need to support compounds), the dictionary has to be loaded once per connection. Loading a 20MB dictionary takes about 0.5 seconds, so if you use Postgres FTS, you practically have to use persistent connections or some kind of proxy (like pgbouncer). Not a huge issue, but more infrastructure to keep in mind.

3) It's really hard to do google-suggest like query suggestions. In the end I had to resort to a bad hack in my case.

Nothing unsolvable, but not-quite-elastic search either.

elchief|12 years ago

Some differences between Solr and PostgreSQL Text Search:

1. Solr doesn't handle multi-word synonyms (without a hack), PG does. (ex: "Northern Ireland" => "UK")

2. Solr uses TF-IDF out of the box, and PG doesn't.

3. PG is good enough for 90% of cases, but Solr has some advanced stuff that PG doesn't. Like integration with OpenNLP, things like the WordDelimiterFilter. (Andre3000 = Andre 3000)

4. PG is kinda annoying in that it will parse "B.C." as a hostname, even though I want it to be a province.

5. Solr is faster than PG, but PG has everything in one server.

6. Solr handles character-grams and word-grams better.

bjourne|12 years ago

Another weakness not mentioned is document similarity search. You have one document and you want to find the N most similar in your collection to offer suggestions like Amazon does. You can do it with fts in pg, but it is way to slow to be practical. Custom fts engines are much better at that task.

netghost|12 years ago

If search is your application's main purpose, you'll want something more flexible.

Otherwise, it's fairly easy to implement, and you get full SQL support so joins, transactions, etc. So you can always prototype it and see what limitations you run into.

If you're just getting started with Postrges, I have some examples of using its' FTS search here: http://monkeyandcrow.com/blog/postgres_railsconf2013/

It's targeted at Rails, but I always show the SQL first, so you should be able to adapt it.

m_mueller|12 years ago

One thing that I don't see solved with PG + ElasticSearch is replication of data subsets to mobile devices. How would you solve that with this stack?

anentropic|12 years ago

Application code? Send the data down as JSON?

LunaSea|12 years ago

I still think that performance aside, MongoDB is more interesting for developers using Node.js because the development time is shorter and the API more intuitive.

cwp|12 years ago

It depends on what your data looks like. If you've got a collection of JSON documents that gets served verbatim through a JSON API, Mongo is a natural fit. If you've got a domain model with different types of entities and links between them, Mongo is aweful.

The app I'm working on right now evolved from the former to the latter, prompting me to switch from Mongo to Postgres, and it's made the code base much, much simpler. Mongo gets really painful when you have to fetch several documents (serially, because you have to follow the links between them) and join them at the application level before rendering the output.

For that kind of application, SQL with joins and sub-selects is so much better.

AlisdairO|12 years ago

The nice thing here is that it's easy enough to write a mongo compatability layer on top of Postgres (hell, it probably already exists...), allowing you the just-get-to-it ease of mongo combined with the substantially superior engineering of Postgres.