I've done two large search projects in the last year. One with Postgres search and one with elastic search. The elastic search version was trivial to configure, and was embarrassingly superior to the point where I wish I had never considered Postgres.
r888888888|9 years ago
If you're using a secondary data store like Elasticsearch presumably you're combining it with a traditional relational database like Postgres or MySQL. And in most use cases, you're going to require additional constraints on top of full text, like restricting to a specific customer or topic, etc. This means data duplication. Any experienced developer should be breaking out in hives at this point because this means (1) keeping two data stores in sync; (2) adding another service to the deployment process; (3) adding another service to development environments; (4) adding another service to debug when problems crop up; (5) large scale schema changes and migrations are complicated because of two data stores; (6) operational overhead from maintaining Elasticsearch like separate backup scripts and deployment scripts and sharding config (across production, qa, and dev); (7) queries are potentially complicated because you may have to aggregate information from two data stores (pagination and delayed updates makes this even more hellish); (8) if your company is anal about devs accessing production data then ES just introduces another layer of requesting permissions and every time the schema changes you have to ask again.
And maybe the advantages of Elasticsearch far outweigh these negatives, but this is a conversation you're going to have with your devs because the cost of using ES is not zero. There are tradeoffs, and maybe just using Postgres full text search is adequate for your use case given sufficient additional constraints.
nostrademons|9 years ago
It's not a serious problem as long as you designate one data source as primary and every other data source as derived. That way, if you ever get data corruption issues, you can re-create the derived data and blow away the old copy. You also don't face complex data-sync issues, because all data-transfer is unidirectional. You write into Postgres or your other source-of-truth, and then you write from Postgres to ElasticSearch. If there's a difference between them, the Postgres version is authoritative, so copy its data over ElasticSearch.
Most of your other points are common to adding any new major feature, and you should always weigh the user benefits against the maintenance costs of a large new dependency. Presumably, if you're looking at ElasticSearch, it's because full-text search in your RDBMS is not good enough (which won't always be the case, but has been my experience and apparently the experience of several other people here).
ngrilly|9 years ago
vkjv|9 years ago
http://martinfowler.com/bliki/CQRS.html
patsplat|9 years ago
It's much easier to scale a ephemeral search service than a transactional database. Likely even one search node will support a higher query throughput.
vkjv|9 years ago
[deleted]
irrational|9 years ago
SanFranManDan|9 years ago
sudo apt-get install elastic-search
pip install django-haystack
add haystack to installed_apps, add the elastic search backend and set the end point.
use a haystack index class that is prebuilt to hook into all model change signals.
done. No weird configurations, everything was vanilla.
Maybe I am missing a step but it took me less than an hour to get everything going and it hasn't had to have any maintenance.
When you are adding ngram support and all the indexes and views in postgres to replicate the behavior, ES looks to be less complicated. At the very least I don't see a reduction in complexity doing the postgres way just that you have 1 less dependency to worry about.
nostrademons|9 years ago
https://www.elastic.co/guide/en/elasticsearch/guide/2.x/gett...
ngrilly|9 years ago
threeseed|9 years ago
Unless you need basic search and you desperately want to keep everything in the same database I would split it out. Architecturally it makes more sense to split it out just to allow for different scaling needs.
nugator|9 years ago
twic|9 years ago
rpedela|9 years ago
1. https://www.elastic.co/guide/en/elasticsearch/reference/curr...
2. https://www.elastic.co/guide/en/elasticsearch/guide/current/...
eva1984|9 years ago
As far as I am concerned, it is far superior to the goodies that mentioned in this article.
craigkerstiens|9 years ago
combatentropy|9 years ago
p. 1: sales pitch
p. 2: sales pitch
p. 3: "The only requirement for installing Elasticsearch is a recent version of Java." Ugh.
pp. 4-10: A new JSON API to learn. This is just the part about getting your stuff into Elasticsearch and, at last, doing a query that is no better than "select * from table where last_name like '%Smith%';
p. 11: "More complicated searches," an apt title. Here is the Elasticsearch query that's equivalent to "select * from table where last_name like '%smith%' and age > 30":
It only gets more complicated from there. There is talk about ranking, but it was still a black box even by the end of the Getting Started guide, and I was going to want to tweak it. It just seemed more obvious how to do so with Postgres (https://www.postgresql.org/docs/9.6/static/textsearch-contro...).How is Postgres merely "good enough"? How are other products better? What is their draw?
Is it ease of set-up? For me Elasticsearch was harder to install than Postgres and made me learn a bunch of new syntax, beyond the SQL I have to know anyway to write web applications in the first place. Postgres handles stemming, stopwords, ranking, snippets, and highlighting.
Is it better ranking? Postgres can weigh the search words' frequency, their proximity to each other, the length of the document, the number of unique words in the document, and the "mean harmonic distance between extents." I was able to further tweak the results by using other columns and joining other tables.
Is it faster search results? Postgres searches tens of thousands of pages in a split second.
mozumder|9 years ago
eric_khun|9 years ago
threeseed|9 years ago
unknown|9 years ago
[deleted]