top | item 21228396

tlynchpin | 6 years ago

If someone said that on Hacker News, they would expect to hear "citation needed".

Anecdotally, what I hear is a lot of bitching and moaning about ES, yet it clearly works and has generally all the difficulties of any CAP problem. This suggests to me that ES is addressing a Hard Problem, and given that it is long-lived and quite popular, it's likely not substantially worse than any reasonable alternative.

Please tell us what you consider ElasticSearch's fundamental flaws, and propose some alternatives, either as revisions to ES or as entirely different solution components.

jng|6 years ago

It is not necessary to have a valid alternative to validly declare something as fundamentally flawed.

ElasticSearch is as brittle as you can get. If you don't size the Java heap properly, nodes crash all the time and uncontrollable, ultra-expensive shard relocation follows. Their open-source monitoring tools have the nice side effect of overloading the cluster and bringing it down (!). The fact that it is a hodgepodge of Java code built on repurposed Lucene shows in poor performance and very poor stability.
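As an illustration of what "dimensioning the heap" usually means, here is a minimal sketch of a hypothetical helper (`es_heap_flags` is not part of any Elastic tooling) that derives matching `-Xms`/`-Xmx` flags from a node's total RAM. It assumes the commonly cited guidance of capping the heap at half of RAM, so the OS file cache that Lucene relies on keeps the rest, and at roughly 26 GB to stay under the JVM's compressed-oops threshold; the exact cap varies by JVM and ES version, so check the docs for yours.

```python
# Hypothetical helper: derive matching -Xms/-Xmx flags from total RAM.
# Assumptions (verify against the Elastic docs for your version):
#   - heap <= 50% of RAM, leaving the rest to the OS file cache Lucene uses
#   - heap <= ~26 GB so the JVM can keep using compressed object pointers


def es_heap_flags(total_ram_gb: int, oops_cap_gb: int = 26) -> str:
    """Return JVM flags with the minimum and maximum heap set to the same size."""
    if total_ram_gb < 2:
        raise ValueError("need at least 2 GB of RAM for a 1 GB heap")
    heap_gb = min(total_ram_gb // 2, oops_cap_gb)
    # Setting -Xms equal to -Xmx means the heap is never resized at runtime.
    return f"-Xms{heap_gb}g -Xmx{heap_gb}g"


if __name__ == "__main__":
    for ram in (8, 16, 64, 128):
        print(ram, "GB RAM ->", es_heap_flags(ram))
```

On a 64 GB node this yields a 26 GB heap rather than 32 GB, because the compressed-oops cap is hit before the 50% rule.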

I've spent many a weekend trying to bring a fallen ElasticSearch cluster back up, in some cases one brought down just by monitoring. Our use case wasn't trivial, but it wasn't massive either (hundreds of thousands of concurrent users, not millions), and a properly developed C++ or even Python distributed solution would be more than able to handle it quite easily (source: I ended up having to write it myself, and it didn't require anything massive to handle properly).

Frankly, I admire Elastic, because I have no idea how you turn such a piece of software into ~$90MM in yearly revenue, and, mainly, how you turn that ~$90MM in yearly revenue into a publicly traded company with a market cap of nearly $7bn. So much to learn from them!

tlynchpin|6 years ago

This is what I mean by bitching and moaning. In summary: you tried to use ES, but you didn't RTFM, didn't know about JVM tuning, or didn't scale test, and you found out that the weekend is a bad time to come up to speed on those. You had a bad time several times, plus you slashdotted yourself with monitoring. Then you wrote a custom implementation for your vertical use case, which didn't have the RTFM problem because you wrote it, but which also satisfies only your case, as opposed to the wide applicability of ES. Ultimately, cool story bro, because ES is freely available for anyone to use (many people do this) and modify (some people do this too), and your alternative is unknown.

What are the fundamental flaws of ES and what alternatives avoid those flaws, or how do you propose ES could address those flaws?

For example:

- "Algolia is so much better because it is a managed service." (hey whatsup ycombi)

- "Solr is also Lucene-based but necessarily requires significant customization for the workload, which avoids the common ES problem of appearing to work so well out of the box that people neglect the details until it becomes an incident."

- "ES's fundamental flaw is that Zen discovery multicast nonsense. People, please stop being clever with multicast; it never works in practice because of IGMP snooping." (hey whatsup, we've been out here using ES for a while now)

atombender|6 years ago

Elasticsearch may not be fundamentally flawed, but it sure is flawed!

It's operationally unpredictable, even if you know all the corner cases (like field cache sizes) and JVM flag tuning voodoo. It's notoriously memory-hungry, and its networking is notoriously unstable. I've had issues where minor network blips throw the entire cluster into a weird quantum state where nodes are up but the cluster is down.

One particular annoyance with ES is that once it starts having issues, it often becomes completely unresponsive, and its actual status becomes difficult to understand. You often can't access endpoints like /_cluster/health, /_cat/shards, etc. to diagnose it, and meanwhile the logs are spewing inscrutable Java stack traces that are of no help. There are clearly weird bottlenecks inside ES that fail under extreme conditions.
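One way to at least tell "red but answering" apart from "not answering at all" is to probe /_cluster/health with a hard client-side timeout. The following is a minimal sketch using only the Python standard library; the function names, the 5-second default, and the choice of fields to print (`status`, `number_of_nodes`, `unassigned_shards`, which the health endpoint does return) are assumptions for illustration, not an official Elastic tool.

```python
import http.client
import json
import urllib.request
from typing import Optional


def fetch_cluster_health(base_url: str, timeout_s: float = 5.0) -> Optional[dict]:
    """GET <base_url>/_cluster/health and return the parsed JSON body.

    Returns None if the node refuses the connection, times out, returns an
    HTTP error, or sends something that isn't valid JSON.
    """
    url = base_url.rstrip("/") + "/_cluster/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError, connection refused and timeouts;
        # ValueError covers malformed JSON.
        return None


def summarize(health: Optional[dict]) -> str:
    """Render a one-line status, flagging an unresponsive endpoint explicitly."""
    if health is None:
        return "UNRESPONSIVE: health endpoint did not answer"
    return (f"status={health.get('status')} "
            f"nodes={health.get('number_of_nodes')} "
            f"unassigned_shards={health.get('unassigned_shards')}")


if __name__ == "__main__":
    # Assumes a node listening on the default HTTP port.
    print(summarize(fetch_cluster_health("http://localhost:9200")))
```

This doesn't fix the underlying bottleneck, of course; it only keeps the diagnostic script itself from hanging along with the cluster.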

It's gotten better. The consensus protocol was brittle and unsound for many years and has slowly been patched for robustness, but I wouldn't say it's been fixed. ES is much less reliable than many other clustered systems. The only one that comes to mind as being as unreliable is RabbitMQ.