
vedhant | 1 month ago

Of course it is not meant as a primary database. What baffles me is that people use it as log storage. As an application scales, storing and querying logs becomes the bottleneck if Elasticsearch is used. I was dealing with a system that could afford only 1 week of log retention!

SlightlyLeftPad | 1 month ago

Logs are notoriously expensive to store, and notorious for accidentally exposing PII, API keys, private keys, database credentials, etc. At scale they should generally be retained for only a relatively short period. In fact, to remain compliant with CCPA, 28 days is the safe number for most things.

Metrics are much more efficient and are the tool of choice for longer term storage and debugging.

lillesvin | 1 month ago

What kind of storage do you have backing your Elasticsearch? And how have you configured sharding and phase rollover in your indices?

I work with a cluster that holds 500+ TB of logs (where most are stored for a year and some for 5 years because of regulations) in searchable snapshots backed by a locally hosted S3 solution. I can do filtering across most of the data in less than 10 seconds.

Some especially gnarly searches may take around 60-90 seconds on the first run as the searchable snapshots are mounted and cached, but subsequent searches in the cached dataset are obviously as fast as any other search in hot data.
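For anyone wondering what that looks like in practice, a rollover-plus-searchable-snapshot setup along these lines can be sketched with an ILM policy. The policy name, repository name, and the size/age thresholds below are illustrative placeholders, not the actual cluster's configuration:

```shell
# Hypothetical ILM policy: roll indices over when they hit a size/age
# threshold, then move aged data into searchable snapshots stored in an
# S3-backed snapshot repository. "logs-policy" and "my-s3-repo" are
# placeholder names.
curl -X PUT "localhost:9200/_ilm/policy/logs-policy" \
  -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "my-s3-repo"
          }
        }
      }
    }
  }
}'
```

Indices managed by such a policy stay fully searchable after they leave the hot tier; the first query against a cold index mounts and caches the snapshot, which is where the one-time 60-90 second hit comes from.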

Obviously Elasticsearch isn't without its quirks and drawbacks, but I have yet to come across anything that performs better and is more flexible for logs — especially in terms of architectural freedom and bang-for-the-buck.