top | item 40936515


fulmicoton | 1 year ago

These are their application logs. They need to search them in a comfortable manner. They went for a search engine, Elasticsearch at first and Quickwit after that, because even after restricting the search to a tag and a time window, "grepping" was not a viable option.


jcgrillo | 1 year ago

This position has always confused me. IME log search tools (ELK and their SaaS ilk) are always far too restrictive and uncomfortable compared to Hadoop/Spark. I'd much rather have unfettered access to the data and have to wait a couple of seconds for my query to return than be pigeonholed into some horrible DSL built around an indexing scheme. I couldn't care less about my log queries returning in sub-second time; it's just not a requirement. The fact that people index logs is baffling.

fulmicoton | 1 year ago

If you can limit your search to GBs of logs, I kind of agree with you. It's OK if a log search request takes 2s instead of 100ms, and the "grep" approach is more flexible.

Usually our users search across > 1 TB.

Let's imagine you have to search 10 TB (even after time/tag pruning). Distributing the scan over 10k cores for 2 seconds is not practical and does not always make economic sense.
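A rough back-of-envelope sketch of that claim; the per-core scan throughput is my assumption, not a figure from the thread:

```python
# Back-of-envelope: cores needed to brute-force scan 10 TB in 2 s.
# The 0.5 GB/s per-core scan rate is an assumed figure for grepping
# uncompressed logs, not a number given in the thread.
DATA_GB = 10_000                 # 10 TB left even after time/tag pruning
SCAN_GB_PER_SEC_PER_CORE = 0.5   # assumed per-core scan throughput
TARGET_SECONDS = 2

cores = DATA_GB / (SCAN_GB_PER_SEC_PER_CORE * TARGET_SECONDS)
print(f"{cores:.0f} cores")  # -> 10000 cores
```

At assumptions in that ballpark, every interactive query would need a burst of ~10k cores, which is exactly the "not economically sensible" part.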

esafak | 1 year ago

It sounds like you are doing ETL on your logs. Most people want to search them when something goes wrong, which means indexing.
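A toy sketch of why indexing helps for that "search when something goes wrong" pattern; this is illustrative Python, not Quickwit's or Elasticsearch's actual data structures:

```python
# Toy inverted index: a lookup touches only matching lines,
# instead of re-scanning the whole log corpus per query.
from collections import defaultdict

logs = [
    "ERROR payment timeout",
    "INFO user login",
    "ERROR db connection refused",
]

# Build once: token -> set of line ids containing it.
index = defaultdict(set)
for i, line in enumerate(logs):
    for token in line.lower().split():
        index[token].add(i)

# Query: jump straight to the matching lines.
hits = [logs[i] for i in sorted(index["error"])]
print(hits)  # -> ['ERROR payment timeout', 'ERROR db connection refused']
```

The index costs time and storage up front, which is the trade-off the thread is arguing about: pay at ingest (indexing) or pay at query time (scanning).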

AJSDfljff | 1 year ago

Would be curious what they are searching exactly.

At this size and cost, aligning teams on what they log should save a lot of money.

fulmicoton | 1 year ago

The data is just Binance's application logs for observability. Typically what a smaller business would simply send to Datadog.

This log search infra is handled by two engineers who do that for the entire company.

They have a standardized log format that all teams are required to observe, but they have little control over how much data is logged by each service.

(I'm Quickwit's CTO, by the way.)

BiteCode_dev | 1 year ago

Financial institutions have to log a lot just to comply with regulations, including every user activity and every money flow. On an exchange that handles billions of operations per second, often driven by bots, that's a lot.