iampims | 1 year ago

lossy and simpler.

IME, sampling is simpler to reason about, and with the sampling rate included in each message, deriving metrics from logs works pretty well.
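To make the "sampling rate as part of the message" idea concrete, here is a minimal sketch. It assumes a hypothetical event shape (a dict with a `status` field) and hypothetical helper names; none of this comes from the article. Each kept event is stamped with the rate it was sampled at, so a consumer can weight it by 1/rate when deriving counts.

```python
import random

def sample_logs(events, rate, rng):
    """Keep each event with probability `rate` and stamp the rate on kept events."""
    kept = []
    for event in events:
        if rng.random() < rate:
            kept.append({**event, "sample_rate": rate})
    return kept

def estimate_counts(sampled):
    """Estimate original per-status counts: each kept event stands for 1/rate events."""
    totals = {}
    for event in sampled:
        key = event["status"]
        totals[key] = totals.get(key, 0.0) + 1.0 / event["sample_rate"]
    return totals

# Hypothetical traffic: 9000 successes and 1000 errors, sampled at 10%.
rng = random.Random(0)
events = [{"status": 200}] * 9000 + [{"status": 500}] * 1000
sampled = sample_logs(events, 0.1, rng)
estimates = estimate_counts(sampled)
```

The estimates are approximate, and the error grows as the rate shrinks or the event becomes rarer, which is the "lossy" part of the trade.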

The example in the article is a little contrived. Health checks often originate from multiple hosts, and/or the logs contain the remote address and port, which makes each log message effectively unique. So sure, one could parse the remote address into remote_address=192.168.12.23 remote_port=64780 and then decide to drop the port in the aggregation, but is the juice worth the squeeze?
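For reference, the parse-then-drop step could look roughly like this. It is a sketch that assumes space-separated key=value log lines; the `path=/healthz` field and the function names are made up for illustration.

```python
from collections import Counter

def parse_kv(line):
    """Split a 'key=value key=value' log line into a dict."""
    return dict(pair.split("=", 1) for pair in line.split() if "=" in pair)

def aggregate(lines, drop=("remote_port",)):
    """Count lines that are identical once the high-cardinality fields are dropped."""
    counts = Counter()
    for line in lines:
        fields = parse_kv(line)
        key = tuple(sorted((k, v) for k, v in fields.items() if k not in drop))
        counts[key] += 1
    return counts

lines = [
    "path=/healthz remote_address=192.168.12.23 remote_port=64780",
    "path=/healthz remote_address=192.168.12.23 remote_port=64781",
    "path=/healthz remote_address=192.168.12.24 remote_port=51022",
]
counts = aggregate(lines)
```

Even with the port dropped, every distinct health-check host still produces its own bucket, so someone has to decide, field by field, what is safe to discard.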

kiitos|1 year ago

If a service emits a log event, then that log event should be visible in your logging system. Basic stuff. Sampling fails this table-stakes requirement.

eru|1 year ago

Typically, you store your most recent logs in full, and you can move to sampling for older logs (if you don't want to delete them outright).
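A rough sketch of that kind of retention policy, assuming events carry a numeric `ts` timestamp in seconds; the function name and parameters are hypothetical. Events inside the recent window are kept in full, and older ones are thinned at a given rate, with the rate recorded on each survivor.

```python
import random

def thin_old_logs(events, now, full_window_s, rate, rng):
    """Keep recent events in full; keep older events with probability `rate`."""
    kept = []
    for event in events:
        age = now - event["ts"]
        if age <= full_window_s:
            # Recent: stored in full, so each event represents only itself.
            kept.append({**event, "sample_rate": 1.0})
        elif rng.random() < rate:
            # Old: sampled, with the rate recorded for later re-weighting.
            kept.append({**event, "sample_rate": rate})
    return kept

# Hypothetical log: one event per second for 1000 seconds.
events = [{"ts": t, "msg": "request"} for t in range(1000)]
retained = thin_old_logs(events, now=1000, full_window_s=100, rate=0.1, rng=random.Random(1))
```

Setting `rate` to 0 corresponds to deleting old logs outright.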