

P-Nuts | 2 years ago

Debuggers are nice for development, but we also need to be able to analyse field issues. On-disk log files are usually too expensive, so we mostly write raw binary records to in-memory trace tables. These wrap around, but hopefully retain enough history to figure out what went wrong, and they are included in a core dump.
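For anyone who hasn't seen one, here is a minimal sketch of a wrapping binary trace table. It's in Python for readability; a real one would more likely be a static C array so it lands in the core dump. The record layout (timestamp, event id, two args) and the names `TraceTable`, `trace` and `records` are hypothetical, not anything from the post above.

```python
import struct
import time

# Hypothetical record layout: timestamp in ns (u64), event id (u32), two u32 args.
RECORD = struct.Struct("<QIII")


class TraceTable:
    """Fixed-size in-memory trace buffer of packed binary records.

    Once full, new records overwrite the oldest ones (the table "wraps"),
    so memory use is bounded and only the most recent history survives.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = bytearray(RECORD.size * capacity)  # allocated once, never grows
        self.next = 0   # slot the next record will be written to
        self.count = 0  # number of valid records, at most capacity

    def trace(self, event_id, arg0=0, arg1=0):
        # Hot path: pack the record straight into the preallocated buffer.
        RECORD.pack_into(self.buf, self.next * RECORD.size,
                         time.monotonic_ns(), event_id, arg0, arg1)
        self.next = (self.next + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def records(self):
        """Decode the surviving records, oldest first (e.g. when analysing a dump)."""
        start = (self.next - self.count) % self.capacity
        for i in range(self.count):
            slot = (start + i) % self.capacity
            yield RECORD.unpack_from(self.buf, slot * RECORD.size)


# Usage: write 6 events into a 4-slot table; the 2 oldest are overwritten.
table = TraceTable(capacity=4)
for i in range(6):
    table.trace(event_id=i)
surviving = [rec[1] for rec in table.records()]  # → [2, 3, 4, 5]
```

The point of the design is that writing a trace is just a fixed-size copy into memory that already exists, so it's cheap enough to leave on in production. Decoding is deferred until someone actually has a dump to look at.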


drewcoo | 2 years ago

For issues in the field you really want metrics and logs. That way it's easy to monitor the state of things and to zoom in on the specific data you need when you're investigating. OMG right now! Or days or weeks from now. For a single entity, a local group, or a distributed set of them. Even if you're investigating a single system, you may want to correlate it with events in other systems that led up to, coincided with, or soon followed your incident. When people talk about o11y (observability), this is what they mean.
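As a rough illustration of the "correlate with other systems" part, here is a sketch that pulls every event from any system into one timeline around an incident. The `Event` shape, the `correlate` name and the default window sizes are assumptions for the example. It also assumes the systems' clocks are reasonably in sync, which in practice is its own problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    system: str       # which host/service emitted the event
    timestamp: float  # seconds since epoch; assumes roughly synchronized clocks
    message: str


def correlate(events, incident_time, before=300.0, after=60.0):
    """Return events from any system in [incident_time - before, incident_time + after],
    sorted by time, so lead-up, concurrent and follow-on events read as one timeline."""
    lo, hi = incident_time - before, incident_time + after
    return sorted((e for e in events if lo <= e.timestamp <= hi),
                  key=lambda e: e.timestamp)


# Usage: events from three systems; the old "vacuum" event falls outside the window.
events = [
    Event("api", 1300.0, "incident"),
    Event("db", 1000.0, "slow query"),
    Event("lb", 1340.0, "5xx spike"),
    Event("db", 500.0, "vacuum"),
    Event("api", 1290.0, "timeout"),
]
window = correlate(events, incident_time=1300.0)
systems = [e.system for e in window]  # → ["db", "api", "api", "lb"]
```

Real observability stacks do this with indexed queries rather than a list scan, but the idea is the same: a shared time axis across systems.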

Ideally, events will be recoverable but also still debuggable. Depending on the kind of thing you're looking at, you may not have the (somewhat dubious) luxury of a core dump.

I'm still on the fence about whether a core dump or a Java exception unwind is more useful for new staff woken up by a "pager" at 4 am. /s