vmarsy | 4 years ago

Observability isn't just a rebranding of Monitoring; it's Monitoring made as actionable as possible through standardization.

Specifically, how to make the sum of all monitored "pillars" more useful than each of them individually.

The three major pillars are:

- Metrics (whether from the application or from lower in the stack, like the OS)

- Logs (whether structured or unstructured)

- Traces

Observability is these major pillars plus the ability to easily "jump" from one to another to quickly identify the root cause of an issue, i.e. go Metrics <-> Logs, Logs <-> Traces, or Metrics <-> Traces.

For instance, with good Metrics, one can easily detect and get alerted on a large spike of 500 errors. But when Metrics & Logs work together, one can also easily see the exceptions and stack traces emitted with those 500 errors.
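
A rough sketch of that Metrics -> Logs jump, assuming both signals carry the same `service` label and comparable timestamps. The `MetricPoint` and `LogLine` types, the sample data, and the threshold are all made up for illustration; a real stack would run this as a query against its metrics and logs backends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricPoint:
    service: str
    minute: int       # minute bucket (seconds since epoch // 60), hypothetical unit
    errors_500: int   # number of HTTP 500 responses in that minute


@dataclass(frozen=True)
class LogLine:
    service: str
    timestamp: int    # seconds since epoch
    level: str
    message: str


def find_spikes(points, threshold):
    """Metrics side: return the points whose 500 count exceeds the threshold."""
    return [p for p in points if p.errors_500 > threshold]


def logs_for_spike(spike, logs):
    """Logs side: reuse the spike's service label and time window as the log query."""
    start = spike.minute * 60
    end = start + 60
    return [
        line for line in logs
        if line.service == spike.service
        and start <= line.timestamp < end
        and line.level == "ERROR"
    ]


# Illustrative data only.
POINTS = [
    MetricPoint("frontend", 100, 2),
    MetricPoint("frontend", 101, 57),
    MetricPoint("checkout", 101, 1),
]
LOGS = [
    LogLine("frontend", 6005, "INFO", "GET /cart 200"),
    LogLine("frontend", 6070, "ERROR", "KeyError: 'user_id' in render_cart"),
    LogLine("checkout", 6071, "ERROR", "TimeoutError: payment gateway"),
    LogLine("frontend", 6119, "ERROR", "KeyError: 'user_id' in render_cart"),
]

spikes = find_spikes(POINTS, threshold=10)
related = logs_for_spike(spikes[0], LOGS)
for line in related:
    print(line.timestamp, line.message)
```

The point is only that the jump works because both signals agree on labels and time; without that shared convention you are back to grepping by hand.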

Similarly, with good Metrics, one can easily figure out that the frontend service's p90 latency has increased 5x. But with Metrics & Traces working together (for instance via exemplars [1]), one can look at a handful of the traces with very high latency and identify the upstream service responsible for the increase.
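
A toy illustration of the exemplar idea, assuming a latency histogram that remembers one trace ID per bucket. This is not Grafana's or Prometheus's actual API; `Histogram`, `BUCKETS`, `TRACES`, and the span durations are hypothetical stand-ins.

```python
import bisect
from dataclasses import dataclass, field

# Bucket upper bounds in seconds (hypothetical choice).
BUCKETS = [0.1, 0.5, 1.0, float("inf")]


@dataclass
class Histogram:
    counts: list = field(default_factory=lambda: [0] * len(BUCKETS))
    # bucket index -> (observed value, trace_id) of the latest observation
    exemplars: dict = field(default_factory=dict)

    def observe(self, seconds, trace_id):
        """Record a latency and keep its trace ID as the bucket's exemplar."""
        i = bisect.bisect_left(BUCKETS, seconds)  # first bucket with bound >= value
        self.counts[i] += 1
        self.exemplars[i] = (seconds, trace_id)


# Stand-in for a trace store: trace_id -> [(service, time spent in that span)].
TRACES = {
    "trace-a": [("frontend", 0.02), ("auth", 0.01)],
    "trace-b": [("frontend", 0.03), ("auth", 0.02), ("inventory", 2.30)],
}


def slowest_service(trace_id):
    """Traces side: return the service whose span took the longest."""
    return max(TRACES[trace_id], key=lambda span: span[1])[0]


latency = Histogram()
latency.observe(0.04, "trace-a")
latency.observe(2.40, "trace-b")

# Metrics -> Traces: take the exemplar from the slowest bucket and follow it.
slow_value, slow_trace = latency.exemplars[len(BUCKETS) - 1]
culprit = slowest_service(slow_trace)
print(slow_trace, culprit)
```

The histogram alone says "some requests took over a second"; the exemplar is what turns that into a concrete trace you can open.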

With Monitoring only, you could have a nice Metrics solution in place, with fancy alerting rules, but all it is good at is telling you "Something bad is currently happening." With a good "Observability" setup, you should also be able to turn that into "Something bad is currently happening and the root cause is right here."

[1] https://grafana.com/docs/grafana/latest/basics/exemplars/

simskij|4 years ago

Logs, traces, and profiling were all viable parts of a good monitoring stack even before the term observability was coined.

mmanciop|4 years ago

That’s trope #1 right there :-)