p-o|1 year ago
Logs are expensive compared to metrics, but they convey a lot more information about the state of your system. Over time, you want to move towards metrics one hotspot at a time, reducing cost while keeping observability of your overall system.
I'll take logs over metrics any day of the week, when cost isn't prohibitive.
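One way to pick the next hotspot is to count log lines by the statement that produced them and look at the most frequent ones. Those are the likeliest candidates to replace with a counter metric. Here is a minimal sketch. It assumes a hypothetical log format of `<LEVEL> <message> key=value ...` and groups lines by stripping the variable `key=value` fields. The names `message_template` and `log_hotspots` are illustrative only. Real logs would need a parser that matches their actual format.

```python
import re
from collections import Counter

# Assumed (hypothetical) log format: "<LEVEL> <message> key=value key=value ..."
# Stripping the key=value fields groups lines emitted by the same log statement.
KV_PATTERN = re.compile(r"\s+\w+=\S+")


def message_template(line: str) -> str:
    """Remove variable key=value fields, leaving the fixed part of the message."""
    return KV_PATTERN.sub("", line).strip()


def log_hotspots(lines, top_n=3):
    """Return the most frequent log templates with their counts.

    High-volume templates are candidates to replace with a counter metric.
    """
    counts = Counter(message_template(line) for line in lines)
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = [
        "INFO request served path=/a status=200",
        "INFO request served path=/b status=200",
        "WARN cache miss key=x",
        "INFO request served path=/a status=500",
    ]
    for template, count in log_hotspots(sample):
        print(count, template)
```

Once a hotspot is identified, its log statement can be swapped for a counter, optionally labeled by a small, fixed set of fields such as status. Rarer statements can stay as logs.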
KaiserPro|1 year ago
However, over the space of about three years we shifted organically over to graphite+grafana. There wasn't a top-down push, but once people realised how easy it was to make a dashboard, use templating, and generally keep things working, they moved in droves. It also helped that people built a metrics-emitting system into the underlying hosting app library.
What really sealed the deal was the non-tech business owners making or updating dashboards. They managed to take pure tech metrics and turn them into service/business metrics.
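Here is a minimal sketch of what building metric emission into a shared app library might look like. It assumes Graphite's plaintext protocol, which sends one `<path> <value> <timestamp>` line per datapoint, conventionally to a Carbon listener on TCP port 2003. The helpers `graphite_line`, `timed`, and `send_to_graphite` and the metric path used below are hypothetical illustrations. They are not the library the commenter describes.

```python
import functools
import socket
import time
from typing import List, Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one datapoint in Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def timed(metric_path: str, sink: List[str]):
    """Hypothetical decorator a shared library could apply to request handlers.

    Appends the handler's duration in milliseconds to `sink` as a Graphite line.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                # Recorded even if the wrapped function raises.
                elapsed_ms = (time.monotonic() - start) * 1000
                sink.append(graphite_line(metric_path, round(elapsed_ms, 3)))
        return wrapper
    return decorator


def send_to_graphite(lines: List[str], host: str = "localhost", port: int = 2003) -> None:
    """Flush buffered lines to a Carbon plaintext listener in one write."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


if __name__ == "__main__":
    buffer: List[str] = []

    @timed("myapp.handlers.index.ms", buffer)  # hypothetical metric path
    def index():
        return "ok"

    index()
    print(buffer[0], end="")
```

Buffering lines and flushing them in one `send_to_graphite` call keeps network I/O off the request path. A production library would also flush on a timer and handle connection errors.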
david38|1 year ago
I was an engineer at Splunk for many years. I knew it cold.
I then joined a startup where they used only metrics, and the logs TTLed out after just a week. They were used only for short-term debugging.
Metrics were easier to put in, keep organized, and build dashboards from; they were also lighter, cheaper, and better. I had been doing it wrong this whole time.
p-o|1 year ago
I've used both grafana+metrics and logs to different degrees. I've enjoyed using both, but any system I work on starts with logs and gradually adds metrics as needed. It feels like a natural evolution to me, and like you, I've worked at different scales.
hanniabu|1 year ago
FridgeSeal|1 year ago
My experience has been the kind of opposite.
Yes, you can put more fields in a log, and you can nest stuff. In my experience, however, metrics tend to give me a clearer picture of the overall state (and behaviour) of my systems. I find them easier and faster to operate, easier to get an automatic chronology going, easier to alert on, etc.
Logs in my apps are mostly relegated to capturing warning and error states for debugging reference, as the metrics give us a quicker and easier indicator of issues.
lmpdev|1 year ago
If you average out metrics across all log files, you're potentially reaching false, or worse, inverse conclusions about multiple distinct subsets of the logs.
It's part of the reason why statisticians are so pedantic about the wording of their conclusions and about which subpopulation those conclusions actually apply to.
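One classic case of this is Simpson's paradox. The request and error counts below are made up for illustration, and so is the `error_rate` helper. In them, a new release has the lower error rate on each endpoint: 1% vs 2% on `light` and 20% vs 30% on `heavy`. Yet it has the higher rate when everything is pooled, about 18.3% vs 4.5%. That happens because most of its traffic went to the error-prone endpoint.

```python
# Hypothetical (requests, errors) per (version, endpoint), chosen to show Simpson's paradox.
COUNTS = {
    ("old", "light"): (1000, 20),
    ("old", "heavy"): (100, 30),
    ("new", "light"): (100, 1),
    ("new", "heavy"): (1000, 200),
}


def error_rate(version, endpoint=None):
    """Pooled error rate for a version, optionally restricted to one endpoint."""
    total_requests = total_errors = 0
    for (v, e), (requests, errors) in COUNTS.items():
        if v == version and (endpoint is None or e == endpoint):
            total_requests += requests
            total_errors += errors
    return total_errors / total_requests


if __name__ == "__main__":
    for endpoint in ("light", "heavy", None):
        label = endpoint or "all"
        print(f"{label}: old={error_rate('old', endpoint):.1%} new={error_rate('new', endpoint):.1%}")
```

Comparing per-endpoint rates, or weighting them by a common traffic mix, avoids drawing the pooled and misleading conclusion.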