Nextgrid | 6 days ago
If something shows up in there, you should have only two options: 1) it’s an actual error, so you fix it and make sure it never happens again, or 2) it’s not an error, so you adjust the log level until it stops being reported as one.
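To make the two options concrete, here is a minimal sketch using Python's standard `logging` module. The service name, the `CardDeclined` exception, and the `charge`/`handle_payment` functions are all hypothetical, invented purely for illustration; the point is only that an expected outcome is logged below ERROR while a genuine bug stays at ERROR.

```python
import logging

logger = logging.getLogger("payments")  # hypothetical service name


class CardDeclined(Exception):
    """Hypothetical expected outcome: the issuer refused the card."""


def charge(amount_cents: int, card_ok: bool = True) -> bool:
    """Hypothetical charge routine that simulates both kinds of failure."""
    if amount_cents <= 0:
        # A genuine bug in the caller: this should never happen.
        raise ValueError("amount must be positive")
    if not card_ok:
        # An expected business outcome, not a defect in our software.
        raise CardDeclined("card refused by issuer")
    return True


def handle_payment(amount_cents: int, card_ok: bool = True) -> bool:
    try:
        return charge(amount_cents, card_ok)
    except CardDeclined as exc:
        # Option 2: not an error, so it is logged at WARNING instead.
        logger.warning("payment declined: %s", exc)
        return False
    except Exception:
        # Option 1: an actual error; log at ERROR (with traceback) and re-raise
        # so that it gets fixed rather than tolerated.
        logger.exception("unexpected failure while charging %d cents", amount_cents)
        raise
```

With something like this in place, anything that reaches the ERROR stream is, by construction, something that needs fixing.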
If someone suggests an “error budget” on my watch, they get the door. You can have a warning budget (and the resources to adjust the log levels or remediation protocols to fix said “errors”), but actual errors should remain errors. Otherwise they’re delivering broken software, and that’s not what I’m paying them for.
Of course, companies with the common sense to work this way already do, and nobody in their right mind there would suggest an “error budget”. Those that don’t have a serious problem that needs to be rectified.
Otherwise, the danger is that your observability pipeline becomes useless, because “errors” no longer mean actual errors. That’s really bad: it opens the door to real errors being ignored until it’s too late, at which point remediation is more costly.