klaruz | 11 years ago
You get a set of basic VM-level metrics, and you can feed it custom metrics from your app or from log files, all of which can be configured to alarm. I don't think it's possible to run advanced statistics on the metrics for alarming (e.g., a value more than N standard deviations from its 30-minute average), but it may be. Usually it's just an event count, like more than N 500 errors over X time.
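To make the difference between those two alarm styles concrete, here is a rough sketch in plain Python. It is not any monitoring service's API; the function names, parameters, and sample values are all made up for illustration, and it assumes you already have the raw timestamps or samples in hand.

```python
from statistics import mean, pstdev


def count_alarm(error_timestamps, now, window_seconds, threshold):
    """Event-count style: fire when more than `threshold` errors
    fall inside the trailing window ending at `now` (seconds)."""
    recent = [t for t in error_timestamps
              if now - window_seconds < t <= now]
    return len(recent) > threshold


def deviation_alarm(samples, latest, n_sigmas):
    """Statistical style: fire when `latest` is more than `n_sigmas`
    standard deviations away from the mean of `samples`
    (e.g. the previous 30 minutes of readings)."""
    if len(samples) < 2:
        return False  # not enough history to judge
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        # Flat history: any change at all counts as a deviation.
        return latest != mu
    return abs(latest - mu) > n_sigmas * sigma


if __name__ == "__main__":
    # Hypothetical data: four 500 errors in the last 5 minutes, limit is 2.
    print(count_alarm([110, 200, 300, 400], now=400,
                      window_seconds=300, threshold=2))
    # Hypothetical latency history, then a sudden spike.
    print(deviation_alarm([10, 10, 12, 8, 10], latest=30, n_sigmas=3))
```

The count version only needs a counter and a window, which is probably why it's the common case. The deviation version needs the history kept around and a decision about what to do when that history is flat or too short.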
I do agree you need to think deeper than basic health checks, though; 'broken server' is always a hard boolean to nail down.