(no title)
smoodles | 9 years ago
A couple of caveats. If you are coming from Nagios, this is a different worldview on monitoring. Like many of the other solutions mentioned here, it is all built around metrics and their associated time series, and you then alert on those metrics. You ask the system questions with a time series query language.
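To make the contrast with Nagios-style checks concrete, here is a rough Python sketch of the idea. This is not Wavefront's query language, and the function names, sample data, and threshold are all made up for illustration. The alert is a question asked of a time series ("is the moving average above a threshold?") rather than a per-host check script.

```python
from collections import deque
from typing import List, Sequence, Tuple

# A point is (unix timestamp in seconds, metric value). Hypothetical data model.
Point = Tuple[int, float]


def moving_average(series: Sequence[Point], window: int) -> List[Point]:
    """Trailing moving average over the last `window` points."""
    buf = deque(maxlen=window)
    out: List[Point] = []
    for ts, value in series:
        buf.append(value)
        out.append((ts, sum(buf) / len(buf)))
    return out


def firing_timestamps(series: Sequence[Point], threshold: float, window: int) -> List[int]:
    """Timestamps at which the smoothed metric exceeds the threshold."""
    return [ts for ts, avg in moving_average(series, window) if avg > threshold]


# Illustrative CPU-percent samples, one per minute (made-up numbers).
cpu = [(0, 10.0), (60, 20.0), (120, 90.0), (180, 100.0), (240, 30.0)]
alerts = firing_timestamps(cpu, threshold=50.0, window=2)
print(alerts)
```

A real system would express the same thing as a one-line query over stored series, but the shape of the question is the same.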
Wavefront doesn't yet have a great solution for poll-based monitoring (i.e. hitting host X's /healthcheck endpoint), so I still use terrible ol' Nagios for that in my environment. However, the rest of my work is all done in Wavefront. I'd say easily 95%+ of my material alerts are done in Wavefront, with a small subset of work done in Nagios.
The killer feature here is the query language. I don't think there is anything else on the market that has its level of sophistication. I've had ex-Googlers on my team who "grew up" with Borgmon, which is in some sense the Ur-time series monitoring system, and they loved it.
All this said, there are a lot of options out there. I have a strong bias against supporting my own complicated monitoring infrastructure. I want to focus on my own product. If you don't share that opinion, or are on a super duper tight cash budget (but you do have time), then disregard the above ;)
bbrazil | 9 years ago
Prometheus is inspired by Borgmon, and has a query language that is unmatched by almost everything else I'm aware of.
Are there public docs on the semantics and features of the Wavefront query language so I can compare?