
The Art of Monitoring

210 points | manojlds | 9 years ago | artofmonitoring.com

62 comments

[+] thinkersilver|9 years ago|reply
First off, I'd like to say that I think this is great and we need more books in this space.

I hope this can be seen as constructive criticism but I do have a few comments on the layout and content.

1. Free chapter - perhaps chapter one or two may have been a better choice. I'd like to know the philosophy behind monitoring expressed in the book before diving into the details.

2. capacity planning - A chapter on this would have been great. Most teams I've worked with have struggled with sizing, planning resources required and archiving strategies with their monitoring solutions.

3. Monitoring strategies for different levels in the stack - where do I start, what should my short term goals be and so on.

4. naming of some of the chapters is too focused on the technology - for example the chapter on logstash could have been renamed to something to do with application logging or log scraping.

5. visualisation and communication of results - there could have been a chapter on dashboard and reporting. This is a common issue with teams trying to understand how to do this.

This was written in a bit of a hurry but I hope my points came through.

[+] jamtur01|9 years ago|reply
Thanks for your feedback.

1. I decided not to do this because my experience is that people like to do something practical first. I've had a huge response to that chapter - lots of folks who had previously been stuck have gotten into Riemann. That alone is a solid +++ for me.

2. Each chapter contains some discussion of capacity planning for specific tools, where relevant.

3. The capstone chapters (11-13) discuss this, as do the chapters covering logging and application instrumentation.

4. Thanks - I'll consider that.

5. I discuss visualization in various chapters but I've found that most folks have very different needs and desires. So I focussed on discussing what to show in small segments, as well as some visual design discussion, rather than a specific chapter on dashboarding/reporting. Hard choice but a 750 page book needs to stop somewhere. :)

Thanks for taking the time to comment - it's awesome when folks share their thoughts!

[+] _asummers|9 years ago|reply
As a side recommendation, I'm most of the way through The Practice of Cloud System Administration, and it hits on most of the things you've mentioned. It's on SBO, so you likely have a copy through work.

http://the-cloud-book.com

[+] cothomps|9 years ago|reply
To the "free chapter" idea - I think the website has a pretty good prospectus, and compared to other tech books / publishers, it's certainly well worth taking a flyer on it for a maximum of $20.
[+] graycat|9 years ago|reply
The monitoring considered in the OP is for server farms and networks where the main challenges are rates of false alarms and rates of missed detections.

The challenge is to find means of monitoring that will permit selecting the rate of false alarms we are willing to tolerate and, then, for that rate, get the lowest rate possible for missed detections.

Thus, we would like to use the Neyman-Pearson result. Usually, however, for this context, we do not have enough data for that. E.g., typically we are quite short on data on the anomalies we are trying to detect, and shorter still as the systems become more reliable.

From the above, we see that necessarily and inescapably such monitoring amounts to continually applied statistical hypothesis tests.

Apparently, in practice, the false alarm rate is not known and cannot reasonably be selected or even adjusted.

Then we see that we need tests that are both multi-dimensional and distribution-free.

A special case of high interest is zero-day problems, that is, detecting problems never seen before. So, this is behavioral monitoring -- any behavior sufficiently unusual is regarded as an anomaly, that is, evidence of something wrong.

From all I can see, so far the monitoring community has yet to take these points to heart.

The OP's remarks on thresholds are on target: Thresholds have been the old, lame, weak workhorse of monitoring far too long.

If anyone is actually seriously interested in this subject, let me know. Some years ago I concluded that no one was interested!
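
To make the points above about a selectable false alarm rate and distribution-free, multi-dimensional tests concrete, here is a minimal sketch. It is not from the book or from any paper mentioned in this thread; it is a standard rank-based test (in the split conformal style), and the names `nn_distance` and `RankDetector`, the nearest-neighbour score, and the sample data are all assumptions made for illustration. If baseline and new observations are exchangeable, flagging when the rank p-value is at most `alpha` keeps the false alarm rate at or below `alpha`, whatever the underlying distribution.

```python
import math
import random


def nn_distance(point, reference):
    """Euclidean distance from `point` to its nearest neighbour in `reference`."""
    return min(math.dist(point, r) for r in reference)


class RankDetector:
    """Hypothetical rank-based anomaly test; makes no distributional assumption.

    `reference` and `calibration` are disjoint samples of normal behaviour.
    Each observation is a tuple of metrics, so the test is multi-dimensional.
    """

    def __init__(self, reference, calibration, alpha):
        if not reference or not calibration:
            raise ValueError("reference and calibration must be non-empty")
        self.reference = list(reference)
        self.alpha = alpha
        # Scores of known-normal points: how far each sits from the reference set.
        self.cal_scores = [nn_distance(p, self.reference) for p in calibration]

    def p_value(self, point):
        """Share of calibration scores at least as extreme as this point's score."""
        score = nn_distance(point, self.reference)
        n_ge = sum(1 for s in self.cal_scores if s >= score)
        return (n_ge + 1) / (len(self.cal_scores) + 1)

    def is_anomaly(self, point):
        """Flag the point when its p-value is at or below the chosen rate."""
        return self.p_value(point) <= self.alpha


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up baseline: (cpu utilisation, request latency in ms).
    baseline = [(rng.gauss(0.5, 0.05), rng.gauss(120.0, 10.0)) for _ in range(400)]
    detector = RankDetector(baseline[:200], baseline[200:], alpha=0.01)
    for obs in [(0.52, 118.0), (0.95, 900.0)]:
        print(obs, detector.p_value(obs), detector.is_anomaly(obs))
```

The smallest p-value this test can produce is 1/(n+1) for n calibration points, so the amount of baseline data bounds how low a false alarm rate can be selected, which matches the remark above about being short on data.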

[+] jamtur01|9 years ago|reply
It's a spectrum to me. We're way behind the curve on monitoring and the "state of the art", anywhere but cutting-edge shops, is woeful. I'd love folks to be able to do anomaly detection easily and simply but the technology and tools aren't quite there yet. I am just hoping to get folks to advance their environments a little way forward.
[+] pramodbiligiri|9 years ago|reply
I'd like to see this paper as well! Email in my profile.
[+] ginkgotree|9 years ago|reply
Another thought. Two really: - Lead us into more about the why before showing the price. Really tell the story of the pain that so many of us can identify with. - Use a standard three-tier pricing box style, with the values of each above the price. Something like: https://planscope.io/pricing/
[+] jamtur01|9 years ago|reply
Thanks - I added a pricing panel.
[+] jamtur01|9 years ago|reply
Thanks - I'll consider that.
[+] bogomipz|9 years ago|reply
If you haven't read one of the author's books before (he's released titles on both Logstash and Docker), he puts out really high-quality material and he seems to update them when new releases of the subject come out. This looks like another great release. Kudos James.
[+] coredog64|9 years ago|reply
Seconded. My satisfaction with the Docker book has had me on the edge of my seat for this book since I first saw the "coming soon" message.
[+] drauh|9 years ago|reply
Agreed. I have and have used his books on Puppet, and they are awesome.
[+] AndyNemmity|9 years ago|reply
Just bought the book, and I didn't even realize it was James Turnbull. That guy is extremely advanced in monitoring. I've seen many of his talks online, and no one speaks about monitoring in a more rational way that I've seen.

This is Hacker News at its best in my view.

[+] guidedlight|9 years ago|reply
This looks to be more about infrastructure monitoring... less about application monitoring (i.e. New Relic / AppDynamics, and synthetics)
[+] jamtur01|9 years ago|reply
That's mostly correct. I have a chapter on adding instrumentation to applications with examples in Ruby/Rails and Clojure that can easily be adapted to other languages and frameworks. I also cover adding structured logging to your applications.
[+] SonicSoul|9 years ago|reply
Love your site design. It's clean and easy to read, great presentation. Do you just whip up a new design from scratch each time you create a book, or did you hire someone or purchase off the shelf? Curious, as these things take me forever.
[+] jamtur01|9 years ago|reply
I usually find a template I like and modify it. I really should start paying someone. It's not my core skillset. :)
[+] reptation|9 years ago|reply
check_mk is a very useful monitoring system which doesn't seem to be included: http://mathias-kettner.com/check_mk.html
[+] bogomipz|9 years ago|reply
I would argue that it's not and that it's a mess, starting with that URL. This is yet another fork of an outdated monitoring system (how many Nagios forks are there?). The architecture diagram made my head hurt. How many moving parts is that? I counted 8 separate components. There are much more modern monitoring systems these days such as Prometheus, Bosun and Riemann.
[+] alamaison|9 years ago|reply
Is this going to be published as a physical book too?
[+] jamtur01|9 years ago|reply
Not at this stage - my experience with how fast physical books date hasn't been good.
[+] Omnipresent|9 years ago|reply
Does the book provide example applications that are monitored using the information provided, or does it just go through the tools that can be used?
[+] jamtur01|9 years ago|reply
It provides several example applications and goes through the tools.
[+] ginkgotree|9 years ago|reply
Wouldn't be a bad idea to put the free sample chapter behind an email list signup - worked well for my book!
[+] jamtur01|9 years ago|reply
Thanks - I considered that but I don't like to put barriers in people's way - especially for free content. I don't like to be too marketing-esque. Just not in my nature. :) A couple of thousand folks signed up to the mailing list using the current approach, which I'm pretty happy with.
[+] Thriptic|9 years ago|reply
People didn't just use Guerrilla Mail or an equivalent?