The Art of Monitoring

[+] thinkersilver|9 years ago|reply

First off, I'd like to say that I think this is great and we need more books in this space.

I hope this can be seen as constructive criticism but I do have a few comments on the layout and content.

1. Free chapter - perhaps chapter one or two may have been a better choice. I'd like to know the philosophy behind monitoring expressed in the book before diving into the details.

2. Capacity planning - a chapter on this would have been great. Most teams I've worked with have struggled with sizing, planning the resources required, and archiving strategies for their monitoring solutions.

3. Monitoring strategies for different levels in the stack - where do I start, what should my short term goals be and so on.

4. Chapter naming - the names of some chapters are too focused on the technology. For example, the chapter on Logstash could have been renamed to something to do with application logging or log scraping.

5. Visualisation and communication of results - there could have been a chapter on dashboards and reporting. Teams commonly struggle to understand how to do this.

This was written in a bit of a hurry but I hope my points came through.

[+] jamtur01|9 years ago|reply

Thanks for your feedback.

1. I decided not to do this because my experience is that people like to do something practical first. I've had a huge response to that chapter - lots of folks have gotten into Riemann that had previously been stuck. That alone is a solid +++ for me.

2. Each chapter contains some discussion of capacity planning for specific tools, where relevant.

3. The capstone chapters (11-13) discuss this, as do the chapters covering logging and application instrumentation.

4. Thanks - I'll consider that.

5. I discuss visualization in various chapters, but I've found that most folks have very different needs and desires. So I focussed on discussing what to show in small segments, along with some visual design discussion, rather than writing a specific chapter on dashboarding/reporting. Hard choice but a 750 page book needs to stop somewhere. :)

Thanks for taking the time to comment - it's awesome when folks share their thoughts!

[+] _asummers|9 years ago|reply

As a side recommendation, I'm most of the way through The Practice of Cloud System Administration, and it hits on most of the things you've mentioned. It's on SBO, so you likely have a copy through work.

http://the-cloud-book.com

[+] cothomps|9 years ago|reply

To the "free chapter" idea - I think the website has a pretty good prospectus, and compared to other tech books / publishers, it's certainly well worth taking a flyer on it for a maximum of $20.

[+] graycat|9 years ago|reply

The monitoring considered in the OP is for server farms and networks where the main challenges are rates of false alarms and rates of missed detections.

The challenge is to find means of monitoring that will permit selecting the rate of false alarms we are willing to tolerate and, then, for that rate, get the lowest rate possible for missed detections.

Thus, we would like to use the Neyman-Pearson result. Usually, however, in this context, we do not have enough data for that. E.g., typically we are quite short on data on the anomalies we are trying to detect, and shorter still as the systems become more reliable.

From the above, we see that necessarily and inescapably such monitoring is some continually applied statistical hypothesis tests.

Apparently, in practice, the false alarm rate is not known, and there is no reasonable way to select or even adjust it.

Then we see that we need tests that are both multi-dimensional and distribution-free.

A special case of high interest is zero-day problems, that is, detecting problems never seen before. So, this is behavioral monitoring -- any behavior sufficiently unusual is regarded as an anomaly, that is, evidence of something wrong.
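To make the idea concrete, here is a minimal sketch of one such test, not anything from the OP. It assumes a rank-based (conformal-style) approach: score each new multi-dimensional reading by its distance to the k-th nearest reading in historical healthy data, then compare that score with the scores of held-out healthy readings. The names `knn_score` and `RankAnomalyDetector`, the choice of k-nearest-neighbor distance, and the simulated metrics are all assumptions for illustration. If healthy readings are exchangeable (e.g. i.i.d.), the false alarm rate is at most roughly `alpha`, with no assumption about the underlying distribution.

```python
import math
import random


def knn_score(point, reference, k=3):
    """Unusualness score: distance from point to its k-th nearest reference point."""
    distances = sorted(math.dist(point, r) for r in reference)
    return distances[k - 1]


class RankAnomalyDetector:
    """Distribution-free detector with a selectable false alarm rate (alpha)."""

    def __init__(self, healthy, alpha=0.01, k=3, seed=0):
        data = list(healthy)
        random.Random(seed).shuffle(data)
        half = len(data) // 2
        # One half defines "normal behavior"; the other half calibrates the test.
        self.reference = data[:half]
        self.alpha = alpha
        self.k = k
        self.calibration_scores = [
            knn_score(p, self.reference, k) for p in data[half:]
        ]

    def p_value(self, point):
        """Fraction of healthy calibration scores at least as extreme as this one."""
        score = knn_score(point, self.reference, self.k)
        at_least = sum(1 for s in self.calibration_scores if s >= score)
        return (at_least + 1) / (len(self.calibration_scores) + 1)

    def is_anomaly(self, point):
        return self.p_value(point) <= self.alpha


# Simulated healthy readings: (CPU %, request latency in ms). Purely illustrative;
# real metrics on different scales should be rescaled before taking distances.
rng = random.Random(42)
healthy = [(rng.gauss(50, 5), rng.gauss(200, 20)) for _ in range(400)]

detector = RankAnomalyDetector(healthy, alpha=0.01)
print(detector.is_anomaly((51.0, 195.0)))  # a typical reading
print(detector.is_anomaly((95.0, 900.0)))  # a reading far from anything seen before
```

Note that the smallest possible p-value is 1 / (number of calibration readings + 1), so a lower `alpha` needs more healthy data, and no labeled anomalies are required at all, which fits the zero-day case.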

From all I can see, so far the monitoring community has yet to take these points to heart.

The OP's remarks on thresholds are on target: Thresholds have been the old, lame, weak workhorse of monitoring far too long.

If anyone is actually seriously interested in this subject, let me know. Some years ago I concluded that no one was interested!

[+] jamtur01|9 years ago|reply

It's a spectrum to me. We're way behind the curve on monitoring, and the "state of the art", anywhere but cutting-edge shops, is woeful. I'd love folks to be able to do anomaly detection easily and simply, but the technology and tools aren't quite there yet. I am just hoping to get folks to advance their environments a little way forward.

[+] bdavis56|9 years ago|reply

Here's a good article on why anomaly detection is so hard: https://blogs.wavefront.com/2016/04/21/why-is-operational-an...

[+] pramodbiligiri|9 years ago|reply

I'd like to see this paper as well! Email in my profile.

[+] ginkgotree|9 years ago|reply

Another thought. Two really:

- Lead us into more about the why before showing the price. Really tell the story of the pain that so many of us can identify with.

- Use a standard three-tier pricing box style, with the value of each tier above the price. Something like: https://planscope.io/pricing/

[+] jamtur01|9 years ago|reply

Thanks - I added a pricing panel.

[+] jamtur01|9 years ago|reply

Thanks - I'll consider that.

[+] bogomipz|9 years ago|reply

If you haven't read one of the author's books before (he's released titles on both Logstash and Docker), he puts out really high-quality material and seems to update his books when new releases of the subject come out. This looks like another great release. Kudos James.

[+] coredog64|9 years ago|reply

Seconded. My satisfaction with the Docker book has had me on the edge of my seat for this book since I first saw the "coming soon" message.

[+] drauh|9 years ago|reply

Agreed. I have and have used his books on Puppet, and they are awesome.

[+] jamtur01|9 years ago|reply

Thanks mate - very kind!

[+] AndyNemmity|9 years ago|reply

Just bought the book, and I didn't even realize it was by James Turnbull. That guy is extremely advanced in monitoring. I've seen many of his talks online, and no one I've seen speaks about monitoring in a more rational way.

This is Hacker News at its best in my view.

[+] guidedlight|9 years ago|reply

This looks to be more about infrastructure monitoring... less about application monitoring (i.e. New Relic / AppDynamics, and synthetics)

[+] jamtur01|9 years ago|reply

That's mostly correct. I have a chapter on adding instrumentation to applications with examples in Ruby/Rails and Clojure that can easily be adapted to other languages and frameworks. I also cover adding structured logging to your applications.
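For anyone unsure what structured logging means here, a minimal sketch, not taken from the book (whose examples are in Ruby/Rails and Clojure). It uses Python's standard `logging` module and emits each event as one JSON object, so a log pipeline can parse fields instead of scraping free text. The names `JsonFormatter` and `build_logger`, the `fields` convention, and the example event are assumptions for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        event = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge any structured fields passed via extra={"fields": {...}}.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event, sort_keys=True)


def build_logger(name, stream=None):
    """Create a logger that writes JSON events to stream (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


log = build_logger("checkout")
log.info("payment accepted", extra={"fields": {"order_id": 1234, "duration_ms": 87}})
```

Because every event carries named fields such as `order_id` and `duration_ms`, downstream tools can filter and aggregate on them directly.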

[+] SonicSoul|9 years ago|reply

Love your site design. It's clean and easy to read, great presentation. Do you just whip up a new design from scratch each time you create a book, or did you hire someone or purchase one off the shelf? Curious as these things take me forever..

[+] jamtur01|9 years ago|reply

I usually find a template I like and modify it. I really should start paying someone. It's not my core skillset. :)

[+] reptation|9 years ago|reply

check_mk is a very useful monitoring system which doesn't seem to be included: http://mathias-kettner.com/check_mk.html

[+] bogomipz|9 years ago|reply

I would argue that it's not and that it's a mess, starting with that URL. This is yet another fork of an outdated monitoring system (how many Nagios forks are there?). The architecture diagram made my head hurt. How many moving parts is that? I counted 8 separate components. There are much more modern monitoring systems these days, such as Prometheus, Bosun and Riemann.

[+] alamaison|9 years ago|reply

Is this going to be published as a physical book too?

[+] jamtur01|9 years ago|reply

Not at this stage - my experience with how fast physical books date hasn't been good.

[+] Omnipresent|9 years ago|reply

Does the book provide example applications that are monitored using the information provided, or does it just go through the tools that can be used?

[+] jamtur01|9 years ago|reply

It provides several example applications and goes through the tools.

[+] ginkgotree|9 years ago|reply

Wouldn't be a bad idea to put the free sample chapter behind an email list signup - worked well for my book!

[+] jamtur01|9 years ago|reply

Thanks - I considered that but I don't like to put barriers in people's way - especially for free content. I don't like to be too marketing-esque. Just not in my nature. :) A couple of thousand folks signed up to the mailing list using the current approach, which I'm pretty happy with.

[+] Thriptic|9 years ago|reply

People didn't just use a guerrilla mail or an equivalent?

62 comments