First off, I'd like to say that I think this is great and we need more books in this space.
I hope this can be seen as constructive criticism but I do have a few comments on the layout and content.
1. Free chapter - perhaps chapter one or two might have been a better choice. I'd like to understand the book's philosophy of monitoring before diving into the details.
2. Capacity planning - a chapter on this would have been great. Most teams I've worked with have struggled with sizing, planning the resources required, and choosing archiving strategies for their monitoring solutions.
3. Monitoring strategies for different levels of the stack - where do I start, what should my short-term goals be, and so on.
4. Chapter naming - some of the chapter titles are too focused on the technology. For example, the chapter on Logstash could have been named for application logging or log scraping instead.
5. Visualisation and communication of results - there could have been a chapter on dashboards and reporting. Working out how to do this is a common struggle for teams.
This was written in a bit of a hurry but I hope my points came through.
1. I decided not to do this because my experience is that people like to do something practical first. I've had a huge response to that chapter - lots of folks who had previously been stuck have gotten into Riemann. That alone is a solid +++ for me.
2. Each chapter contains some discussion of capacity planning for specific tools, where relevant.
3. The capstone chapters (11-13) discuss this, as do the chapters covering logging and application instrumentation.
4. Thanks - I'll consider that.
5. I discuss visualization in various chapters, but I've found that most folks have very different needs and desires. So I focussed on discussing what to show in small segments, plus some visual design discussion, rather than writing a specific chapter on dashboarding/reporting. Hard choice, but a 750-page book needs to stop somewhere. :)
Thanks for taking the time to comment - it's awesome when folks share their thoughts!
As a side recommendation, I'm most of the way through The Practice of Cloud System Administration, and it covers most of the things you've mentioned. It's on SBO (Safari Books Online), so you likely have access to a copy through work.
To the "free chapter" idea - I think the website has a pretty good prospectus, and compared to other tech books / publishers, it's certainly well worth taking a flyer on it for a maximum of $20.
The monitoring considered in the OP is for server farms and networks, where the main challenges are the rate of false alarms and the rate of missed detections.
The challenge is to find means of monitoring that let us select the rate of false alarms we are willing to tolerate and then, for that rate, get the lowest possible rate of missed detections.
Thus, we would like to use the Neyman-Pearson result. Usually, however, in this context we do not have enough data for that. E.g., we are typically quite short on data about the anomalies we are trying to detect, and shorter still as the systems become more reliable.
From the above, we see that such monitoring is necessarily and inescapably a continually applied statistical hypothesis test.
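A quick worked example shows why the per-check false alarm rate matters once the test runs continually. Assume one check every $\Delta$ seconds, and assume each check has false alarm probability $\alpha$ when nothing is wrong. The numbers below are illustrative, not from the thread. By linearity of expectation (no independence needed):

```latex
\mathbb{E}[\text{false alarms per day}] = \alpha \cdot \frac{86400}{\Delta},
\qquad \alpha = 0.001,\ \Delta = 10\,\text{s}
\;\Rightarrow\; 0.001 \cdot 8640 = 8.64
```

So a check that is wrong only once in a thousand runs would still page someone more than eight times a day on a perfectly healthy system.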
Apparently, in practice, the false alarm rate is not known, and it is not reasonable to select it or even to adjust it.
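One simple, distribution-free way to choose the false alarm rate directly is sketched below. It assumes we have a history of one metric recorded while the system was known to be healthy. The threshold is an order statistic of that history, so no distributional model and no anomaly examples are needed. The function name and the simulated latency data are hypothetical.

```python
import math
import random


def threshold_for_false_alarm_rate(healthy_samples, alpha):
    """Pick an alert threshold from history recorded while the system was healthy.

    Returns the order statistic of rank ceil((n + 1) * (1 - alpha)). If future
    healthy values are exchangeable with the history (roughly: generated the
    same way), the chance that a healthy value exceeds this threshold is at
    most alpha, whatever the underlying distribution is.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    ordered = sorted(healthy_samples)
    n = len(ordered)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank into ordered
    if rank > n:
        # Too little history to guarantee this alpha, so never alarm.
        return math.inf
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical healthy latency history in milliseconds (simulated here).
    rng = random.Random(0)
    history = [rng.gauss(100.0, 15.0) for _ in range(10_000)]
    threshold = threshold_for_false_alarm_rate(history, alpha=0.01)

    # Check the realised false alarm rate on fresh healthy data.
    fresh = [rng.gauss(100.0, 15.0) for _ in range(100_000)]
    rate = sum(x > threshold for x in fresh) / len(fresh)
    print(f"threshold={threshold:.1f} ms, false alarm rate={rate:.4f}")
```

This only pins down the false alarm side. Without data on real anomalies, the missed-detection rate for that threshold remains unknown, which is exactly the gap described above.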
Then we see that we need tests that are both multi-dimensional and distribution-free.
A special case of high interest is zero-day problems, that is, detecting problems never seen before. So, this is behavioral monitoring -- any behavior sufficiently unusual is regarded as an anomaly, that is, evidence of something wrong.
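The sketch below shows one multi-dimensional, distribution-free behavioural test. It is a swapped-in technique, not the commenter's own method. A k-nearest-neighbour distance serves as the "unusualness" score, and that score is turned into a p-value by ranking it against held-out healthy data (split conformal prediction). It assumes healthy metric vectors are exchangeable. The function names, `k=5`, and the simulated 2-D data are hypothetical.

```python
import math
import random


def knn_score(point, reference, k):
    """Mean Euclidean distance from point to its k nearest neighbours in reference."""
    if not 0 < k <= len(reference):
        raise ValueError("k must be between 1 and len(reference)")
    dists = sorted(math.dist(point, r) for r in reference)
    return sum(dists[:k]) / k


def conformal_p_value(new_point, train, calibration, k=5):
    """Distribution-free p-value for "new_point looks like the healthy data".

    train: healthy points that the k-NN score is measured against.
    calibration: a separate set of healthy points used only to rank scores.
    If healthy points are exchangeable, P(p <= alpha) <= alpha for a healthy
    new_point, with no assumption about the distribution or its dimension.
    """
    cal_scores = [knn_score(c, train, k) for c in calibration]
    score = knn_score(new_point, train, k)
    at_least_as_unusual = sum(1 for s in cal_scores if s >= score)
    return (at_least_as_unusual + 1) / (len(cal_scores) + 1)


def is_anomaly(new_point, train, calibration, alpha=0.01, k=5):
    """Flag behaviour that is unusual at false alarm rate alpha."""
    return conformal_p_value(new_point, train, calibration, k) <= alpha


if __name__ == "__main__":
    # Hypothetical 2-D metric vectors, e.g. (CPU z-score, latency z-score).
    rng = random.Random(1)
    train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    calibration = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    for p in [(0.0, 0.0), (10.0, 10.0)]:
        print(p, conformal_p_value(p, train, calibration), is_anomaly(p, train, calibration))
```

The smallest possible p-value is 1/(m + 1) for m calibration points. With 200 calibration points, for example, `alpha` must be at least 1/201 for the test to ever fire, so the calibration set has to grow as the tolerated false alarm rate shrinks.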
From all I can see, so far the monitoring community has yet to take these points to heart.
The OP's remarks on thresholds are on target: Thresholds have been the old, lame, weak workhorse of monitoring far too long.
If anyone is actually seriously interested in this subject, let me know. Some years ago I concluded that no one was interested!
It's a spectrum to me. We're way behind the curve on monitoring, and the "state of the art" anywhere but in cutting-edge shops is woeful. I'd love folks to be able to do anomaly detection easily and simply, but the technology and tools aren't quite there yet. I'm just hoping to help folks move their environments a little way forward.
Another thought. Two, really:
- Lead us into more about the why before showing the price. Really tell the story of the pain that so many of us can identify with.
- Use a standard three-tier pricing box style, with the value of each tier listed above its price. Something like: https://planscope.io/pricing/
If you haven't read one of the author's books before (he's released titles on both Logstash and Docker), he puts out really high-quality material and seems to update his books when new releases of the software come out. This looks like another great release. Kudos, James.
Just bought the book, and I didn't even realize it was James Turnbull - that guy is extremely advanced in monitoring. I've seen many of his talks online, and of everyone I've seen, no one speaks about monitoring more rationally.
That's mostly correct. I have a chapter on adding instrumentation to applications with examples in Ruby/Rails and Clojure that can easily be adapted to other languages and frameworks. I also cover adding structured logging to your applications.
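The book's examples are in Ruby/Rails and Clojure. As a language-neutral illustration of what structured logging means, here is a minimal Python sketch that uses only the standard `logging` and `json` modules. Each event becomes one JSON object per line, so a log pipeline can index fields instead of parsing free text. The `fields` key in `extra`, the logger names, and the example fields are hypothetical conventions of this sketch, not the book's approach.

```python
import io
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        event = {
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra structured fields, passed as extra={"fields": {...}} (this sketch's convention).
        event.update(getattr(record, "fields", {}))
        return json.dumps(event, sort_keys=True)


def make_logger(name, stream=None):
    """Return a logger that writes JSON lines to stream (stderr if None)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate plain-text output via the root logger
    return logger


if __name__ == "__main__":
    buf = io.StringIO()
    log = make_logger("checkout", buf)
    log.info("order placed", extra={"fields": {"order_id": 42, "duration_ms": 13.5}})
    print(buf.getvalue().strip())
```

Calling `make_logger` twice with the same name attaches a second handler and duplicates output, so a real application would configure logging once at startup.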
Love your site design. It's clean, easy to read, and well presented. Do you whip up a new design from scratch each time you create a book, or did you hire someone or buy one off the shelf? Curious, as these things take me forever.
I would argue that it's not, and that it's a mess, starting with that URL. This is yet another fork of an outdated monitoring system (how many Nagios forks are there?). The architecture diagram made my head hurt - how many moving parts is that? I counted 8 separate components. There are much more modern monitoring systems these days, such as Prometheus, Bosun and Riemann.
Thanks - I considered that but I don't like to put barriers in people's way - especially for free content. I don't like to be too marketing-esque. Just not in my nature. :) A couple of thousand folks signed up to the mailing list using the current approach, which I'm pretty happy with.
thinkersilver | 9 years ago
jamtur01 | 9 years ago
_asummers | 9 years ago
http://the-cloud-book.com
cothomps | 9 years ago
graycat | 9 years ago
jamtur01 | 9 years ago
bdavis56 | 9 years ago
pramodbiligiri | 9 years ago
ginkgotree | 9 years ago
jamtur01 | 9 years ago
jamtur01 | 9 years ago
bogomipz | 9 years ago
coredog64 | 9 years ago
drauh | 9 years ago
jamtur01 | 9 years ago
AndyNemmity | 9 years ago
This is Hacker News at its best, in my view.
guidedlight | 9 years ago
jamtur01 | 9 years ago
SonicSoul | 9 years ago
jamtur01 | 9 years ago
reptation | 9 years ago
bogomipz | 9 years ago
alamaison | 9 years ago
jamtur01 | 9 years ago
Omnipresent | 9 years ago
jamtur01 | 9 years ago
ginkgotree | 9 years ago
jamtur01 | 9 years ago
Thriptic | 9 years ago