(no title)
cyclonereef | 4 months ago
For CPU, check for CPU IOWait For memory, check for Memory swap-in rate For disk, check for latency or queue depth For network, check for dropped packets
All you want to check at an infrastructure layer is whether there is a bottleneck and what that bottleneck is. Whether an application is using 10% or 99% of available memory is moot if the application isn't impacted by it. The above metrics are indicators (but not always proof) that a resource is being bottlenecked and needs investigation.
Monitor further up the application stack, check for error code rates over time, implement tracing to the extent that you can for core user journeys, ignore infrastructure-level monitoring until you have no choice
No comments yet.