Awesome analysis, I have added it to my favorites list. Around 1990 or so I was in the kernel group at Sun, and a team had just embarked on the multi-processor kernel work that would later result in the 'interrupts as threads'[1] paper. During that time there was an epic thread on email which was something like "What the F*ck does load average mean on an MP system?" (no doubt I have a copy on an unreadable quarter inch tape somewhere :-(). If it helps, the exact same pivot point was identified, which is this: does 'load average' mean the load on the CPU or the load on the system? While there were supporters in the 'system' camp, the traditionalists carried the day with "We can't change the definition on existing customers, all of their shell scripts would break!" or something to that effect. Basically, the response was that if we were to change it, we would have to call it something different to maintain a commitment to the principle of least surprise. This has never been an issue for Linux :-).
As a "systems" guy I am always interested in how balanced the system is, which is to say that I am always trying to figure out what the slowest part of my system is and ensuring that it is within some small epsilon of the other parts. If you do that, then system load is linear with workload almost regardless of task composition. So "disk heavy" processes load the "system" as much as "compute heavy", "memory heavy", or "network heavy" ones. In an imaginary world you could decompose a system into 'resource units' and then optimize it for a particular workload.
All you old farts (TM) need to get these freaking quarter inch tapes pushed up to some Glacier S3 bucket or some such bucket before you kick said bucket...
I'm serious. C'mon, don't steal from the future what you actually did in the past to make the present the reality of today!!
I wonder if Illumos / SmartOS and OpenIndiana continued with the principle of least surprise via their ancestry chain, or whether they also moved to a "system" load view like Linux.
It's great that design decisions and thinking from decades past can be dug out and examined by complete strangers.
This is great work in general and excellent historical research.
As an additional historical note: in Unix, load averages were introduced in 3BSD, and at that time they included processes in disk IO wait and other theoretically short-term waits that weren't interruptible. This definition was carried through the BSD series and onward into Unixes derived from them, such as the initial versions of SunOS and Ultrix. At some point (perhaps SunOS 3 to SunOS 4, perhaps later), the SunOS/Solaris definition changed to be purely runnable processes.
(I'm not sure what System V derived Unixes such as Irix, HP-UX, and so on did, and their kernel source is not readily available online for spelunking.)
As of early 2016, when I last looked at this, the situation on FreeBSD, OpenBSD, and NetBSD was somewhat tangled. FreeBSD load average only included runnable processes, but NetBSD and OpenBSD counted some sleeping or waiting processes as well.
When details of a piece of "open" software are so easily lost, I shudder to think about the vast quantity of "closed" software that has had its history lost.
I also kept thinking about how the term "software archaeology" (which I first saw in the 1999 Vernor Vinge novel "A Deepness in the Sky") becomes more and more mainstream each day.
Several years back the company I worked for ended up picking up some work for a client. Every quarter we'd download a huge trove of TIFFs from some source, and then do some image conversion work before transferring them to the customer's infrastructure.
There was a Java application that powered the logic side of things, calling out to ImageMagick to do the actual processing and conversion. For whatever reason, after careful benchmarking we settled on a Java thread count that happened to get us the peak throughput, but also caused system load average to hit around 400 and keep steady at around that level.
The day that happened, and I could show that no application on the server took a performance hit, was the day that I finally persuaded my boss that load average is an interesting stat, but it's not the be-all and end-all, and that a high load average doesn't necessarily correlate to an actual problem.
I had something similar happen a long time ago on an x86 Solaris 10 mail server. An employee thought it was a good idea to share best quality/full resolution JPEG pictures of his new baby with the whole company. This swamped the mail server (load average was well over 700) while it chugged through delivering a 50 MB email to 200+ employees. I forget which process was the culprit (I think GNU Mailman), but after a couple of hours it finally settled down. I was amazed that I could still SSH into it and figure out what happened.
One source of high load average spikes that I've seen in my job is when a process crashes and generates a core dump. While the core dump is being written, all threads in the process are in the TASK_UNINTERRUPTIBLE state even though they are doing absolutely nothing, and as such they all count towards the load average as if they were spinning on a CPU core. If the total virtual memory of the process is large, say in the multi-GB range, coredumping can take on the order of a minute, and Linux will report an unreasonably high load average if that process had a lot of running threads.
Things like the above scenario make me treat the load average metric with a lot of skepticism. I would much rather use other metrics to infer load.
I rarely recommend alerting, monitoring, or any kind of action based on load averages or, more generally, any metric derived from queue lengths. It's trends in high-quantile queue latencies that your users (and therefore you) should care about.
Under Better Metrics the author discusses ways of drilling down to find the source of a high load average. I feel like this section should mention `atop`, which is imo a really underrated single-pane-of-glass view into everything your system is doing, now and historically.
If you haven't tried `atop`, give it a go.
The historical analysis in this article is great, though, because while load average has been an oft-discussed and well-understood topic for a long time, the decisions that got us there have not.
Good article. However, it is missing the reason why load averages include tasks waiting for disc/swap.
One of the things that the load average is sometimes used for is to work out whether it is appropriate to start some more processes running on a system. For example, make has a "-l" option, which prevents more parallel jobs being run while the load is above a supplied number. When a system is thrashing due to insufficient RAM, then the load average will be high, and this option will appropriately prevent more tasks being started which would make the thrashing worse. If the load average was just based on CPU, then it would be low while thrashing, and using that make option could lead to complete system collapse.
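For anyone who wants the same kind of gate outside of make, here is a minimal Python sketch of the idea described above. It is not how make implements `-l`; the function names and the idea of passing the limit in as `max_load` are assumptions for illustration. It only relies on `os.getloadavg()`, which is available on Unix-like systems.

```python
import os


def may_start_job(load_1min: float, max_load: float) -> bool:
    """Allow a new job only while the 1-minute load average is below the limit."""
    return load_1min < max_load


def current_load_allows_job(max_load: float) -> bool:
    """Check the live 1-minute load average against a caller-supplied limit."""
    # os.getloadavg() returns the 1, 5, and 15 minute load averages.
    load_1min, _load_5min, _load_15min = os.getloadavg()
    return may_start_job(load_1min, max_load)
```

Calling `current_load_allows_job(4.0)` before spawning each worker gives roughly the same effect: on Linux a thrashing box reports a high load average, so the gate stays closed.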
This comment makes perfect sense if load is a smooth function. But it is not. It tends to be a step function.
What the most recent two data points give you is whether the problem is currently getting worse, getting better, or steady. The third gives you a sense of whether it has been going on a while.
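To make that reading of the three numbers concrete, here is a rough Python sketch. The function name, the labels it returns, and the `tolerance` threshold are all made up for illustration; nothing here comes from the article or from any tool.

```python
def describe_trend(one: float, five: float, fifteen: float,
                   tolerance: float = 0.1) -> str:
    """Summarise the 1, 5, and 15 minute load averages in words."""
    # The two most recent averages show the current direction.
    if one > five + tolerance:
        direction = "rising"
    elif one < five - tolerance:
        direction = "falling"
    else:
        direction = "steady"

    # The third average hints at how long the 5-minute level has held.
    if abs(five - fifteen) <= tolerance:
        duration = "sustained"
    else:
        duration = "recent"

    return f"{direction}, {duration}"
```

With this sketch, averages of 4.0, 2.0, 1.0 come out as "rising, recent", while 0.5, 2.0, 2.0 come out as "falling, sustained", meaning the load held for a while and is only now dropping off.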
This analysis cleared up a mystery for me. I've noticed that when a server app is under heavy load in Linux, the load average goes high if the bottleneck is the CPU or the disk, but the load average goes low if the bottleneck is network resources (like databases or microservice calls). I know why that happens, but it's very unintuitive and it confused me when I was new to Linux. I thought load average would measure the CPU load only. Now I know the historical reasons for measuring system load instead of CPU load.
I kind of like it the way it is since it's handy to be able to distinguish network load from CPU+disk load just by looking at the load average. However, since the load average includes other stuff as well, sometimes I still don't know what the load average really means.
Yes, sorry. I guess that's proof this is a hobby site on some personal hosting that can get overloaded. Try refreshing. Although its load averages (couldn't resist) aren't that high:
Just because we can deploy services that can take a million RPS doesn't mean we have our side projects / hobby sites in order, hah. I worked in hosting for a long time and I had a personal WordPress site which would get hacked every other month. I literally fixed that problem daily at $JOB, but couldn't be arsed to do something better for myself. It worked, and it was quick and easy. The point was the content.
These days, I'd just use something like Medium or Tumblr. Let someone else worry about hosting it :)
I still managed to read the whole thing. Quite fascinating, really, considering the lengths he went to in tracking down the ancient (1993) patch that turned CPU load averages into whole system load averages.
Why isn't there one for RAM in i3? I read something about how it's hard to gauge RAM usage, despite htop and inxi displaying it. On Windows, you look at Task Manager and there is memory usage.
Worth remembering that essentially the same issue exists at a lower level: the “%Cpu” number as shown by top includes not just the share of time spent actually executing your instructions, but also the share of time waiting on memory access.
When I'd asked Brendan via Twitter for an article on Load Averages in Linux, I hadn't expected such a detailed response. I've worked on a few projects where I've had to show that even though the "load" on the Linux system was low, it was really the steal% and the iowait that were killing performance. I'm sure that from now on, so many system and support engineers will have a good article to reference. Thanks, Brendan!
My company took over production support of an ESB from another company for a client a couple of years ago. Each worker node had about 100 JVMs running on it, and its resting Load Avg was around 30. This was on a 2-CPU RHEL VM.
Out of morbid curiosity, I restarted one of the test servers and ran top. Load Avg was in the order of 2200 for about 3 hours.
The worst part was that the guys we took it over from didn't even think it was a problem.
Page swapping seems like it makes a lot of sense to include in the load average. Disk I/O seems like something more towards the opposite end of the spectrum, though TASK_KILLABLE (https://lwn.net/Articles/288056/) presumably mitigates this where used.
What we need is a systems model that allows us to assess the overall health of a server in a single metric. Anything under strain should be reflected in the metric and draw our attention for further drilldown and analysis. "Load Average" is the metric we (the systems community) have generally been using for this. Unfortunately, it appears that the model it is based on may be rather dated and may have flaws, which means we will miss or misinterpret system health status by relying on that number. So the million dollar question is: starting from scratch, how can we design a model of our system that yields a reliable system health indicator metric?
OT: what could cause a system to have a load of 1 when idle?
I have one (unimportant) Linux system that idles with a load of exactly 1. The issue persists through reboots. It is a KVM virtual machine and qemu confirms nothing is going on in the background.
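Since Linux counts tasks in uninterruptible sleep (state `D`) toward the load average, one way to hunt for a phantom load of 1 is to list every task currently in that state, kernel threads included. Below is a minimal Python sketch that reads the `stat` files under `/proc`; the function names are hypothetical, and whether a stuck `D`-state task is the cause on this particular VM is only a guess.

```python
import glob


def parse_state(stat_line: str) -> str:
    """Return the one-letter task state from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split on the last ')' rather than on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def uninterruptible_tasks(proc_root: str = "/proc") -> list[str]:
    """List 'tid (comm)' for every task currently in state D."""
    found = []
    for path in glob.glob(f"{proc_root}/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as handle:
                line = handle.read()
        except OSError:
            continue  # the task exited while we were scanning
        if ")" not in line:
            continue  # empty or unexpected content, skip it
        if parse_state(line) == "D":
            found.append(line.rsplit(")", 1)[0] + ")")
    return found
```

Running `uninterruptible_tasks()` a few times in a row and seeing the same name keep showing up would at least point at a suspect.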
I thought that including disk wait in the load average was a common Unix feature. Sadly I can't go spelunking through the archives right now, but it would be interesting to see what Solaris and BSD do, for comparison with systems a little bit closer to Linux than TENEX :-)
[+] [-] ChuckMcM|8 years ago|reply
[1] http://dl.acm.org/citation.cfm?id=202217
[+] [-] samstave|8 years ago|reply
[+] [-] otoburb|8 years ago|reply
[+] [-] siebenmann|8 years ago|reply
[+] [-] EvanAnderson|8 years ago|reply
[+] [-] brendangregg|8 years ago|reply
[+] [-] ams6110|8 years ago|reply
[+] [-] Twirrim|8 years ago|reply
[+] [-] Bluecobra|8 years ago|reply
[+] [-] sreque|8 years ago|reply
[+] [-] lotyrin|8 years ago|reply
[+] [-] saalweachter|8 years ago|reply
[+] [-] mentat|8 years ago|reply
[+] [-] simonjgreen|8 years ago|reply
[+] [-] mnw21cam|8 years ago|reply
[+] [-] Filligree|8 years ago|reply
[+] [-] Florin_Andrei|8 years ago|reply
That could be accomplished with a set of two.
A set of three could in theory give you acceleration.
[+] [-] btilly|8 years ago|reply
[+] [-] BayAreaSmayArea|8 years ago|reply
[+] [-] hathawsh|8 years ago|reply
[+] [-] ty_a|8 years ago|reply
[+] [-] brendangregg|8 years ago|reply
[+] [-] seanp2k2|8 years ago|reply
[+] [-] rcarmo|8 years ago|reply
[+] [-] ge96|8 years ago|reply
[+] [-] stephengillie|8 years ago|reply
Here's an article on gathering this data on Windows with Powershell:
https://www.petri.com/display-memory-usage-powershell
[+] [-] faragon|8 years ago|reply
[+] [-] vfaronov|8 years ago|reply
As explained by the same author: http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-...
[+] [-] solarengineer|8 years ago|reply
[+] [-] sytringy05|8 years ago|reply
[+] [-] mnarayan01|8 years ago|reply
[+] [-] rotten|8 years ago|reply
[+] [-] mobilethrow|8 years ago|reply
Any ideas how to find out what's causing it?
[+] [-] fanf2|8 years ago|reply
[+] [-] brendangregg|8 years ago|reply
[+] [-] swinglock|8 years ago|reply