Linux Pressure Stall Information (PSI) by Example

[+] tyingq|6 years ago|reply

Wow. This would have totally changed my job of being a Unix sysadmin in the 90's.

Funny how such a basic capability doesn't appear as a batteries included thing until 40+ years after Unix appears. Lots of futzing around with uptime, ps, top, iostat, vmstat, ntop, sar, lsof, free, etc. All of that replaced with a simpler tool to get the high level answer.

That first-level filter is the key to narrowing down your troubleshooting. Bravo. I bet Adrian Cockcroft (Sun, Netflix, now AMZN) approves.

[+] akira2501|6 years ago|reply

> doesn't appear as a batteries included thing until 40+ years after Unix appears

64kB of RAM in 1980: $405. $1,335 if you adjust for inflation.

64GB of RAM in 2020: $175.

In my mind, that has the most to do with it.

[+] heavenlyblue|6 years ago|reply

>> Funny how such a basic capability doesn't appear as a batteries included thing until 40+ years after Unix appears.

Is there anyone on HN who knows why it didn’t appear earlier?

Lack of need? This doesn’t seem like a hard thing to implement - am I missing something in terms of implementation complexity?

[+] navinsylvester|6 years ago|reply

Ubercool.

Wasn't aware of PSI - thanks to the author. Load average alerts mostly fire too late for our liking, but tooling around this should really help.

[+] the8472|6 years ago|reply

Another advantage of PSI is that there are also per-cgroup monitors, unlike load indicators, which are system-global and thus pointless when your cgroup is limited by quotas.
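
For anyone who wants to find the per-cgroup files, here is a minimal Python sketch. It assumes a cgroup v2 hierarchy mounted at /sys/fs/cgroup, where each cgroup directory exposes cpu.pressure, memory.pressure and io.pressure. The helper name cgroup_pressure_path and the example group system.slice are only illustrations, not anything from the article.

```python
from pathlib import PurePosixPath

# Resources that have a <resource>.pressure file under cgroup v2.
RESOURCES = ("cpu", "memory", "io")


def cgroup_pressure_path(cgroup, resource, root="/sys/fs/cgroup"):
    """Build the path of a cgroup's PSI file (hypothetical helper).

    `root` is an assumption: the usual cgroup v2 mount point.
    """
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource!r}")
    return PurePosixPath(root) / cgroup.strip("/") / f"{resource}.pressure"


# Example group name; substitute whatever cgroup your workload runs in.
print(cgroup_pressure_path("system.slice", "memory"))
```

The file at that path has the same format as the system-wide one under /proc/pressure, so the same parsing applies.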

[+] danw1979|6 years ago|reply

This is going to come in very useful indeed.

It's been around since kernel 4.20 according to the article. Do any of the mainstream container orchestration tools use this data, I wonder...

[+] kees99|6 years ago|reply

In-kernel resource-specific counters are great, but those are limited to 10/60/300-second averages.

For historic resource utilization, the sar/sadc tool is still the go-to.

[+] shuss|6 years ago|reply

kees99, you should check the "total" fields. They hold the cumulative stall time at microsecond resolution.
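
To make that concrete, a minimal Python sketch of pulling the "total" field out of a pressure file. It assumes the usual layout of one "some" line and one "full" line with avg10/avg60/avg300/total key=value pairs; parse_psi is a made-up helper name and the sample values are invented for illustration.

```python
def parse_psi(text):
    """Parse PSI file contents into {"some": {...}, "full": {...}}.

    avg* fields become floats (percentages); total becomes an int
    (cumulative stall time in microseconds).
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, value = field.partition("=")
            values[key] = int(value) if key == "total" else float(value)
        result[kind] = values
    return result


# Invented sample; on a real system read e.g. /proc/pressure/io instead.
sample = (
    "some avg10=0.12 avg60=0.05 avg300=0.01 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=7890\n"
)
parsed = parse_psi(sample)
print(parsed["some"]["total"])
```

Because total only ever grows, the difference between two readings gives the stall time over whatever window you choose, not just 10/60/300 seconds.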

[+] kccqzy|6 years ago|reply

If I have a cron job that reads the pressure file and simply stores its contents with a timestamp for later analysis, would that be enough to determine historic resource utilization?
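
A rough sketch of the later-analysis step in Python, assuming each stored sample is a (Unix timestamp in seconds, "total" in microseconds) pair taken from the same line of the same pressure file. stall_percent is a hypothetical helper and the numbers are invented.

```python
def stall_percent(t0, total0_us, t1, total1_us):
    """Percentage of wall-clock time stalled between two samples.

    t0/t1 are timestamps in seconds; total*_us are the cumulative
    "total" values (microseconds) read at those times.
    """
    elapsed_us = (t1 - t0) * 1_000_000
    if elapsed_us <= 0:
        raise ValueError("samples must be in increasing time order")
    return 100.0 * (total1_us - total0_us) / elapsed_us


# Two invented samples taken 60 seconds apart by the cron job:
# 3 seconds of accumulated stall time over a 60 second window.
print(stall_percent(0, 1_000_000, 60, 4_000_000))  # 5.0
```

The result is an average over the sampling interval, so a one-minute cron schedule will smooth out shorter spikes, and a reboot between samples resets the counter.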

[+] magoon|6 years ago|reply

Major improvement over loadavg for troubleshooting, given the difficulty in identifying when a system is io-bound.

[+] rwha|6 years ago|reply

This is an excellent write-up. The code example for polling events is going to keep me entertained long enough to get in trouble. Well done shuss!

[+] markandrewj|6 years ago|reply

This is pretty useful and cool.

18 comments