Linux Performance Analysis (2015)

My first command is always 'w'. And I always urge young engineers to do the same.

There is no shorter command to show uptime, load averages (1/5/15 minutes), and logged-in users. Essential for quick system health checks!

mmh0000|7 months ago

It should also be mentioned, Linux Load Average is a complex beast[1]. However, a general rule of thumb that works for most environments is:

You always want the load average to be less than the total number of CPU cores. If higher, you're likely experiencing a lot of waits and context switching.
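A rough sketch of that rule of thumb in Python, assuming a Linux box where /proc/loadavg exists. The helper name `load_per_core` is made up for illustration. Keep in mind that Linux load averages also count tasks in uninterruptible sleep (usually disk I/O), so a ratio above 1.0 is a hint to look closer, not proof of CPU saturation.

```python
import os

def load_per_core(loadavg_line, cores):
    """Divide the 1/5/15-minute load averages by the number of cores."""
    one, five, fifteen = (float(x) for x in loadavg_line.split()[:3])
    return (one / cores, five / cores, fifteen / cores)

if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    cores = os.cpu_count() or 1
    with open("/proc/loadavg") as f:
        ratios = load_per_core(f.read(), cores)
    for label, ratio in zip(("1m", "5m", "15m"), ratios):
        # Above 1.0 means more runnable/waiting tasks than cores on average.
        flag = "above core count" if ratio > 1.0 else "ok"
        print(f"{label}: {ratio:.2f} per core ({flag})")
```

Comparing the three ratios also shows the trend: a 1m value well above the 15m value suggests the load is rising.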

[1] https://www.brendangregg.com/blog/2017-08-08/linux-load-aver...

chasil|7 months ago

Glances is nice. I think it is a clone of HP-UX Glance.

https://nicolargo.github.io/glances/

I have also hacked basic top to add database login details to server processes.

Propelloni|7 months ago

Me too! So much so that I add it to my .bashrc everywhere.

__turbobrew__|7 months ago

If you like this post, I would recommend “BPF Performance Tools” and “Systems Performance: Enterprise and the Cloud” by Brendan Gregg.

I have pulled off a few miracles using these tools (identifying kernel bottlenecks or profiling programs using eBPF) and it has been well worth the investment to read through the books.

yankcrime|7 months ago

Agreed, highly recommended reading. A slightly more up-to-date post of his which recommends tools in such situations is: https://www.brendangregg.com/blog/2024-03-24/linux-crisis-to...

wcunning|7 months ago

I literally did miracles at my last job with the first book, and that got me my current job, where I used it again to do some impressive work proving which libraries had what performance... Seriously valuable stuff.

sour-taste|7 months ago

Almost all of these have been replaced for me with below: https://developers.facebook.com/blog/post/2021/09/21/below-t...

It is excellent and contains most things you could need. The downside is that it isn't yet a standard tool, so you need to get it installed across your fleet.

benreesman|7 months ago

Oh man nostalgia city. I vividly remember meeting atop time travel debugging at 3am in Menlo Park in 2012, wild times.

louwrentius|7 months ago

The iostat command has always been important for observing HDD/SSD latency numbers.

SSDs especially are treated like magic storage devices with infinite IOPS at Planck-scale latency.

Until you discover that SSDs that can do 10GB/s don't do nearly so well (not even close) when you hit them with random I/O from a single thread at a queue depth of 1.
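Back-of-the-envelope version of why that happens, as a small Python sketch. At queue depth 1 each request has to finish before the next one starts, so IOPS is capped at 1 / latency. The 100 µs latency and 4 KiB block size below are assumed numbers for illustration, not measurements of any particular drive.

```python
def qd1_throughput(latency_us, block_bytes):
    """Upper bound on IOPS and MiB/s when only one I/O is in flight at a time."""
    iops = 1_000_000 / latency_us            # one request per round trip
    mib_per_s = iops * block_bytes / (1024 * 1024)
    return iops, mib_per_s

# Assumed values: 100 microseconds per random read, 4 KiB per request.
iops, mib = qd1_throughput(latency_us=100.0, block_bytes=4096)
print(f"{iops:.0f} IOPS, {mib:.1f} MiB/s")   # prints "10000 IOPS, 39.1 MiB/s"
```

Under those assumptions the drive delivers tens of MiB/s, orders of magnitude below a 10GB/s headline figure, which is only reachable with large sequential transfers or many requests in flight.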

wcunning|7 months ago

That's where you start down the eBPF rabbit hole with bcc/biolatency and other block device histogram tools. Further, the cache hit rate and block size behavior of the SSD/NVMe drive can really affect things if, say, your autonomous vehicle logging service uses MCAP with a chunk size much smaller than a drive block... Ask me how I know.

mortar|7 months ago

2015

Previous discussions: https://news.ycombinator.com/item?id=10654681 https://news.ycombinator.com/item?id=10652076

microtonal|7 months ago

Yeah, I skipped the date and then saw Linux 3.13 in the examples.

tomhow|7 months ago

Previously:

Linux Performance Analysis in 60,000 Milliseconds - https://news.ycombinator.com/item?id=10652076 - Nov 2015 (11 comments)

Linux Performance Analysis - https://news.ycombinator.com/item?id=10654681 - Dec 2015 (82 comments)

Linux Performance Analysis in 60k Milliseconds (2015) [pdf] - https://news.ycombinator.com/item?id=44070741 - May 2025 (1 comment)

5pl1n73r|7 months ago

After this article was written, `free -m` on many systems started to have an "available" column that shows an estimate of free plus reclaimable memory, i.e. how much memory new applications could use without swapping. It's nicer than the "-/+" section shown in this old article.

  $ free -m
                 total        used        free      shared  buff/cache   available
  Mem:            3915        2116        1288          41         769        1799
  Swap:            974           0         974
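If you want the same number in a script, here is a minimal Python sketch that reads the MemAvailable field from /proc/meminfo (present on Linux 3.14 and later), which is the kernel estimate behind the "available" column. The helper name `meminfo_mib` is hypothetical.

```python
import os

def meminfo_mib(text, field):
    """Return one /proc/meminfo field (reported in kB) converted to MiB."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name == field:
            return int(rest.split()[0]) // 1024
    raise KeyError(field)

if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = f.read()
    try:
        print("available MiB:", meminfo_mib(info, "MemAvailable"))
    except KeyError:
        # Older kernels do not expose MemAvailable.
        print("MemAvailable not reported by this kernel")
```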

whalesalad|7 months ago

I quite like `iotop` as an alternative to iostat. https://linux.die.net/man/1/iotop

fduran|7 months ago

shameless plug: you can practice this in a free VM https://docs.sadservers.com/docs/scenario-guides/practical-l... (there's a typo there to keep you on your feet)

CodeCompost|7 months ago

> At Netflix we have a massive EC2 Linux cloud

Wait a minute. I thought Netflix famously ran FreeBSD.

craftkiller|7 months ago

My understanding was their CDN ran on FreeBSD, but not their API servers. But I don't work for Netflix.

drewg123|7 months ago

The CDN runs FreeBSD. Linux is used for nearly everything else.

ImPostingOnHN|7 months ago

Maybe I missed it, but checking available disk space is often a good step in diagnosing misbehaving systems.
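On the command line that is usually just `df -h`. For a scripted check, a minimal Python sketch using the standard library's `shutil.disk_usage` could look like this. The path "/" and the helper name `percent_used` are assumptions for illustration, and the percentage can differ slightly from what `df` prints because `df` accounts for blocks reserved for root.

```python
import shutil

def percent_used(total, used):
    """Used space as a percentage of total, guarding against a zero total."""
    return 100.0 * used / total if total else 0.0

if __name__ == "__main__":
    usage = shutil.disk_usage("/")  # named tuple: total, used, free (bytes)
    pct = percent_used(usage.total, usage.used)
    print(f"/ is {pct:.1f}% used, {usage.free / 2**30:.1f} GiB free")
```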

rkachowski|7 months ago

it's 10 years later - what's the 60 second equivalent in 2025?

wcunning|7 months ago

@yankcrime posted it above: https://www.brendangregg.com/blog/2024-03-24/linux-crisis-to...

emmelaich|7 months ago

Nice list. sar/sysstat is underrated imho.

mmh0000|7 months ago

Oh man. There's a blast from the past.

Today, you'd want something like:

Prometheus + Node Exporter [1]

[1] https://github.com/prometheus/node_exporter

appleaday1|7 months ago

he forgot about rusttop

AnyTimeTraveler|7 months ago

I'm pretty sure that that didn't exist in 2015 ;)

40 comments