AlexClickHouse | 11 months ago

I vaguely remember an old bug in atop that had a very unusual consequence.

Atop would perform an invalid memory write and crash with a segfault. But the write targeted a memory page mapped to a hardware timer. Even though the write itself could not succeed, just touching that page somehow changed how the hardware timer behaved. The OS then detected that the timer was inaccurate and switched to a different clock source (which you can see in /sys/devices/system/clocksource/clocksource0/current_clocksource). As a result, every call to clock_gettime became slower, and the system as a whole stayed slower until it was restarted.
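If you want to check whether a machine is in this state, here is a rough Python sketch (Linux only). It reads the sysfs file mentioned above and times clock_gettime calls. The helper names read_clocksource and avg_clock_gettime_ns are made up for illustration, and the sibling file available_clocksource is assumed to be present next to current_clocksource. The numbers include Python call overhead, so they are only useful for comparing before and after on the same machine.

```python
import time
from pathlib import Path

# Directory taken from the path quoted in the comment above.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(name="current_clocksource"):
    """Return the contents of a clocksource sysfs file, or None if unreadable."""
    try:
        return (CLOCKSOURCE_DIR / name).read_text().strip()
    except OSError:
        # Not Linux, or sysfs is not exposed (e.g. some containers).
        return None


def avg_clock_gettime_ns(iterations=200_000):
    """Average wall time per time.clock_gettime call, in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        time.clock_gettime(time.CLOCK_MONOTONIC)
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


if __name__ == "__main__":
    print("current clocksource:", read_clocksource())
    print("available clocksources:", read_clocksource("available_clocksource"))
    # Includes interpreter overhead; compare relative values only.
    print(f"clock_gettime: ~{avg_clock_gettime_ns():.0f} ns per call")
```

On many x86 machines the fast source is tsc, and a fallback such as hpet makes each call noticeably more expensive, but which sources were involved in the atop case is not something I can confirm.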

In short, a segfault in atop degraded the performance of the whole system. This was found maybe around 7 years ago, though.

anitil|11 months ago

That is such an interesting bug!