I'm not part of the Java world at all, but I'm starting to think that Azul Systems is one of the few groups of people who know what they're doing with regard to Linux performance and the user/kernel boundary. I recently watched some talks by Cliff Click[1] that were extremely informative, and from what I understand about their proposed kernel patches, they seem like important improvements.
This report by Gil Tene (their CTO according to Wikipedia) lends more support to that theory.
They absolutely do! If you purchase their Zing JVM, it is generally because of the very low GC pauses it brings. And once those GC pauses are gone, you quickly start to notice what else is going on in your system causing unwanted pauses in your processes.
In order to effectively sell the JVM, they need to help you understand and reduce the non-GC pauses so you can realize the full value of your investment.
I've been trying to track down random latency (> 1000ms) in the network stack between the socket buffer and the client for about 3 months now... All of the stack traces showed the app was stuck in futex_wait, but since it looked identical to an idle server, I'd convinced myself epoll_wait was at fault... All of a sudden I'm wondering otherwise. We're not on Haswell though; it's not clear to me if the bug would affect other processors or not - can it?
Mostly, I'm confused as to how this has only bitten people on Haswell - did pre-Haswell just enforce a MB invisibly there for some reason, or did Haswell explicitly change some semantics?
Also, an interesting note is that the commit references this deadlocking on ARM64, so I'm guessing this probably broke on non-x86 architectures in strange ways unless I'm really missing something...
You have to remember that the absence of a barrier does not automatically cause concurrency failures in a fail-fast manner. The barriers just provide guarantees. Your code might end up working (either always or in 99.999999% of all observed operations) just by chance. So it might simply be more visible on Haswell than on other systems because it behaves differently or is more aggressive about exploiting non-barriered operations.
From the mailing list:
> In our case it's reproducing on 10 core haswells only which are different than 8 cores (dual vs single ring bus and more cache coherency options). It's probably a probability matter. [...]
> Pinning the JVM to a single cpu reduces the probability of occurrence drastically (from a few times a day to weeks) so I'm guessing latency distributions may have an effect.
I'm wondering the same thing. The one thing I know that changed in Haswell is that the transactional memory instructions were found to be broken, but I assume those aren't the issue here...
> For some reason, people seem to not have noticed this or raised the alarm. We certainly haven't seen much "INSTALL PATCHES NOW" fear mongering. And we really need it, so I'm hoping this posting will start a panic.
Should we also alert the President? Maybe OP is only talking about the mailing list he posted on, but we're missing that context here on HN? The only systems affected in production seem to be RHEL 6.6 on Haswell.
Ubuntu 14.04/Debian 8: have had the fix for a long time [0] [1]
Ubuntu 12.04/Debian 7: were never affected [3] [2]. Newer enablement stack kernels for Ubuntu have the same fix as [1].
RHEL 7: OP only talks about 6.6, so I assume it either doesn't have the regression backported or already has the fix
My eyes jumped straight to "%$^!" and I wondered what a shell expansion had to do with a futex_wait bug. Then I briefly tried to parse it and only then read the sentence. I wondered how many others this happened to.
It would be nice to know which distros are affected by this bug. In particular, I had unexplained JVM lockups on RHEL 5.11 this week after it was upgraded.
Seeing that 6.6 and 5.11 were both released in 2014 with a 2.6.x kernel, I can imagine this bug also applying to RHEL5..
Also, does anyone know if there is some RHEL errata about this bug?
edit: I just looked at the Red Hat applied patches for RHEL 5.11 (linux 2.6.18-398), and this bug was also introduced in the RHEL 5.11 series (not sure whether a subsequent kernel version fixes it)
Can anyone recommend a way to check a Linux server for whether it's running on a Haswell CPU?
I'm guessing perhaps checking /proc/cpuinfo for a Xeon v3 model, or looking for the flags 'hle|rtm|tsx', would work - but something more definitive would help with mass-auditing.
Based on the Linux kernel range of 3.14 to 3.18 inclusive, and this[1] list of Ubuntu kernel versions, I believe only 14.10 (Utopic Unicorn) would even be affected.
This is why programmability is important. This is why being able to achieve the performance of relaxed memory models with a more intuitive SC memory model should be a top objective for Intel and architecture researchers.
It is difficult to design meaningful unit tests for preemptively multitasking, protected memory operating system kernels. I don't know that it's actually impossible, but difficult.
While there are many advantages to unit testing, kernels are typically tested from userspace.
Some tests are laughably simple. I panicked the OS X kernel a while back with a shell script that repeatedly loaded and unloaded my kernel extension. Only a minute or two was required for the panic.
Apple fixed the panic but never told me how they screwed up.
EDIT: Of significant concern is how the kernel deals with the electrical circuitry. While the kernel is implemented in software, the reason we even have kernels, is so that end-user code doesn't have to understand much about physics.
AMCC - since acquired by LSI - sold some high-end RAID Host Bus Adapters. We had quite a significant problem with motherboard support. We had to test our cards on a whole bunch of different motherboards as well as PCI expansion chassis.
One might protest that "PCI is a standard!" but what we have is what we can buy at Microcenter. :-/
While not all of the kernel is concerned with physical hardware, much of it is. It's not really possible to write unit tests for the parts that have to deal with edge cases in electrical circuitry.
This is a race triggered by a particular reordering of memory accesses as seen by different cores. It's the kind of thing that doesn't necessarily show up in a unit test anyway.
In case you're not trolling: not really, not officially.
Unpopular features on less common architectures are frequently broken for large stretches of time, and go unnoticed until someone complains. Open source really exemplifies the squeaky wheel getting the grease, which is kind of sad.
Places where Linux is popular undoubtedly have their own internal private test suites, especially for features less popular on bleeding edge kernels (eg S390 arch support or Infiniband)
It would be hard to get any sort of good coverage with unit tests, too, but that shouldn't be a reason to avoid trying.
Looks to be more than just Haswell. I was wondering this too, and just noticed this comment on the (fix) patch: "the problem (user space deadlocks) can be seen with Android bionic's mutex implementation on an arm64 multi-cluster system."
The problem is not that there isn't an enforced default case, but that requiring one wasn't in the coding standards. I always write a default case, but only because it's in my coding standards.
[1] https://www.youtube.com/watch?v=uL2D3qzHtqY
hga | 11 years ago
A shame, since those would allow some really sweet things for garbage collectors, which evidently isn't a goal.
osi | 11 years ago
(full disclosure: happy Zing customer)
zobzu | 11 years ago
Yeah well, it's a DoS triggered by pretty particular conditions. While it's bad, starting a panic over it reminds me of the boy who cried wolf.
But then again, feel free to slap a fancy site and a codename on it; all the cool kids do it anyway :)
SLES: don't know, don't care.
[0] http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?showmsg=1...
[1] http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?showmsg=1...
[2] https://git.kernel.org/cgit/linux/kernel/git/stable/linux-st...
[3] http://kernel.ubuntu.com/git/ubuntu/ubuntu-precise.git/tree/...
buster | 11 years ago
Also, I don't believe your queries are a sufficient check at all.
You can clearly find the missing default case in http://kernel.ubuntu.com/git/ubuntu/ubuntu-precise.git/tree/... . So I guess that means Ubuntu Precise is/was also affected?
Same in your other link: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-st...
Please check your facts next time.
olalonde | 11 years ago
A 30-line commit message for 2 lines of code. Kernel developers sure are disciplined.
scott_karana | 11 years ago
[1] http://askubuntu.com/questions/517136/list-of-ubuntu-version...
EDIT: Turns out 14.04.02 LTS can also optionally use 3.16: http://www.omgubuntu.co.uk/2015/02/ubuntu-14-04-2-lts-releas...
asveikau | 11 years ago
Much more important is what the code in that new default block is doing: it's a memory fence.