jclay | 1 year ago
This was getting very frustrating. At various points I tried every other fix I could find online, including restoring the BIOS to Intel Baseline settings.
I came across Keean's investigations into the matter on the Intel forums:
> I think there is an easy solution for Intel: limit P-cores with both hyper-threads busy to 5.8GHz and allow cores with only one hyper-thread active to boost up to 5.9/6.2GHz. They would then have a chip that matched the advertised multi-core and single-thread performance, and it would be stable without any specific power limits.
> I still think the real reason for this problem is that hyper-threading creates a hot-spot somewhere in the address arithmetic part of the core, and this was missed in the design of the chip. Had a thermal sensor been placed there, the chip could throttle back the core ratio to remain stable automatically; or perhaps the transistors needed to be bigger for higher current, though I'm not sure that would solve the heat problem. Ultimately an extra pipeline stage might be needed, and this would be a problem because it would slow things down when only one hyper-thread is in use too. I wonder if this has something to do with why Intel are getting rid of hyper-threading in 15th gen?
From: https://community.intel.com/t5/Processors/14900ks-unstable/m...
Based on this, I set a P-core limit of 5.8GHz in my BIOS, and after several months of daily use building Chromium I can say this machine is now completely stable.
If you're seeing instability on an i9-14900K or i9-13900K, see the above forum post for more details and try setting the all-core limit. So far I've seen this fix instability on 3+ of the build machines we use.
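For anyone who wants to try the same idea from Linux before touching the BIOS, a rough software analogue is lowering each P-core's cpufreq `scaling_max_freq` (a value in kHz). This is a minimal sketch and not what was done above: it assumes a hybrid Intel CPU whose kernel lists the P-cores in /sys/devices/cpu_core/cpus, root access, and that a cpufreq cap behaves like the BIOS limit, which is unverified. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: cap P-core frequency via Linux cpufreq instead of the BIOS.
# Assumes /sys/devices/cpu_core/cpus lists the P-cores (hybrid Intel CPUs)
# and that the script runs as root. Function names are hypothetical.

# Expand a kernel cpulist such as "0-3,8,10-11" into "0 1 2 3 8 10 11".
expand_cpulist() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        [ -z "$lo" ] && continue
        [ -z "$hi" ] && hi=$lo
        i=$lo
        while [ "$i" -le "$hi" ]; do
            echo "$i"
            i=$((i + 1))
        done
    done | tr '\n' ' ' | sed 's/ $//'
}

# Write a max frequency (in kHz) to every P-core's scaling_max_freq.
cap_pcores_khz() {
    for cpu in $(expand_cpulist "$(cat /sys/devices/cpu_core/cpus)"); do
        echo "$1" > "/sys/devices/system/cpu/cpu$cpu/cpufreq/scaling_max_freq"
    done
}

# Usage (as root): cap_pcores_khz 5800000   # 5.8 GHz = 5,800,000 kHz
```

The cap does not survive a reboot, which makes it convenient for A/B testing but no substitute for the BIOS setting if it turns out to help.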
jclay | 1 year ago
There's something unique about the workload of ninja launching a bunch of clang processes that draws this out.
On my machine, a clean build of llvm-project would consistently fail to complete, so it may be a reasonable workload for A/B testing if you're looking into this.
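One way to A/B test with this workload is to repeat a clean build several times per setting and count the failed runs. A minimal sketch: `count_failures` and `llvm_clean_build` are hypothetical helper names, and the CMake invocation (run from the root of an llvm-project checkout) is an assumed configuration, not necessarily the one used here.

```shell
#!/bin/sh
# Sketch: run a command N times and print how many runs exited non-zero.
# count_failures and llvm_clean_build are hypothetical helper names.
count_failures() {
    _runs=$1
    shift
    _fails=0
    _i=0
    while [ "$_i" -lt "$_runs" ]; do
        "$@" || _fails=$((_fails + 1))
        _i=$((_i + 1))
    done
    echo "$_fails"
}

# One clean build of LLVM with Ninja, from the root of an llvm-project checkout.
# The CMake options are an assumption; adjust to your usual configuration.
llvm_clean_build() {
    rm -rf build &&
        cmake -S llvm -B build -G Ninja -DCMAKE_BUILD_TYPE=Release &&
        ninja -C build
}

# Usage: count_failures 5 llvm_clean_build
```

Running the same count before and after a change (for example, the P-core limit) gives a crude but comparable failure rate.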
The user quoted above was running Gentoo builds pinned to specific P-cores to test various solutions, and ultimately found that the P-core limit was the only fix that yielded stability.
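The thread doesn't say exactly how the builds were pinned, but one way to compare a core with both hyper-threads busy against a single hyper-thread (the distinction drawn in the quote above) is `taskset` plus the kernel's CPU topology files. A minimal sketch under those assumptions; `thread_siblings` and the `SYSFS_ROOT` override are hypothetical, the latter only there so the function can be pointed at a fake directory tree for testing.

```shell
#!/bin/sh
# Sketch: print the hyper-thread siblings of a logical CPU (e.g. "0-1") so a
# build can be pinned to one physical core with taskset.
# thread_siblings and SYSFS_ROOT are hypothetical names; SYSFS_ROOT defaults to /sys.
thread_siblings() {
    root=${SYSFS_ROOT:-/sys}
    cat "$root/devices/system/cpu/cpu$1/topology/thread_siblings_list"
}

# Example comparison on the physical core that owns CPU 0:
#   taskset -c "$(thread_siblings 0)" ninja -C build -j2   # both hyper-threads busy
#   taskset -c 0 ninja -C build -j1                        # one hyper-thread only
```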
xuejie | 1 year ago
instagib | 1 year ago
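Rename ninja to oninja, then create an executable shell script named ninja in the same directory:

  #!/bin/sh
  # Forward all arguments to the renamed binary; the trailing -j4 overrides any earlier -j
  exec oninja "$@" -j4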