
Better PC cooling with Python and Grafana

258 points| naggie | 2 years ago |calbryant.uk

152 comments


dvdkon|2 years ago

It's surprising that there are no PC fan controllers that would use some variant of PID control with a temperature target. Traditional fan curves are simple, but the result isn't very intuitive.

And many desktop motherboards manage to screw up even the basic fan curve, offering users control of only two points within strict bounds, no piecewise linear curves or hysteresis settings.

I started a fan controller project some 4 years ago and it's now sadly in limbo, waiting for me to solve power filtering issues for the makeshift power supply it grew into. Maybe I should just limit myself to 4-pin fans...
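
A minimal sketch of the piecewise linear curve with hysteresis that this comment finds missing on many boards. The curve points, the 5 degree band, and the names `curve_duty` and `HysteresisCurve` are assumptions made up for illustration, not any vendor's firmware logic.

```python
from bisect import bisect_right

# Hypothetical curve points: (temperature in deg C, fan duty in %).
CURVE = [(30, 20), (50, 35), (70, 70), (85, 100)]


def curve_duty(temp_c, curve=CURVE):
    """Linearly interpolate the fan duty for a temperature, clamped at both ends."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return float(curve[0][1])
    if temp_c >= temps[-1]:
        return float(curve[-1][1])
    i = bisect_right(temps, temp_c)  # index of the first point above temp_c
    (t0, d0), (t1, d1) = curve[i - 1], curve[i]
    return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)


class HysteresisCurve:
    """Follow rising temperatures at once; ramp down only after a drop of band_c."""

    def __init__(self, band_c=5.0, curve=CURVE):
        self.band_c = band_c
        self.curve = curve
        self.held_temp = None  # temperature the current duty is based on

    def update(self, temp_c):
        if self.held_temp is None or temp_c > self.held_temp:
            self.held_temp = temp_c
        elif temp_c < self.held_temp - self.band_c:
            self.held_temp = temp_c
        return curve_duty(self.held_temp, self.curve)
```

With these numbers, a reading that falls from 60 to 57 degrees keeps the fan at the 60 degree duty, and the duty only drops once the reading falls below 55.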

Aurornis|2 years ago

PID control isn’t an easy solution in PC cooling.

CPU temperatures can swing from 40C to 90C and back in a matter of seconds as loads come and go. Modern fan control algorithms have delays and smoothing for this reason.

If you had a steady state load so stable that you could tune around it, setting a fan curve is rather easy and PID is overkill. For normal use where you’re going between idle with the occasional spike and back, trying to PID target a specific temperature doesn’t really give you anything useful and could introduce unnecessary delays in cooling if tuned incorrectly.
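
One common way to get the smoothing described here is an exponential moving average over the raw readings, so that a spike lasting a second or two barely moves the value the fan curve sees. This is a sketch under that assumption; the class name and the `alpha` value are hypothetical, and real firmware may use a different filter or a fixed delay instead.

```python
class SmoothedTemperature:
    """Exponential moving average of temperature samples."""

    def __init__(self, alpha=0.2):
        # alpha near 0 smooths heavily (slow response); alpha = 1 disables smoothing.
        self.alpha = alpha
        self.value = None

    def update(self, sample_c):
        if self.value is None:
            self.value = float(sample_c)  # seed with the first sample
        else:
            self.value += self.alpha * (sample_c - self.value)
        return self.value
```

The trade-off is the one the comment points at: a smaller `alpha` hides short spikes but also delays the response to a sustained load.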

selcuka|2 years ago

> It's surprising that there are no PC fan controllers that would use some variant of PID control with a temperature target.

They actually exist (such as the ARCTIC F12 TC) but are not very common. Separate controllers such as the Adafruit EMC2101 are also available.

crest|2 years ago

IIRC because tuning PID controllers for vastly different heating and cooling rates isn't that easy. A CPU can ramp up/down a lot quicker than the cooling system could or should even attempt to react. It's easy to end up with oscillations, and the common temperature vs. speed curves (maybe with a hysteresis range) are already annoying enough to tune. Good luck having users come up with good PID terms for their individual combination of parts (CPU, thermal interface, mainboard, case, fans, pump, radiator, etc.).
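
For readers who have not seen one, here is a minimal sketch of the kind of PID loop being debated in this subthread: it drives fan duty from the error between measured temperature and a target. The class name, the output limits, and the simple anti-windup rule are assumptions for illustration, and the three gains are exactly the per-system terms this comment says are hard to choose.

```python
class FanPID:
    """PID controller mapping temperature error to a fan duty in percent."""

    def __init__(self, target_c, kp, ki, kd, out_min=20.0, out_max=100.0):
        self.target_c = target_c
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, temp_c, dt):
        error = temp_c - self.target_c  # positive when hotter than the target
        if self.prev_error is None:
            derivative = 0.0  # no history on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error

        candidate = self.integral + error * dt
        output = self.kp * error + self.ki * candidate + self.kd * derivative
        clamped = min(self.out_max, max(self.out_min, output))
        # Simple anti-windup: keep the new integral only while the output is unsaturated.
        if clamped == output:
            self.integral = candidate
        return clamped
```

A loop would call `update(temp, dt)` once per sample and write the result to the fan. With badly chosen gains this oscillates, which is the failure mode described above.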

joshspankit|2 years ago

I’m more surprised there aren’t fan controller standards so the mobo or OS can tell the fan to ramp up with usage instead of relying on temperature probes.

jerrygenser|2 years ago

Another thing I've had great success with on my AMD 7700X is to use AMD Ryzen Master to reduce the TDP.

The chip comes to consumers overclocked by default with a TDP of 105W. I suspect this is the case so that it can beat Intel on benchmarks on "default" settings.

You can set it to "eco mode" and have it run at 65W or 45W TDP. Under load, this only results in like a 5% reduction in performance for a dramatic reduction in electricity consumption, fan speed, heat, etc.

Not sure if the 5500x series chips are overclocked but using eco mode could be a good approach.

LegitShady|2 years ago

Just an FYI AMD Ryzen Master installer contains a dark pattern.

When you first launch it you have to scroll down the disclaimer/license to check off the "I agree to terms and conditions" box (which is obviously unchecked by default).

When you do, it creates the "install" button you can click, but the checked box now sits beside text about sending AMD information, and if you aren't looking you may assume it's still the same text as when you checked the box.

The end effect is to get the user to agree to send data without them noticing.

Aurornis|2 years ago

I’ve tried Eco mode on AMD parts before. The performance drop was a lot more than 5% for heavily multithreaded workloads (compiling), but I could see it being negligible for certain single threaded workloads.

stanac|2 years ago

The AMD 7700X is probably a factory-overclocked 7700, so it would be cheaper to just buy the 7700. Probably the only difference is that some 7700 chips cannot guarantee the same clocks and stability when overclocked to 7700X specs.

gambiting|2 years ago

Same with GPUs - reducing the power budget on a 3080Ti down to like 75% reduces the performance by maybe 5-10% but dramatically reduces noise and heat produced.

RachelF|2 years ago

The main problem with the AMD 5000 series is their high idle (non-core) power draw of around 20 to 30W.

AMD have reduced this in their 7000 series.

jtriangle|2 years ago

You can also undervolt most modern AMD parts and actually gain performance

this_user|2 years ago

I had a similar problem to what the article describes with a Ryzen 9. The reason was that the CPU would constantly and automatically spike the clock frequency to overclock itself. The solution was just to disable this specific feature in the BIOS.

pja|2 years ago

Yes, you can do the same with the 5000 series. The 5600X can be restricted to 45W max tdp & the 5800X to 65W.

mckirk|2 years ago

For people that want to do something similar on Windows, I can wholeheartedly recommend FanControl [1]. It's sadly not open-source, but it works great, and is quite pleasant to interact with.

[1]: https://getfancontrol.com/

a_vanderbilt|2 years ago

It seems to be an unpopular opinion these days, but I have no problems paying for software that is good and comes with a credible promise of ongoing support. FanControl looks pretty cool, and if I can jettison the mental sticky note in favor of them maintaining it I am all ears.

NoPicklez|2 years ago

I came here to recommend this.

In my 15 years of PC building this fan software tops them all. Huge amount of customisation and actually allows you to control fan speeds both under CPU OR GPU heavy loads at once.

This software can do what this article is looking to do, but I am not sure if there is a non-Windows version.

Saris|2 years ago

Fan Control is amazing, it's pretty easy to set up stuff and keeps my PC quiet.

pryelluw|2 years ago

Curious as to why you think it’s sad it is not open source.

haunter|2 years ago

It’s open source

kelvie|2 years ago

I've been using an esp32-based fan controller with esphome and inlined C++ for my waterloop for a while, with a custom (but super simple) temp control algorithm as well:

https://github.com/kelvie/esphome-config/blob/master/pc-fan-...

The main reason for doing this was so that I didn't have to connect the controller to my main PC via USB to program it (I can change the target points via MQTT/wifi).

Playing around with this stuff on my laptop, I've also noticed that you have to be careful what calls you make when querying system status in a loop. Some things (like, weirdly, `powerprofilesctl get`) drain a surprising amount of battery even when called only every 5 seconds, so your tool may start to affect the "idle" power consumption somewhat, and you need to test for that.

3abiton|2 years ago

This looks so promising! The only issue is that many motherboards don't have drivers available in the Linux kernel, which makes temperature reading and fan control impossible for now.

hnuser123456|2 years ago

Please do some measuring of core temperature response to load before and after re-pasting / upgrading the thermal compound. Something between a large grain of rice and a pea, and I like to clean the CPU and cooler cold plate completely, paste the CPU, then press and smear around the cooler onto the CPU before mounting it, to ensure full surface area coverage with a thin layer of compound.

naggie|2 years ago

As it happens, I did this morning!

I switched from years old Arctic silver 5 to Noctua NT-H1. It resulted in a dramatic difference. 64c loaded vs 84c -- I now suspect I had an air bubble which may invalidate the initial motivation for the work in the first place :-)

majesticmerc|2 years ago

Genuine question: I've known the law of "pea sized amount" for thermal paste for 20 years or so. Does it still hold true for modern (and larger) CPU dies? I haven't upgraded in a long time so genuinely don't know, but also wouldn't want to use outdated knowledge!

belter|2 years ago

Another way to help with cooling..."Energy Efficiency across Programming Languages" - https://greenlab.di.uminho.pt/wp-content/uploads/2017/10/sle...

mrweasel|2 years ago

This is something we've been slightly interested in, as a means to reduce power consumption and the number of servers we need to operate. It's just an insane amount of work trying to rewrite software and finding the bits where it produces the most results.

It's not really surprising to see C and C++ doing so well, and it's also positive to see Rust up there as one of the most energy-efficient languages. The one language that keeps surprising me is Pascal. It is often in the top 5 - 10 in terms of speed and it also does really well for energy consumption. While I haven't read the article, I could also imagine that it's good in terms of "power spent compiling" due to its one-pass compiler. What I'm not sure of is if it's all a result of the language design, or if it's because it just had a lot of work put into it over the years by some really smart people. I presume that the tested implementation is Free Pascal.

moffkalast|2 years ago

Now we just need a follow-up that takes the average dev time for each language and shows which one has the optimal ratio based on these two values. It should be possible to draw a curve based on how long and how often the program will run to plot exactly when it makes sense to switch to something more efficient but also more dev intensive. Sort of in this fashion: https://xkcd.com/1205/

E.g. assembly would have very low energy use by itself, but would require an inordinate amount of human energy (~8.7 MJ/day) invested to get that end result, making it very inefficient when the whole picture is considered. Unless that code runs everywhere constantly for years of course.
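
The break-even point proposed here is simple arithmetic, sketched below. The only figure taken from the thread is the ~8.7 MJ/day of human energy; the function name and the example inputs are hypothetical, and the model deliberately ignores everything except developer energy and energy saved per run.

```python
import math

HUMAN_JOULES_PER_DAY = 8.7e6  # the ~8.7 MJ/day figure quoted in the comment


def break_even_runs(extra_dev_days, joules_saved_per_run):
    """Runs needed before a more efficient rewrite repays its extra human energy."""
    if joules_saved_per_run <= 0:
        raise ValueError("the rewrite must save energy on every run")
    invested = extra_dev_days * HUMAN_JOULES_PER_DAY
    return math.ceil(invested / joules_saved_per_run)


# Hypothetical example: 10 extra developer days, 50 J saved per run.
print(break_even_runs(10, 50))
```

Under these made-up inputs the rewrite only pays off after well over a million runs, which is the "runs everywhere constantly for years" case.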

PeterStuer|2 years ago

Why is there so much dust on the radiator? Is he sucking hot air through the fins into the pc, or is there soooo much dust inside the case that this is actually the radiator being so saturated with it that it is starting to ooze out?

Either way there seems to be a serious problem with unfiltered air being sucked into the case here. That radiator isn't going to radiate if it is wearing a fur coat.

voidnap|2 years ago

> I’ve played with PBO2 adjustment as I said, but it should be possible to reduce the voltage at the expense of a bit of performance.

Undervolting with PBO2 should not decrease performance unless you have done something very wrong.

Ryzen CPUs have limits: max temperature, frequency, power, and voltage. The voltage curve follows a frequency curve, so a higher clock speed requires a higher voltage. A negative offset in PBO reduces the voltage required for a given frequency. It shifts down the voltage curve. Lower voltage typically means less heat and power draw, so you can achieve a higher clock speed without hitting temperature, power, or voltage maximums.

If your system is stable when undervolting, you don't see a loss in performance; generally it improves because you are able to reach higher clock speeds before running into voltage or power limits. The exception is if you induce a rare issue called clock skew in extreme cases, which I'm not even sure you can do with PBO2.
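
The headroom argument can be made concrete with the textbook approximation that dynamic CMOS power scales with voltage squared times frequency. The sketch below assumes that approximation only; it ignores leakage, and the voltages and clocks are invented numbers rather than measurements of any Ryzen part or of what a particular PBO offset does.

```python
def relative_dynamic_power(voltage, freq_ghz, ref_voltage, ref_freq_ghz):
    """Dynamic power relative to a reference point, assuming P ~ V^2 * f."""
    return (voltage / ref_voltage) ** 2 * (freq_ghz / ref_freq_ghz)


# Hypothetical numbers: same 4.8 GHz clock, voltage lowered from 1.30 V to 1.25 V.
same_clock = relative_dynamic_power(1.25, 4.8, 1.30, 4.8)
print(f"power at the same clock: {same_clock:.1%} of the reference")

# The saved power can be spent on frequency: at the lower voltage, this model
# returns to 100% power at a clock scaled by 1 / same_clock.
print(f"clock at equal power: {4.8 / same_clock:.2f} GHz")
```

Whether a real chip is stable at the shifted point is a separate question that only testing answers.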

js2|2 years ago

I geeked out on something like this for my TrueNAS server setting fan speed based on drive temperature with a PID controller. e.g.

https://github.com/dak180/TrueNAS-Scripts/blob/master/FanCon...

(That's not mine. I think I wrote a variation in Python.)

Then I realized: my server is in the unfinished part of my basement where I can't hear it anyway. Let's just run the fans at 80% speed all the time since that's sufficient to keep the drives cool.

craftoman|2 years ago

Better cooling with Python, Grafana, Prometheus on top of Kubernetes with enchanted AI. Who needs PID these days?

asmor|2 years ago

Didn't pick the right fans for going extra slow. These run at 1700 RPM by default, whereas Noctua has a version - even in the redux line - that runs at 1200 RPM. Though the non-redux line gets even slower - so much that Noctua includes a "low noise adapter" (presumably a resistor).

whywhywhywhy|2 years ago

>I presume the quick temperature rises are specific to modern Ryzen CPUs, perhaps others too. Maybe this is due to more accurate sensors, or even a less-than-ideal thermal interface.

It’s because the CPU is designed to push itself to a thermal limit, with its performance determined by how well you keep it at that limit. It essentially goes full throttle to 90°C, then slows down if your cooling can’t keep up, which causes the fan spikes.

So I’m told from the research I did.

My latest machine had the same issue but just updating all drivers, setting some auto curves and adding easing for the fan spin up time completely solved it.

pstrateman|2 years ago

This is pretty cool but honestly just setting the pump speed to a constant that's not annoying and setting the fans to a constant that's not annoying is likely to get the same result.

lreeves|2 years ago

What an awesome post! I recently built a new PC based on workstation parts that frustratingly didn't expose the actual CPU temperature except in the IPMI interface; I was starting to hack something together with Netdata and ipmitool, but then I saw this post and in a couple of hours had Grafana ingesting the ipmitool sensor output every 10 seconds or so. Thanks!

https://i.imgur.com/KDeHgFY.png
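
A rough sketch of how the ipmitool side of such a setup might look, assuming the pipe-separated table that `ipmitool sensor` prints. The sample rows and the function names are hypothetical (sensor names differ per board), and how the readings reach Grafana is left out because the comment does not say.

```python
import subprocess

# Hypothetical sample in the pipe-separated layout of `ipmitool sensor` output.
SAMPLE = """\
CPU Temp         | 45.000     | degrees C  | ok    | 0.000 | 0.000 | 0.000 | 95.000 | 100.000 | 100.000
FAN1             | 1200.000   | RPM        | ok    | 300.000 | 500.000 | 700.000 | 25300.000 | 25400.000 | 25500.000
FAN4             | na         |            | na    | na | na | na | na | na | na
"""


def parse_sensors(text):
    """Return {sensor name: (value, unit)} for rows with a numeric reading."""
    readings = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3:
            continue
        name, raw_value, unit = fields[:3]
        try:
            readings[name] = (float(raw_value), unit)
        except ValueError:
            continue  # "na" means the sensor has no reading
    return readings


def read_sensors():
    """Run ipmitool locally (usually needs root) and parse its output."""
    result = subprocess.run(
        ["ipmitool", "sensor"], capture_output=True, text=True, check=True
    )
    return parse_sensors(result.stdout)


print(parse_sensors(SAMPLE))
```

Calling `read_sensors()` every 10 seconds or so and pushing the result to a time-series store would match the polling interval mentioned above.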

a_vanderbilt|2 years ago

My problem is that I just don't want to think about it. Complex solutions are cool but ultimately it's bandwidth I could use for better things. I like my Chromebook because it requires nearly zero mental effort. I install the package and I use it. Ubuntu was great because you could Google it and get an answer that applies, and ChromeOS is great for the same reason. Linux could be as great as macOS, but it's the fragmentation that kills it.

npteljes|2 years ago

>Linux could be as great as macOS, but it's the fragmentation that kills it.

No, Linux is as valid technologically as the other offerings, fragmented or not. It's just that it doesn't have a mega-corp behind it to push, to make deals with businesses, schools, governments. As soon as Google stepped in with Android and ChromeOS, suddenly it was everywhere.

then4p|2 years ago

I had very similar issues with my NZXT AIO and a Ryzen 5900X. What annoyed me the most is that other vendors offer water temperature based fan control by default. This makes NZXT AIOs basically unfit for modern CPUs and I don't understand why they're recommended and well reviewed.

I switched to a Fractal Celsius and its default setting is to control pump and fan speed by water temp. Problem solved.

NwtnsMthd|2 years ago

Has anyone tried a controller for cooling that is dependent on processor current consumption? Temperature measurements lag, but the current used by the processor is instantaneous and directly converted to heat (P = VI). In theory, it should be possible to reduce temperature spikes.
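
A sketch of what such a feedforward controller could look like: fan duty is set directly from electrical power (P = VI) instead of waiting for the temperature to rise. The idle and maximum wattages, the duty limits, and the function name are assumptions for illustration; where the voltage and current readings come from is platform-specific and not covered here.

```python
def feedforward_duty(voltage_v, current_a, idle_w=20.0, max_w=140.0,
                     min_duty=20.0, max_duty=100.0):
    """Map CPU power (P = V * I) linearly onto a fan duty, clamped to the limits."""
    power_w = voltage_v * current_a
    fraction = (power_w - idle_w) / (max_w - idle_w)
    fraction = min(1.0, max(0.0, fraction))  # clamp below idle and above max
    return min_duty + fraction * (max_duty - min_duty)


# Hypothetical reading: 1.0 V at 80 A is 80 W, halfway between idle and max.
print(feedforward_duty(1.0, 80.0))
```

In practice this would probably be combined with a slow temperature-based correction, since power alone says nothing about ambient temperature or dust build-up.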

Arch-TK|2 years ago

With a Noctua NH-D15 and about 30 minutes spent tweaking the motherboard fan curves I was able to get my 5950x to not thermal throttle without producing any noticeable fan noise.

This seems extremely over-engineered and sounds like it could have been solved by using Noctua or similar quiet fans.

ComputerGuru|2 years ago

The author specifically mentions (multiple times) that they are using Noctua fans...

gen_greyface|2 years ago

How did you achieve this? Tweaked by hand? I'm in the middle of building a PC and would love to know more.

cinntaile|2 years ago

It says 5959x in the article but it should be 5950x.

hernandipietro|2 years ago

I just use FanControl, and I'm very happy with it. Windows only, unfortunately.

anticodon|2 years ago

On AMD it helps to have a fairly recent kernel and the amd_pstate=active string in the kernel boot parameters. I haven't checked the temperatures, but I think I've started to hear less fan noise after enabling it. This option was finally implemented in kernel 6.1 or 6.2. I don't remember the exact version; it happened only 1-1.5 years ago.

naggie|2 years ago

I got annoyed at my fan speeds so decided to experiment with controlling my fans with Python and measuring the results.

jtriangle|2 years ago

You likely have a setting for fan ramp time in your bios, usually in seconds. So setting your pump to always run at 100% and your fans to ramp slowly, say 10 seconds or longer, and using a minimum fan speed that is as high as tolerable would likely work as a no-additional-code solution.
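
For anyone doing this in software instead, the BIOS ramp-time setting behaves roughly like a slew-rate limiter on the fan duty. The sketch below assumes that interpretation; the class name and the meaning of `ramp_seconds` (time for a full 0 to 100% swing) are hypothetical and may not match how a given BIOS defines it.

```python
class RampLimiter:
    """Limit how fast the fan duty may change, in percent per second."""

    def __init__(self, ramp_seconds=10.0, start=0.0):
        self.max_rate = 100.0 / ramp_seconds  # full swing takes ramp_seconds
        self.duty = start

    def update(self, target, dt):
        step = self.max_rate * dt  # largest change allowed in this interval
        delta = max(-step, min(step, target - self.duty))
        self.duty += delta
        return self.duty
```

A short temperature spike then moves the fans only a few percent before the target falls back, which is the effect the suggestion relies on.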

fwip|2 years ago

Looks pretty cool, the self-calibration routine is very nice too.

My only worry is that rapid changes in pump speed might cause extra mechanical stress or wear on the pump, but I have no data to back that up. I've just heard that water pumps sometimes behave in counter-intuitive ways - e.g: sometimes running at a higher speed is better for longevity than a lower one.

voidnap|2 years ago

It would have been neat to include CPU frequency in your charts alongside temperature and other things.

derped|2 years ago

[deleted]

AceJohnny2|2 years ago

In conclusion, this is why Systems (Thermal) Control is a profession.

Not dissing on the author's efforts, quite the contrary! But they demonstrate the rabbit hole that is second order effects (like multi-fan beat frequency) and number of parameters to take into account (like "... A solution to [when to enable Passive Mode] may be to detect if the computer is in use (mouse movements)")

u4ik|2 years ago

[deleted]

avidphantasm|2 years ago

Who has time for all this rigamarole? Just use an ARM CPU.

ltbarcly3|2 years ago

Just buy a large AIO water cooler. Replace the included fans with Noctua. That is the full story of how I made my computer silent despite having a CPU with a TDP of over 200W.

asmor|2 years ago

Didn't read the article, did you?

jrockway|2 years ago

I have 8 Noctua fans in my PC and I can tell you they are not silent at 100%. Are they annoying at full power? Not really. But they aren't silent. Thus, fan curves are applied so that when air movement is not required, they aren't running at full speed.