Since kernel 3.13 or so, RAPL is also exposed through `/sys/devices/virtual/powercap/intel-rapl/*/energy_uj` in microjoules (if not, `modprobe intel_rapl`, or `intel_rapl_msr` on newer kernels). So for a quick power measurement, plain POSIX sh is enough (root required):
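# Prints milliwatts (1000 = 1 W) because shell arithmetic doesn't do floating point
# Read once before the loop so the first line printed is a real one-second delta
MJ=$(cat /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/energy_uj)
while true; do
sleep 1
LAST_MJ=$MJ
MJ=$(cat /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/energy_uj)
# MJ holds microjoules; microjoules per second / 1000 = milliwatts
echo $(((MJ - LAST_MJ) / 1000))
done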
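One thing the loop above does not handle is counter wraparound: the powercap zone also exposes `max_energy_range_uj`, and `energy_uj` starts again from zero once it reaches that range, which would show up as one negative reading. A minimal sketch of a wrap-aware delta follows, assuming at most one wrap between samples; the function name `energy_delta_uj` is made up for illustration and is not part of any tool.

```sh
# energy_delta_uj LAST NOW MAX
# Prints the energy used between two counter samples, in microjoules.
# Assumption: the counter wrapped at most once between LAST and NOW.
energy_delta_uj() {
    delta=$(( $2 - $1 ))
    # A negative difference means the counter wrapped past MAX back to zero.
    if [ "$delta" -lt 0 ]; then
        delta=$(( delta + $3 ))
    fi
    echo "$delta"
}

# Worked example with made-up values: the counter wrapped at 1000.
energy_delta_uj 900 100 1000   # prints 200
```

To use it in the loop, pass the previous sample, the current sample, and the contents of `max_energy_range_uj` from the same zone directory, then divide the result by 1000 as before.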
Despite the powercap name being intel-rapl, the powercap interface is also available on AMD machines.
For a more detailed reading of several more CPU metrics, I think pcm[1] may be a better tool (it's a successor to the Intel Power Gadget the project was forked from). It only works on Intel CPUs, though.
Power profiling is listed as supported on all CPUs though a bunch of features (including memory bandwidth, one that I had wanted) are limited to EPYC CPUs and don't exist in Ryzen or Threadripper.
>Long story short, since last year the AMD Energy sensor information has been limited to root due to the PLATYPUS security vulnerability. HWMON maintainer Guenter Roeck proposed slightly limiting and randomizing the sensor data so it couldn't be used for nefarious purposes but still accurate enough for genuine use-cases and no longer needing to be root-only access. However, AMD engineers didn't like that approach.
>With the hardware monitoring subsystem maintainer not wanting the information to be restricted to root-only and AMD not wanting the limiting/randomization approach, Guenter went ahead and removed the driver.
So... we're better off without having this system at all than we would be if it were limited to root OR if it were randomized? Sounds like silly kernel politicking to me. "You don't like my plan? Oh well, I guess I'll take the ball and go home, have fun losers!"
Are the energy consumption values reported by Intel CPUs accurate? Measuring energy consumption for cheap is hard, so I wonder whether they are big approximations or they have some magic tricks.
Yes. Much earlier architectures (e.g., Sandy Bridge) used event counters as a rough approximation of energy consumption. These days, however, we use calibrated current sensors rather than approximations. They are rather accurate, accurate enough to mount a side-channel attack, in fact. If software opts in for security, we also add a little randomness to the readings so that measurements are not data-dependent enough to break crypto (the PLATYPUS attack), but not so much that accuracy suffers for normal use cases.
As far as I know RAPL is implemented entirely in the CPU and is an estimate of CPU power using a complex model of CPU state, temperature and such. I don't believe it's an actual power measurement like e.g. SVI telemetry is.
This was true for earlier implementations, but newer ones actually measure power. There is an ADC in there. At least for Intel. Not sure about AMD implementation.
In my opinion, Astron's PMT (Power Measurement Toolkit) is a much more useful tool than this, because it abstracts over Intel, AMD, and Nvidia (including Jetson): https://git.astron.nl/RD/pmt
I really wish there were a similar tool for measuring the energy consumption of the transceiver power amplifier (PA) inside a wireless device, because its efficiency is abysmal (less than 50% in real-life scenarios due to impedance matching, skin effect, etc.). That is not unlike the internal combustion engine (ICE), but at least the latter does not have to deal with mismatch and high-frequency issues. In fact, the PA is increasingly the main culprit of wasted energy in connected devices, especially wireless ones, and it is normal for the PA to account for about 50% of the power consumption of the entire device. With IoT and machine-to-machine (M2M) communication, data transmissions are regular and frequent. Unlike humans, who sleep at night, machines mostly never sleep, which makes the PA inefficiency even more of a problem than in human communication.
Two systems I know from HPC that more usefully expose various architectures' RAPL etc. to userland via a daemon for application profiling are https://variorum.readthedocs.io/ and https://hpc.fau.de/research/tools/likwid/. Of course other sources of power consumption than CPU/uncore and GPU may be significant.
For whole-node power on typical racked systems, I'd expect to interrogate the power strips or similar supplies with SNMP or otherwise.
You don't just want to measure CPU consumption, but whole-system power is only useful for application performance if only one application of significance runs on it. I'd expect to measure it anyway for system management purposes.
Wild, I just came across this while doing some research on power consumption. I got an AMD 5950X and an Nvidia 4080 Super and I was concerned about using too much power on my 750 Watt power supply. lol.
It's just a coulomb counter you can read from an MSR. But yes, monitoring it inevitably consumes some energy. It won't cost anything on a busy system, but waking up an idle system to read it will be more noticeable. This is why I no longer use background metrics monitors like atop or netdata. An Intel client CPU can idle below 100 mW if you leave it be, but something like netdata will raise that to 5 W or worse.
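For anyone reading the counter straight from the MSR rather than through powercap, the raw value is in hardware units, not joules. On Intel, as far as the SDM describes it, the package counter is `MSR_PKG_ENERGY_STATUS` (0x611) and one count equals 1/2^ESU joules, where ESU is bits 12:8 of `MSR_RAPL_POWER_UNIT` (0x606). A rough sketch of just the conversion step, assuming 64-bit shell arithmetic; the function name `rapl_raw_to_uj` is invented here, and actually reading the MSRs (e.g. with `rdmsr` from msr-tools) is left out.

```sh
# rapl_raw_to_uj RAW ESU
# Converts a raw energy-status counter value to microjoules.
# One count equals 1 / 2^ESU joules, so scale to microjoules first and
# then shift right by ESU. Assumes the shell uses 64-bit arithmetic.
rapl_raw_to_uj() {
    echo $(( ($1 * 1000000) >> $2 ))
}

# Example with illustrative values: with ESU=14 one count is about 61 uJ,
# so 16384 counts (2^14) are exactly one joule.
rapl_raw_to_uj 16384 14   # prints 1000000
```

The ESU value differs between CPU generations, so it should be read from the unit MSR rather than hard-coded.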
> does not switch to 200MHz for a minute during video calls
I had a Dell work laptop that did the same thing. As far as I was able to tell the system had a bug/fault that continuously asserted the CPU's BD PROCHOT line when the integrated webcam was active. I don't think it was an Intel bug, the CPU was just responding to the external signal that (falsely) indicated the system was overheating.
[+] [-] sirn|1 year ago|reply
[1]: https://github.com/intel/pcm
[+] [-] 149765|1 year ago|reply
[+] [-] lathiat|1 year ago|reply
[+] [-] jeffbee|1 year ago|reply
[+] [-] Sweepi|1 year ago|reply
https://www.kernel.org/doc/html/v5.8/hwmon/amd_energy.html
https://www.kernel.org/doc/html/v5.12/hwmon/amd_energy.html
https://www.kernel.org/doc/html/v5.13/hwmon/amd_energy.html (404)
https://www.phoronix.com/news/Linux-5.13-AMD-Energy-Removed
https://www.phoronix.com/news/No-More-AMD-Energy
[+] [-] aftbit|1 year ago|reply
[+] [-] sirn|1 year ago|reply
[+] [-] speedgoose|1 year ago|reply
[+] [-] ngneer|1 year ago|reply
[+] [-] lastgeniusua|1 year ago|reply
https://dl.acm.org/doi/10.1145/2989081.2989088
https://dl.acm.org/doi/10.1145/3177754
[+] [-] formerly_proven|1 year ago|reply
[+] [-] ngneer|1 year ago|reply
[+] [-] wiz21c|1 year ago|reply
"Identifying Compiler Options to Minimise Energy Consumption for Embedded platforms"
https://arxiv.org/pdf/1303.6485
[+] [-] gpuhacker|1 year ago|reply
[+] [-] gpuhacker|1 year ago|reply
There is also a paper about PMT: https://arxiv.org/pdf/2210.03724
[+] [-] teleforce|1 year ago|reply
[+] [-] gnufx|1 year ago|reply
[+] [-] iAm25626|1 year ago|reply
[+] [-] petermcneeley|1 year ago|reply
[+] [-] reportgunner|1 year ago|reply
[+] [-] dannyw|1 year ago|reply
[+] [-] gnufx|1 year ago|reply
[+] [-] robertheadley|1 year ago|reply
This was yesterday. Wild.
[+] [-] Gelob|1 year ago|reply
[+] [-] steve1977|1 year ago|reply
[+] [-] jhrmnn|1 year ago|reply
[+] [-] sandworm101|1 year ago|reply
[+] [-] imvetri|1 year ago|reply
[+] [-] jeffbee|1 year ago|reply
[+] [-] chickenchase-rd|1 year ago|reply
[+] [-] aljgz|1 year ago|reply
[deleted]
[+] [-] silotis|1 year ago|reply
[+] [-] Almondsetat|1 year ago|reply
[+] [-] nottorp|1 year ago|reply
[+] [-] navjack27|1 year ago|reply