Ugh. A few years ago I built a gaming rig with an i5-8400 and a GTX 1080 (both chosen for known workloads). Some games ran fine, but some were jerky af, and the frametime monitor was zigzagging all over the place. I thought that maybe the 8400 was not the best choice despite my research and bought an i7-8700, only to see the situation get much worse. After days of googling and discussions I found the issue: the mobo BIOS had the C1E state enabled. In short, it allows the CPU to drop its frequency and voltage significantly when idling, but this technique isn't ready to operate 100+ times per second. After drawing a frame, the CPU basically did nothing for a period of time (<10 ms), which was enough to drop into C1E, but it can't get out of it quickly for some reason. And of course the 8700 was much better at sucking at it, since it had more free time to fall asleep.
I understand that power saving is useful in general, but man, when Direct3D sees every other frame skipped, maybe it's time to turn the damn thing off for a while. Idk how a regular consumer could deal with it. You basically spend a small fortune on a rig, which then stutters worse than an average IGP because of some stupid misconfiguration.
> but it can't get out of it quickly for some reason
As overclockers are aware, to achieve higher frequencies while keeping the CPU stable, you're going to need a higher CPU voltage. It works the other way too: lowering the frequency allows lowering the voltage, and that's what mostly delivers the power saving from these low-power states (dynamic power scales roughly with voltage squared times frequency, so the voltage drop is what really matters).
These chips can't adjust voltage instantly, because the wires inside them are rather thin and there's non-trivial capacitance everywhere. This means CPUs can drop frequency instantly, then decrease the voltage over time. However, if they raised the frequency instantly without first raising the voltage, the chip would glitch.
That's AFAIK the main reason why increasing the clock frequency takes time. The chip first raises the voltage, which takes time because of that capacitance, and only then can it raise the frequency, which is instant.
I use QuickCPU and max everything out. Yes, it sounds like a sham, but it works wonders.
https://coderbag.com/product/quickcpu
As someone who contributed to a (formerly) OpenGL-based video player[1], these issues with waiting for vblank and frame time variability on Windows are depressingly familiar. Dropping even one frame is unacceptable in a video player, but we seemed to drop them unavoidably. We fought a losing battle with frame timings in OpenGL for years, and eventually ended it by just porting the renderer to Vulkan and Direct3D 11.
One thing that we noticed was that wakeups after wglSwapBuffers were just more jittery than wakeups after D3D9/D3D11 Present() with the same software on the same system. In windowed mode, this could be mitigated by blocking on DwmFlush() instead of wglSwapBuffers (it seems like GLFW does this too, but only on Vista and 7).
The developer might also get some mileage from using ANGLE (a GLES 3.1 implementation on top of D3D11) or Microsoft's new GLon12.
[1]: https://mpv.io/
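For anyone wanting to try the DwmFlush() trick, a minimal sketch of the windowed-mode approach described above -- render with OpenGL, but block on DWM's composition clock rather than the buffer swap. It assumes swap interval 0 was already set on the GL side (e.g. via wglSwapIntervalEXT) and that the window is composited; error handling elided.

```c
#include <windows.h>
#include <dwmapi.h>
#pragma comment(lib, "dwmapi.lib")
#pragma comment(lib, "gdi32.lib")

void present_frame(HDC hdc)
{
    SwapBuffers(hdc);  // returns immediately because vsync is off
    DwmFlush();        // block until the next DWM composition pass instead
}
```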
I used to work at a high-performance scientific computing company. In the mid-2000s they ran into a weird issue where performance would crater on customer PCs running Windows, unless that PC was currently running Windows Media Player. Something to do with process scheduling priority. I don't know whether this was a widely-disseminated old-hand trick of the era or anything.
It is astonishing to me that someone would want to use Windows for something HPC related. I'm not generally a Windows hater (actually I am, but I see that there are legitimate business reasons to use it), but the HPC ecosystem seems much more Linux-friendly.
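Regarding the WMP mystery upthread: one plausible mechanism (my assumption, not something the parent confirmed) is timer resolution rather than scheduling priority. WMP requests a finer global timer via timeBeginPeriod, which makes Sleep() and timed waits precise for every process on the system. A sketch of requesting it yourself:

```c
// timeBeginPeriod/timeEndPeriod live in winmm; windows.h declares them.
#include <windows.h>
#pragma comment(lib, "winmm.lib")

int main(void)
{
    timeBeginPeriod(1); // request 1 ms timer resolution (system-wide effect)
    Sleep(2);           // now actually sleeps ~2 ms instead of ~15.6 ms
    timeEndPeriod(1);   // must be paired with the matching timeBeginPeriod
    return 0;
}
```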
> by linking with PowrProf.dll, and then calling this function from powersetting.h as follows
> This function is part of User32.lib, and is defined in winuser.h which is included in Windows.h.
This is one reason I think Windows is such a mess of an OS. (Look at the contents of C:\Windows and tell me it's not, if you can do so with a straight face!)
To make what ought to be a system call you have to load some DLL, sys, or lib file at a random (but fixed) path and call a function on it.
That combined with COM, and the registry, and I don't want to touch it with a ten-foot pole.
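For context, the pattern the quoted article is describing looks roughly like this. A sketch: PowerGetActiveScheme/PowerSetActiveScheme are the real powersetting.h functions, the hard-coded GUID is the stock High Performance plan (8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c), and error handling is elided.

```c
#include <windows.h>
#include <powersetting.h>
#pragma comment(lib, "PowrProf.lib")

static const GUID kHighPerformance =
    { 0x8c5e7fda, 0xe8bf, 0x4a96,
      { 0x9a, 0x85, 0xa6, 0xe2, 0x3a, 0x8c, 0x63, 0x5c } };

void switch_to_high_performance(void)
{
    GUID *previous = NULL;
    PowerGetActiveScheme(NULL, &previous);          // remember the user's plan
    PowerSetActiveScheme(NULL, &kHighPerformance);  // force High Performance
    // ... on shutdown, restore with PowerSetActiveScheme(NULL, previous)
    // and free the buffer with LocalFree(previous).
}
```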
This isn't especially different from Linux's dependency on /lib/ld.so. There's a design choice to not have a stable syscall interface and instead make you go through the libraries, to discourage people from making themselves dependent on undocumented syscalls. Of course, there probably shouldn't be undocumented syscalls in the first place, since that's a bit suspicious.
> combined with COM, and the registry
And yet GNOME has dconf and CORBA, because in order to do certain things you converge on the same solutions.
(Now, if you want a mess, the attempts to retrofit secure containers onto this with UWP definitely count!)
> To make what ought to be a system call you have to load some DLL, sys, or lib file at a random (but fixed) path and call a function on it.
“Ought to be a system call” is a matter of opinion. Among OSes, Linux is an outlier in that it keeps its system call interface stable.
Many other OSes choose to provide a library with a stable interface through which system calls can (and, in some cases, must; see https://lwn.net/Articles/806776/, discussed in https://news.ycombinator.com/item?id=21859612) be called. That allows them to change the system call ABI, for example to retire calls that have been superseded by other ones.
(Ideally, IMO, that library should not be the C library. There should be two libraries, a “kernel interface library” and a “C library”. That's a different subject, though.)
You can also see performance improvements in processes that do I/O by having a low-priority process running that does nothing but spin in an infinite loop. This keeps the CPU from switching to idle states during the I/O. This was on Linux; there's probably an OS setting to accomplish the same thing, but it was pretty counter-intuitive.
On x86 processors, you can achieve this at the kernel level by adding `idle=poll` to the kernel command line.
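A sketch of the userspace version of this trick on Linux, assuming pthreads. SCHED_IDLE keeps the spinner from starving real work while still pinning one core out of deep idle states.

```c
// build: cc -O2 -pthread spinner.c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static void *spin(void *arg)
{
    (void)arg;
    struct sched_param p = { .sched_priority = 0 };
    // SCHED_IDLE: only runs when nothing else wants the CPU.
    pthread_setschedparam(pthread_self(), SCHED_IDLE, &p);
    for (;;) { /* keep the core awake */ }
    return NULL;
}

void start_idle_spinner(void)
{
    pthread_t t;
    pthread_create(&t, NULL, spin, NULL);
}
```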
I've had games do this and found it annoying since I like my PC to run in balanced mode. Not so much to save power but to let the machine idle when I'm not using it. Found I could work around it by deleting the power plans other than balanced.
I've never played OP's game, so evidently a few games are out there doing this.
https://devblogs.microsoft.com/oldnewthing/20081211-00/?p=19...
Interestingly, GarageBand on my G5 kicks power management to highest performance without asking, though it turns it back down when it quits. Guess Apple didn't have a problem with it.
> As of April 5th 2017 with the release of Windows 10 Version 1703, SetProcessDpiAwarenessContext used above is the replacement for SetProcessDpiAwareness, which in turn was a replacement for SetProcessDPIAware. Love the clear naming scheme.
This is the kind of thing I hate about "New Windows". Once upon a time MS used to strive for backward compatibility. These days every few years there's a new function you need to call. You can't get optimal behavior just by writing good code from the start. You need to do that, and also call the YesIKnowHowPixelsWork API call, and set <yesIAmCompetent>true</yesIAmCompetent> in your manifest to get what should be the default behavior. It's a mess.
This is precisely the "Old Windows" way of doing things: legacy APIs are still supported for that forever backwards compatibility, and current APIs exist for the way you probably want to do things in a new app.
For reference, SetProcessDPIAware solidified over 15 years ago, whereas 15 years prior to that there wasn't even a taskbar. Of course it's going to be out of date from a UI API perspective, but that's what's needed if you also want to support apps from 15 years ago well.
The reason it's so complex is backwards compatibility. Non-DPI-aware applications from before DPI settings were a thing can't advertise that they're not DPI aware, so if an application doesn't announce which it is, Windows has to assume that it's not aware. A couple of years ago, Microsoft was able to change the GDI libraries to automatically adjust the size of the elements it's rendering, which makes a lot of things sharper. But things like images, or anything on screen not rendered by GDI, won't magically become sharp.
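The resulting version dance, for anyone who hasn't seen it, looks roughly like this. A sketch assuming a recent SDK for the DPI_AWARENESS_CONTEXT type; the 8.1-era middle step (SetProcessDpiAwareness from shcore.dll) is omitted for brevity.

```c
#include <windows.h>

typedef BOOL (WINAPI *SetCtxFn)(DPI_AWARENESS_CONTEXT);

void enable_dpi_awareness(void)
{
    HMODULE user32 = GetModuleHandleW(L"user32.dll");
    SetCtxFn set_ctx =
        (SetCtxFn)GetProcAddress(user32, "SetProcessDpiAwarenessContext");

    if (set_ctx) {
        // Windows 10 1703+: the current recommended mode.
        set_ctx(DPI_AWARENESS_CONTEXT_PER_MONITOR_AWARE_V2);
    } else {
        // Oldest form, Vista+.
        SetProcessDPIAware();
    }
}
```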
About switchable graphics: NVIDIA's APIs do work. The problem with them is that there's no API to switch to the faster GPU directly; they only have APIs to set up a profile for an application and request the faster GPU in that profile, and the changes are applied the next time the app launches.
I had to do that a couple of times for Direct3D 11 or 12 apps with a frontend written in WPF. Microsoft doesn't support exporting DWORD variables from .NET executables.
Technical info there: https://stackoverflow.com/a/40915100
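For reference, the native version of the trick is exporting two magic globals from the .exe; these names and values are the ones documented by NVIDIA and AMD, and it's exactly this kind of data export that .NET executables can't express (per the Stack Overflow link above).

```c
#include <windows.h>

// Ask the NVIDIA Optimus driver to prefer the discrete GPU.
// (In C++, wrap these in extern "C" to prevent name mangling.)
__declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;

// Ask the AMD PowerXpress driver for the same.
__declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 1;
```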
It's possible I'm misunderstanding the docs, but here's the line that led me to believe linking to one of their libraries alone would be enough (and led to my surprise when it didn't work):
(https://docs.nvidia.com/gameworks/content/technologies/deskt...)
> For any application without an existing application profile, there is a set of libraries which, when statically linked to a given application executable, will direct the Optimus driver to render the application using High Performance Graphics. As of Release 302, the current list of libraries are vcamp110.dll, vcamp110d.dll, nvapi.dll, nvapi64.dll, opencl.dll, nvcuda.dll, and cudart..
> This isn’t often relevant for games, but, if you need to check how much things would have been scaled if you weren’t DPI aware, you can call GetDpiForWindow and divide the result by 96.
If you aren't scaling up text and UI elements based on the DPI then it doesn't really sound like your application is truly DPI aware to me. I don't see why that applies any differently to games versus any other kind of application.
I think it's reasonably common for games to scale their text and UI elements by the overall screen or window size, in which case opting out of clever OS DPI tricks is the right choice. Using actual DPI doesn't make much sense in general: the player could be sitting right in front of their laptop screen or feet away from a big TV, which obviously require very different font sizes in real-world units.
Unless the game engine is doing its own scaling, this does sound like lying to the operating system to dodge those pesky user-friendly features and get more frames.
I think Microsoft made it this hard to enable the DPI-aware setting exactly because it forces developers to think about things like DPI. If everyone follows this guide and ignores it, then I predict that in a few years this setting will be ignored as well and a new DPI-awareness API will be released.
If you're DPI aware you should never size your elements in physical pixels, but many game UI elements like a HUD or very simple menus scale better using “percentage of screen dimension” or similar heuristics. Like using fixed device-independent pixel sizes, this approach won't cause the elements to look tiny on a HiDPI display, and it will generally do a better job on large screens that are usually far away from the user (e.g. TVs).
Games should either be aware of the user's preferred scaling or at least offer their own UI scaling option. But they should always register as DPI aware so they don't render the 3D scene at a lower resolution than what's selected.
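A sketch combining the two ideas: stay DPI aware (so the 3D scene renders at native resolution), query the DPI only as a hint, and size HUD elements from the window instead. GetDpiForWindow is the real Win10 1607+ API; the 2%-of-window-height figure is an arbitrary illustration.

```c
#include <windows.h>

// Returns a HUD text height in pixels for this window.
float hud_text_height_px(HWND hwnd)
{
    // What the OS would have scaled a non-aware app by; a sensible
    // default value for a user-facing UI-scale slider.
    float dpi_scale = GetDpiForWindow(hwnd) / 96.0f;
    (void)dpi_scale;

    // "Percentage of screen dimension" heuristic: 2% of the client height.
    RECT r;
    GetClientRect(hwnd, &r);
    return (r.bottom - r.top) * 0.02f;
}
```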
1. Please add options to conserve battery too. An FPS limiter would be good. Messing with system power management when the user doesn't want to be tethered to a wall plug is Not Nice(tm).
2. When you do UI scaling, especially if you're young with 20/20 eyesight, please allow scaling beyond what you think is big enough.
The other reply mentioned that vsync can save battery--on top of vsync, Way of Rhea supports syncing to every other vblank, halving the FPS. This should presumably save even more battery (though the intended use case is to prevent stuttering on computers that can't consistently hit the monitor's refresh rate). Ultimately, though, no matter what I do, I don't think you're gonna be able to play very long on battery power.
For #2--my glasses prescription is so strong that I spent $330 on fancy lenses today in the hopes that they'll distort less around the edges. Wish me luck. (:
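Syncing to every other vblank in a WGL/OpenGL game comes down to a swap interval of 2 (how Way of Rhea implements it internally is my assumption; the extension itself is the standard WGL_EXT_swap_control):

```c
#include <windows.h>
#pragma comment(lib, "opengl32.lib")

typedef BOOL (WINAPI *PFNWGLSWAPINTERVALEXTPROC)(int interval);

void set_half_rate_vsync(void)
{
    // Requires a current GL context on the calling thread.
    PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
        (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT");
    if (wglSwapIntervalEXT)
        wglSwapIntervalEXT(2); // 1 = every vblank, 2 = every other vblank
}
```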
1. An FPS limit needs to be an integer divisor of the display refresh rate (equivalently, the refresh rate must be an integer multiple of the limit), or you introduce stuttering. That's why missing a vsync results in exactly 30 fps on a 60 Hz monitor, and that isn't a good user experience.
Using vsync is actually conservative power-wise. You can always hammer the GPU by rendering frames as fast as possible; that's one way of reducing input latency in FPS shooters, whereas using vsync guarantees at least one frame of latency even on a fast GPU.
2. Agreed 100%. All too frequently the UI is too small on high-DPI monitors (see Civ 6), or unreadable when viewed from a distance when playing on a TV (see The Witcher 3).
I was surprised to find vertical blanking interval mentioned in the article as CRTs haven't been a common sight for years. Is it still a relevant concept when writing code for modern GPUs?
To the video hardware, LCDs are raster displays, and receive information in line scans and frames over time just like CRTs, even if it's a digital signal. You can have V-sync on or off (complete with screen tearing) just the same.
In any case, you want to know when the frame has been scanned out so you can swap the buffer to the next frame.
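Right--vblank is still a first-class concept in modern graphics APIs. DXGI even exposes a direct wait on it (a sketch in C-style COM; error handling elided):

```c
#define COBJMACROS
#include <dxgi.h>
#pragma comment(lib, "dxgi.lib")
#pragma comment(lib, "dxguid.lib")

void wait_for_vblank(void)
{
    IDXGIFactory *factory = NULL;
    IDXGIAdapter *adapter = NULL;
    IDXGIOutput  *output  = NULL;

    CreateDXGIFactory(&IID_IDXGIFactory, (void **)&factory);
    IDXGIFactory_EnumAdapters(factory, 0, &adapter); // primary adapter
    IDXGIAdapter_EnumOutputs(adapter, 0, &output);   // primary monitor

    IDXGIOutput_WaitForVBlank(output); // blocks until the next vertical blank

    IDXGIOutput_Release(output);
    IDXGIAdapter_Release(adapter);
    IDXGIFactory_Release(factory);
}
```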
I don't have a full answer, but if it helps at all, these are all C APIs--so if you can find a way to call C code from your Java program you should be set.