
Making Your Game Go Fast by Asking Windows Nicely

154 points | zdw | 4 years ago | anthropicstudios.com

71 comments

[+] wruza|4 years ago|reply
missing vblank due to power management

Ugh. A few years ago I built a gaming rig with an i5-8400 and a GTX 1080 (both chosen for known workloads). Some games ran fine, but some were jerky af, and the frametime monitor was zigzag-y all over the place. I thought that maybe the 8400 wasn't the best choice despite my research and bought an i7-8700, only to see the situation get much worse. After days of googling and discussions I found the issue: the mobo BIOS had the C1E state enabled. In short, it lets the CPU drop its frequency and voltage significantly when idling, but this technique isn't ready to operate 100+ times per second. After drawing a frame, the CPU basically did nothing for a period of time (<10ms), which was enough to drop into C1E, but it can't get out of it quickly for some reason. And of course the 8700 was even better at sucking at it, since it had more free time to fall asleep.

I understand that power saving is useful in general, but man, when Direct3D sees every other frame skipped, maybe it's time to turn the damn thing off for a while. Idk how a regular consumer could deal with it. You basically spend a small fortune on a rig, which then stutters worse than an average IGP because of some stupid misconfiguration.

[+] Const-me|4 years ago|reply
> but it can't get out of it quickly for some reason

As overclockers are aware, to achieve higher frequencies while keeping the CPU stable, you need a higher CPU voltage. It works the other way too: lowering the frequency allows lowering the voltage, and that's mostly what delivers the power savings from these low-power states.

These chips can’t adjust voltage instantly because the wires inside them are rather thin, and there’s non-trivial capacitance everywhere. This means CPUs can drop frequency instantly, then decrease the voltage over time. However, if they raise frequency instantly without first raising the voltage, the chip will glitch.

That’s AFAIK the main reason why increasing the clock frequency takes time. The chip first raises the voltage, which takes time because of that capacitance, and only then raises the frequency, which is instant.

[+] stedolph|4 years ago|reply
I strongly agree with you; my experience is quite similar, just with an i7 instead.
[+] rossy|4 years ago|reply
As someone who contributed to a (formerly) OpenGL-based video player[1], these issues with waiting for vblank and frame time variability on Windows are depressingly familiar. Dropping even one frame is unacceptable in a video player, but we seemed to drop them unavoidably. We fought a losing battle with frame timings in OpenGL for years, which eventually ended by just porting the renderer to Vulkan and Direct3D 11.

One thing that we noticed was that wakeups after wglSwapBuffers were just more jittery than wakeups after D3D9/D3D11 Present() with the same software on the same system. In windowed mode, this could be mitigated by blocking on DwmFlush() instead of wglSwapBuffers (it seems like GLFW does this too, but only in Vista and 7.)
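For reference, the windowed-mode workaround boils down to something like this (a rough sketch; it assumes the GL swap interval has been set to 0 so that DwmFlush() is what paces the loop):

    #include <windows.h>
    #include <dwmapi.h>   // DwmFlush(); link with dwmapi.lib

    // Present a frame, but block on the compositor instead of the GL swap.
    void present_frame(HDC hdc)
    {
        SwapBuffers(hdc);   // returns immediately when the swap interval is 0
        DwmFlush();         // waits for the next DWM composition, which in our
                            // experience wakes up with less jitter than
                            // blocking inside wglSwapBuffers
    }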

The developer might also get some mileage from using ANGLE (a GLES 3.1 implementation on top of D3D11) or Microsoft's new GLon12.

[1]: https://mpv.io/

[+] ahelwer|4 years ago|reply
I used to work at a high-performance scientific computing company. In the mid-2000s they ran into a weird issue where performance would crater on customer PCs running Windows, unless that PC was currently running Windows Media Player. Something to do with process scheduling priority. I don’t know whether this was a widely-disseminated old-hand trick of the era or anything.
[+] bee_rider|4 years ago|reply
It is astonishing to me that someone would want to use Windows for something HPC related. I'm not generally a Windows hater (actually I am, but I see that there are legitimate business reasons to use it), but the HPC ecosystem seems much more Linux-friendly.
[+] TimTheTinker|4 years ago|reply
> by linking with PowrProf.dll, and then calling this function from powersetting.h as follows

> This function is part of User32.lib, and is defined in winuser.h which is included in Windows.h.

This is one reason I think Windows is such a mess of an OS. (Look at the contents of C:\Windows and tell me it's not, if you can do so with a straight face!)

To make what ought to be a system call you have to load some DLL, sys, or lib file at a random (but fixed) path and call a function on it.
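Concretely, the call the article describes boils down to something like this (a rough sketch; the GUID here is the documented "High performance" plan, written out rather than pulled from the GUID_MIN_POWER_SAVINGS alias in winnt.h):

    #include <windows.h>
    #include <powersetting.h>   // PowerSetActiveScheme; link with PowrProf.lib

    // Documented GUID of the built-in "High performance" plan
    // (8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c, same value powercfg /list shows).
    static const GUID kHighPerformancePlan =
        { 0x8c5e7fda, 0xe8bf, 0x4a96,
          { 0x9a, 0x85, 0xa6, 0xe2, 0x3a, 0x8c, 0x63, 0x5c } };

    void request_high_performance_plan(void)
    {
        // NULL means "change the active scheme for the current user".
        DWORD err = PowerSetActiveScheme(NULL, &kHighPerformancePlan);
        if (err != ERROR_SUCCESS) {
            // Not fatal; the game just keeps running on the current plan.
        }
    }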

That combined with COM, and the registry, and I don't want to touch it with a ten-foot pole.

[+] pjc50|4 years ago|reply
This isn't especially different from Linux's dependency on /lib/ld.so. There's a design choice to not have syscalls and instead make you go through the libraries, to discourage people making themselves dependent on undocumented syscalls. Of course, there probably shouldn't be undocumented syscalls in the first place, since that's a bit suspicious.

> combined with COM, and the registry

And yet GNOME has dconf and CORBA, because in order to do certain things you converge on the same solutions.

(Now, if you want a mess, the attempts to retrofit secure containers onto this with UWP definitely count!)

[+] Someone|4 years ago|reply
> To make what ought to be a system call you have to load some DLL, sys, or lib file at a random (but fixed) path and call a function on it.

“Ought to be a system call” is a matter of opinion. Among OSes, Linux is an outlier in that it keeps its system call interface stable.

Many other OSes choose to provide a library with a stable interface through which system calls can (and, in some cases must. See https://lwn.net/Articles/806776/; discussed in https://news.ycombinator.com/item?id=21859612) be called. That allows them to change the system call ABI, for example to retire calls that have been superseded by other ones.

(ideally, IMO, that library should not be the C library. There should be two libraries, a “Kernel interface library” and a “C library”. That’s a different subject, though)

[+] SamReidHughes|4 years ago|reply
You can also see performance improvements in processes that do I/O by having a low-priority process running that does nothing but run an infinite loop. This keeps the computer from switching to idle CPU states during the I/O. This was on Linux; there is probably an OS setting to accomplish the same thing, but it was pretty counter-intuitive. A rough sketch of the trick follows (the helper name and nice value are made up; the spin loop is the part that keeps the core out of deep idle states):
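    // anti_idle.c -- run this alongside the I/O-heavy workload.
    #include <sys/resource.h>

    int main(void)
    {
        // Lowest scheduling priority, so it never steals time from real work.
        setpriority(PRIO_PROCESS, 0, 19);

        // Spin forever; the busy core keeps the CPU out of deep idle states.
        volatile unsigned long counter = 0;
        for (;;)
            counter++;
    }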
[+] theevilsharpie|4 years ago|reply
> This was on Linux, there is probably an OS setting to accomplish the same thing, but it was pretty counter-intuitive.

On x86 processors, you can achieve this at the kernel level by adding `idle=poll` to the kernel command line.

[+] sdflhasjd|4 years ago|reply
PowerSetActiveScheme sets the system power plan, it's not something a game should be doing without telling the user first.
[+] discreditable|4 years ago|reply
I've had games do this and found it annoying since I like my PC to run in balanced mode. Not so much to save power but to let the machine idle when I'm not using it. Found I could work around it by deleting the power plans other than balanced.

I've never played OP's game, so evidently a few games are out there doing this.

[+] masonremaley|4 years ago|reply
That's a good point--I'll look into whether Microsoft has any guidelines on this, and add a disclaimer to the article when I get a chance.
[+] classichasclass|4 years ago|reply
Interestingly, Garage Band on my G5 kicks power management to highest performance without asking, though it turns it back down when it quits. Guess Apple didn't have a problem with it.
[+] ygra|4 years ago|reply
It probably also won't reset the setting if the game crashes.
[+] howdydoo|4 years ago|reply
> As of April 5th 2017 with the release of Windows 10 Version 1703, SetProcessDpiAwarenessContext used above is the replacement for SetProcessDpiAwareness, which in turn was a replacement for SetProcessDPIAware. Love the clear naming scheme.

This is the kind of thing I hate about "New Windows". Once upon a time MS used to strive for backward compatibility. These days every few years there's a new function you need to call. You can't get optimal behavior just by writing good code from the start. You need to do that, and also call the YesIKnowHowPixelsWork api call, and set <yesIAmCompetent>true</yesIAmCompetent> in your manifest to get what should be the default behavior. It's a mess.
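For what it's worth, the usual way apps cope with that lineage is to probe for the newest call at runtime and fall back; a rough sketch (the function pointer types are written out by hand so it builds against older SDKs):

    #include <windows.h>

    // Win10 1703+: user32!SetProcessDpiAwarenessContext.
    // DPI_AWARENESS_CONTEXT is an opaque handle; -4 is PER_MONITOR_AWARE_V2.
    typedef BOOL (WINAPI *SetDpiCtxFn)(HANDLE);
    // Win 8.1+: shcore!SetProcessDpiAwareness; 2 is PROCESS_PER_MONITOR_DPI_AWARE.
    typedef HRESULT (WINAPI *SetDpiAwarenessFn)(int);

    void enable_dpi_awareness(void)
    {
        SetDpiCtxFn set_ctx = (SetDpiCtxFn)GetProcAddress(
            GetModuleHandleW(L"user32.dll"), "SetProcessDpiAwarenessContext");
        if (set_ctx && set_ctx((HANDLE)-4))
            return;

        HMODULE shcore = LoadLibraryW(L"shcore.dll");
        if (shcore) {
            SetDpiAwarenessFn set_aw = (SetDpiAwarenessFn)GetProcAddress(
                shcore, "SetProcessDpiAwareness");
            if (set_aw && SUCCEEDED(set_aw(2)))
                return;
        }

        SetProcessDPIAware();   // Vista-era fallback: system-DPI aware only
    }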

[+] zamadatix|4 years ago|reply
This is precisely the "Old Windows" way of doing things: legacy APIs are supported forever for backwards compatibility, while current APIs exist for the way you probably want to do things in a new app.

For reference, SetProcessDPIAware solidified over 15 years ago, and 15 years prior to that there wasn't even a taskbar. Of course it's going to be out of date from a UI API perspective, but that's what's needed if you also want to keep supporting apps from 15 years ago.

[+] ziml77|4 years ago|reply
The reason it's so complex is backwards compatibility. Non-DPI-aware applications from before DPI settings were a thing can't advertise that they're not DPI aware, so if an application doesn't announce which it is, Windows has to assume that it's not aware. A couple of years ago, Microsoft was able to change the GDI libraries to automatically adjust the size of the elements they render, which makes a lot of things sharper. But images, or anything on screen not rendered by GDI, won't magically become sharp.
[+] bobbyi|4 years ago|reply

     ASSERT(SetProcessDpiAwarenessContext(DPI_AWARENESS_CONTEXT_PER_MONITOR_AWARE_V2));
If ASSERT is a no-op in release mode, then you're only getting your setting set here in debug builds. A safer pattern keeps the side effect outside the macro:
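     BOOL ok = SetProcessDpiAwarenessContext(DPI_AWARENESS_CONTEXT_PER_MONITOR_AWARE_V2);
     ASSERT(ok);   // the call itself still happens in release builds
     (void)ok;     // avoids an unused-variable warning when ASSERT compiles away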
[+] masonremaley|4 years ago|reply
It's not, in my codebase, but I'll edit that when I have the chance so nobody blindly copy-pastes it and ends up with something super broken.
[+] Const-me|4 years ago|reply
About switchable graphics: the nVidia APIs do work. The problem with them is that there's no API to switch to the faster GPU at runtime; they only have APIs to set up a profile for an application and ask for the faster GPU in that profile, and the changes are applied the next time the app launches.

I had to do that a couple of times for Direct3D 11 and 12 apps with a frontend written in WPF. Microsoft doesn't support exporting DWORD variables from .NET executables.

Technical info there: https://stackoverflow.com/a/40915100
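For native executables, the commonly documented workaround (the thing the linked answer has to emulate for .NET) is to export a couple of well-known globals from the .exe so the drivers pick the discrete GPU; a rough sketch:

    // In any one translation unit of the .exe itself (not a DLL):
    extern "C" {
        // NVIDIA Optimus: ask for the high-performance GPU.
        __declspec(dllexport) unsigned long NvOptimusEnablement = 0x00000001;
        // AMD switchable-graphics equivalent.
        __declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 1;
    }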

[+] masonremaley|4 years ago|reply
It's possible I'm misunderstanding the docs, but here's the line that led me to believe that linking to one of their libraries alone would be enough (and led to my surprise when it didn't work):

(https://docs.nvidia.com/gameworks/content/technologies/deskt...)

> For any application without an existing application profile, there is a set of libraries which, when statically linked to a given application executable, will direct the Optimus driver to render the application using High Performance Graphics. As of Release 302, the current list of libraries are vcamp110.dll, vcamp110d.dll, nvapi.dll, nvapi64.dll, opencl.dll, nvcuda.dll, and cudart..

[+] shawnz|4 years ago|reply
> This isn’t often relevant for games, but, if you need to check how much things would have been scaled if you weren’t DPI aware, you can call GetDpiForWindow and divide the result by 96.

If you aren't scaling up text and UI elements based on the DPI, then it doesn't really sound like your application is truly DPI aware to me. I don't see why that would apply any differently to games than to any other kind of application.
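To be fair, the quoted division is just recovering the user's chosen scale factor, which a game could still apply to its own UI; a rough sketch:

    #include <windows.h>

    // Scale a size designed at 100% (96 DPI) to the window's actual DPI.
    int scale_for_dpi(HWND hwnd, int size_at_96dpi)
    {
        UINT dpi = GetDpiForWindow(hwnd);        // e.g. 144 on a 150% display
        return MulDiv(size_at_96dpi, dpi, 96);   // rounds to nearest
    }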

[+] makomk|4 years ago|reply
I think it's reasonably common for games to scale their text and UI elements by the overall screen or window size, in which case opting out of clever OS DPI tricks is the right choice. Using actual DPI doesn't make much sense in general - the player could be sitting right in front of their laptop screen or feet away from a big TV, which obviously require very different font sizes in real-world units.
[+] jeroenhd|4 years ago|reply
Unless the game engine is doing its own scaling, this does sound like lying to the operating system to get those pesky user-friendly features out of the way in exchange for more frames.

I think Microsoft made it this hard to enable the DPI-aware setting exactly because it forces developers to think about things like DPI. If everyone follows this guide and ignores it, then I predict that in a few years this setting will be ignored as well and a new DPI-awareness API will be released.

[+] johncolanduoni|4 years ago|reply
If you’re DPI aware you should never size your elements in physical pixels, but many game UI elements like a HUD or very simple menus scale better using “percentage of screen dimension” or similar heuristics. Like fixed device-independent pixel sizes, this approach won’t cause the elements to look tiny on a HiDPI display, and it will generally do a better job on large screens that are usually far away from the user (e.g. TVs). A rough sketch of the heuristic (names are made up; the point is that sizes come from the render target's dimensions, not from pixels or DPI):
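    // Size HUD elements as a fraction of the render-target height so they
    // occupy the same share of the screen on a 768p laptop and a 4K TV.
    struct HudLayout { float text_height; float minimap_size; };

    HudLayout layout_hud(float screen_height_px)
    {
        HudLayout hud;
        hud.text_height  = 0.04f * screen_height_px;   // 4% of the screen
        hud.minimap_size = 0.20f * screen_height_px;   // 20% of the screen
        return hud;
    }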
[+] ziml77|4 years ago|reply
Games should either be aware of the user’s preferred scaling or at least offer their own UI scaling option. But they should always register as DPI aware so they don’t render the 3D scene at a lower resolution than what’s selected.
[+] nottorp|4 years ago|reply
Two comments that kinda go against the flow:

1. Please add options to conserve battery too. An FPS limiter would be good. Messing with system power management when the user doesn't want to be tethered to a wall plug is Not Nice(tm).

2. When you do UI scaling, especially if you're young with 20/20 eyesight, please allow scaling beyond what you think is big enough.

[+] masonremaley|4 years ago|reply
The other reply mentioned that vsync can save battery--on top of vsync, Way of Rhea supports syncing to every other vblank, halving the FPS. This should presumably save even more battery (though the intended use case is to prevent stuttering on computers that can't consistently hit the monitor's refresh rate). Ultimately, though, no matter what I do I don't think you're gonna be able to play very long on battery power.
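("Every other vblank" is just a swap interval of 2; with a DXGI swap chain, for example, it's the first argument to Present--Way of Rhea's own renderer may express it differently:)

    #include <dxgi.h>   // link with dxgi.lib; the swap chain comes from device setup

    // SyncInterval = 2: present at most once every two vertical blanks,
    // i.e. 30 fps on a 60 Hz monitor. 1 = normal vsync, 0 = uncapped.
    void present_half_rate(IDXGISwapChain *swap_chain)
    {
        swap_chain->Present(2, 0);
    }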

For #2--my glasses prescription is so strong that I spent $330 on fancy lenses today in the hopes that they'll distort less around the edges. Wish me luck. (:

[+] fps-hero|4 years ago|reply
1. The FPS limit needs to be an integer divisor of the display refresh rate, or you introduce stuttering: a 40 fps cap on a 60 Hz panel has to alternate between one-vblank and two-vblank frames (16.7 ms and 33.3 ms), which reads as judder. That’s also why missing a vsync results in exactly 30fps on a 60hz monitor, and that isn’t a good user experience.

Using vsync is actually conservative power-wise; without it you can always hammer the GPU by rendering frames as fast as possible. That is one way of reducing input latency in FPS shooters, whereas using vsync guarantees at least a frame of latency even on a fast GPU.

2. Agreed 100%. All too frequently the UI is too small on high-DPI monitors (see Civ 6), or unreadable from a distance when playing on a TV (see The Witcher 3).

[+] h3mb3|4 years ago|reply
Pardon my ignorance (I'm not a game developer).

I was surprised to find the vertical blanking interval mentioned in the article, as CRTs haven't been a common sight for years. Is it still a relevant concept when writing code for modern GPUs?

[+] tuyiown|4 years ago|reply
I suspect it's just the name that stuck: double/triple-buffered swaps are still synchronized to the signal to avoid tearing.
[+] theandrewbailey|4 years ago|reply
To the video hardware, LCDs are raster displays, and receive information in line scans and frames over time just like CRTs, even if it's a digital signal. You can have V-sync on or off (complete with screen tearing) just the same.
[+] xmodem|4 years ago|reply
It's not just laptops that have switchable graphics - I have a desktop with a discrete GPU, but I use the graphics output on the motherboard.
[+] splittingTimes|4 years ago|reply
Is it possible to use any of these API calls from Java? What would the equivalents look like?
[+] masonremaley|4 years ago|reply
I don't have a full answer, but if it helps at all, these are all C APIs--so if you can find a way to call C code from your Java program you should be set.