I don't understand why the programs are the same. The partial render store program has to write out both the color and the depth buffer, while the final render store should only write out color and throw away depth.
E.g in their example in the link above for deferred rendering (figure 4) the multiple G buffers won't actually need to leave the on-chip tile buffer - unless there's a partial render before the final shading shader is run.
Right, I had the article's bunny test program on my mind, which looks like it has only one pass.
In OpenGL, the driver would have to scan the following commands to see if it can discard the depth data. If it doesn't see the depth buffer get cleared, it has to be conservative and save the data. I assume mobile GPU drivers in general do make the effort to do this optimization, as the bandwidth savings are significant.
In Vulkan, the application explicitly specifies which attachment (i.e. stencil, depth, color buffer) must be persisted at the end of a render pass, and which need not. So that maps nicely to the "final render flush program".
The quote is about Metal, though, which I'm not familiar with, but a sibling comment points out it's similar to Vulkan in this aspect.
So that leaves me wondering: did Rosenzweig happen to only try Metal apps that always use MTLStoreAction.store in passes that overflow the TVB, or is the Metal driver skipping a useful optimization, or neither? E.g. because the hardware has another control for this?
kimixa|3 years ago
https://developer.apple.com/documentation/metal/resource_fun...
E.g in their example in the link above for deferred rendering (figure 4) the multiple G buffers won't actually need to leave the on-chip tile buffer - unless there's a partial render before the final shading shader is run.
plekter|3 years ago
For partial rendering all samples must be written out, but for the final one you can resolve(average) them before writeout.
hansihe|3 years ago
pocak|3 years ago
In OpenGL, the driver would have to scan the following commands to see if it can discard the depth data. If it doesn't see the depth buffer get cleared, it has to be conservative and save the data. I assume mobile GPU drivers in general do make the effort to do this optimization, as the bandwidth savings are significant.
In Vulkan, the application explicitly specifies which attachment (i.e. stencil, depth, color buffer) must be persisted at the end of a render pass, and which need not. So that maps nicely to the "final render flush program".
The quote is about Metal, though, which I'm not familiar with, but a sibling comment points out it's similar to Vulkan in this aspect.
So that leaves me wondering: did Rosenzweig happen to only try Metal apps that always use MTLStoreAction.store in passes that overflow the TVB, or is the Metal driver skipping a useful optimization, or neither? E.g. because the hardware has another control for this?
johntb86|3 years ago
Someone|3 years ago