pocak | 3 years ago

I don't understand why the programs are the same. The partial render store program has to write out both the color and the depth buffer, while the final render store should only write out color and throw away depth.

kimixa|3 years ago

Possibly pixel local storage - I think this can be accessed with extended raster order groups and imageblocks in Metal.

https://developer.apple.com/documentation/metal/resource_fun...

E.g., in the deferred rendering example in the link above (figure 4), the multiple G-buffers never need to leave the on-chip tile buffer, unless there's a partial render before the final shading shader runs.

plekter|3 years ago

I think multisampling may be the answer.

For a partial render, all samples must be written out, but for the final one you can resolve (average) them before writing out.
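To put rough numbers on that difference, here is a back-of-the-envelope sketch. The 1920x1080 target, 4 bytes per pixel, 4x MSAA, and the plain box-filter average are all assumptions picked for illustration, not anything taken from the article or from the hardware.

```python
def store_bytes(width, height, bytes_per_pixel, samples, resolve):
    """Bytes written to memory when one color attachment is flushed.

    Hypothetical model: a resolved store writes one value per pixel,
    an unresolved store writes every sample.
    """
    per_pixel = bytes_per_pixel if resolve else bytes_per_pixel * samples
    return width * height * per_pixel


def resolve_pixel(sample_values):
    """Box-filter resolve: average the per-sample values of one pixel."""
    return sum(sample_values) / len(sample_values)


# Assumed configuration: 1080p, RGBA8 (4 bytes), 4x MSAA.
partial = store_bytes(1920, 1080, 4, samples=4, resolve=False)
final = store_bytes(1920, 1080, 4, samples=4, resolve=True)

print(partial, final)  # partial is 4x the size of final
```

Under these assumptions the partial store moves four times as much color data as the resolved final store, which would be one reason for the two programs to differ.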

hansihe|3 years ago

Not necessarily: other render passes could need the depth data later.

pocak|3 years ago

Right, I had the article's bunny test program on my mind, which looks like it has only one pass.

In OpenGL, the driver would have to scan the following commands to see if it can discard the depth data. If it doesn't see the depth buffer get cleared, it has to be conservative and save the data. I assume mobile GPU drivers in general do make the effort to do this optimization, as the bandwidth savings are significant.
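The lookahead I have in mind could look roughly like this. It is a toy model, not how any real driver is written: the `(op, attachment)` command encoding and the `can_discard` helper are made up for illustration.

```python
def can_discard(attachment, following_commands):
    """Decide whether an attachment's contents can be thrown away.

    following_commands is a hypothetical list of (op, attachment) pairs
    describing what the application does after the current pass.
    """
    for op, target in following_commands:
        if target != attachment:
            continue  # command does not touch this attachment
        if op == "clear":
            return True  # old contents are overwritten before any use
        return False  # any other use may read the old contents
    # Nothing conclusive in the commands seen so far: be conservative.
    return False


print(can_discard("depth", [("draw", "color"), ("clear", "depth")]))
print(can_discard("depth", [("draw", "depth"), ("clear", "depth")]))
print(can_discard("depth", []))
```

Only the first call returns `True`. The second sees a use before the clear, and the third has no information at all, so both fall back to saving the data.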

In Vulkan, the application explicitly specifies which attachments (i.e. stencil, depth, color buffer) must be persisted at the end of a render pass, and which need not. So that maps nicely to the "final render flush program".
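As a sketch of that mapping: the `StoreOp` names below mirror Vulkan's `VK_ATTACHMENT_STORE_OP_STORE` and `VK_ATTACHMENT_STORE_OP_DONT_CARE`, but the `attachments_to_write` function is a hypothetical model of what a driver might do, not real driver or API code.

```python
from enum import Enum


class StoreOp(Enum):
    STORE = "store"          # contents must survive the render pass
    DONT_CARE = "dont_care"  # contents may be thrown away


def attachments_to_write(store_ops, partial):
    """Names of the attachments a flush has to write to memory.

    A partial render must save everything, because rendering resumes
    later and needs the intermediate state. The final flush only has
    to honor what the application asked to keep.
    """
    if partial:
        return sorted(store_ops)
    return sorted(name for name, op in store_ops.items()
                  if op is StoreOp.STORE)


ops = {"color": StoreOp.STORE, "depth": StoreOp.DONT_CARE}
print(attachments_to_write(ops, partial=True))
print(attachments_to_write(ops, partial=False))
```

With depth marked as don't-care, the partial flush in this model still writes both attachments while the final flush writes only color, which is the difference I expected between the two programs.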

The quote is about Metal, though, which I'm not familiar with, but a sibling comment points out it's similar to Vulkan in this aspect.

So that leaves me wondering: did Rosenzweig happen to only try Metal apps that always use MTLStoreAction.store in passes that overflow the TVB, or is the Metal driver skipping a useful optimization, or neither, e.g. because the hardware has another control for this?

Someone|3 years ago

So it seems it allows for optimization. If you know you don’t need everything, one of the steps can do less than the other.