trq01758 | 1 year ago

And if you're using an iGPU, you might expect great bandwidth, but in reality DDR memory for the CPU is optimized for low latency, not bandwidth. You'll probably have a 64-bit channel (or 2x32 bits) from a single DDR module, or 128 bits in a dual-channel configuration. Something like an RTX 4090 has onboard GDDR (graphics DDR) memory on a 384-bit bus, heavily optimized for bandwidth rather than latency, and according to the specs it pushes about a terabyte per second. Apple really needed its memory architecture: an onboard GPU needs high memory bandwidth to get reasonable performance.
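
To put rough numbers on the bus-width comparison, here is a minimal back-of-the-envelope sketch. It only computes theoretical peak bandwidth as bus width times transfer rate. The DDR5-5600 speed is my assumption for a typical dual-channel desktop setup (the comment above doesn't name one), the 21 Gbps per pin figure is the RTX 4090's published GDDR6X data rate, and `peak_bandwidth_gb_s` is just an illustrative helper name.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_width_bits: total width of the memory interface in bits.
    transfer_rate_mt_s: transfers per second per pin, in MT/s.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mt_s * 1e6 / 1e9


# Assumption: dual-channel DDR5-5600, i.e. a 128-bit interface at 5600 MT/s.
ddr5_dual_channel = peak_bandwidth_gb_s(128, 5600)

# RTX 4090: 384-bit GDDR6X at 21 Gbps per pin (21000 MT/s).
rtx_4090 = peak_bandwidth_gb_s(384, 21000)

print(f"dual-channel DDR5-5600: {ddr5_dual_channel:.1f} GB/s")
print(f"RTX 4090 GDDR6X:        {rtx_4090:.1f} GB/s")
print(f"ratio:                  {rtx_4090 / ddr5_dual_channel:.2f}x")
```

These are peak figures only; real sustained bandwidth is lower on both sides, and on an iGPU that DDR bandwidth is also shared with the CPU.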

jms55|1 year ago

Yep, this is another great callout. Desktop GPUs are (in my experience) often heavily memory limited, and that's with their big high bandwidth memory chips. The latency is a problem, but latency hiding means overall throughput is good and it works out in practice.
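
One rough way to see why latency hiding works is Little's law: to keep a memory system saturated, the data in flight has to equal bandwidth times latency. The sketch below is only that arithmetic. The 500 ns latency and 128-byte request size are hypothetical placeholders I picked for illustration, not measured values for any GPU, and the function names are made up.

```python
def bytes_in_flight(bandwidth_gb_s: float, latency_ns: float) -> float:
    """Bytes that must be outstanding to sustain the given bandwidth.

    GB/s * ns = (1e9 bytes/s) * (1e-9 s) = bytes, so the units cancel.
    """
    return bandwidth_gb_s * latency_ns


def requests_in_flight(bandwidth_gb_s: float, latency_ns: float,
                       request_bytes: int = 128) -> float:
    """Outstanding requests needed, for a hypothetical request size."""
    return bytes_in_flight(bandwidth_gb_s, latency_ns) / request_bytes


# Hypothetical inputs: ~1 TB/s of bandwidth and 500 ns memory latency.
needed_bytes = bytes_in_flight(1008, 500)
needed_requests = requests_in_flight(1008, 500)

print(f"bytes in flight:    {needed_bytes:.0f}")
print(f"requests in flight: {needed_requests:.1f}")
```

The point is just that a GPU with thousands of threads in flight can keep that many requests outstanding, which is why the high latency doesn't hurt overall throughput much.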

iGPUs have lower latency, but also much less bandwidth. So all those global memory fetches in modern GPU algorithms become much slower when you take a bird's-eye view of overall throughput across the dispatch. It's why things like SSAO are way more expensive on iGPUs, despite needing to operate at a lower resolution.