And if one is using an iGPU, one might expect great bandwidth, but in reality DDR memory for the CPU is optimized for low latency, not bandwidth. It will probably have a 64-bit channel (or 2x32 bits) from a single DDR module, or 128 bits in a dual-channel configuration, while something like an RTX 4090 has onboard GDDR (graphics DDR) memory on a 384-bit bus, heavily optimized for bandwidth rather than latency, pushing about a terabyte per second according to the specs. Apple really needed its memory architecture: high memory bandwidth is what lets the onboard GPU reach reasonable performance.
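To put rough numbers on the bus-width comparison, here is a back-of-the-envelope sketch. The helper name `peak_bandwidth_gb_s` is made up for illustration, and DDR5-5600 is only an assumed example speed for the system memory; the 21 Gbps per-pin rate is the RTX 4090's published GDDR6X spec. These are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    Each transfer moves bus_width_bits / 8 bytes across the bus.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9


# Assumption: DDR5-5600 (5600 MT/s) as an example of CPU system memory.
ddr_single = peak_bandwidth_gb_s(64, 5600e6)    # one 64-bit channel
ddr_dual = peak_bandwidth_gb_s(128, 5600e6)     # dual channel, 128 bits

# RTX 4090: 384-bit bus, GDDR6X at 21 Gbps per pin.
gddr_4090 = peak_bandwidth_gb_s(384, 21e9)

print(f"DDR5-5600 single channel: {ddr_single:.1f} GB/s")
print(f"DDR5-5600 dual channel:   {ddr_dual:.1f} GB/s")
print(f"RTX 4090 GDDR6X:          {gddr_4090:.1f} GB/s")
```

Under those assumptions the dual-channel DDR5 setup peaks at roughly 90 GB/s against roughly 1008 GB/s for the 4090, so the discrete card has more than ten times the bandwidth before any real-world effects are counted.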
jms55|1 year ago
iGPUs have lower latency, but also much less bandwidth. So all those global memory fetches in modern GPU algorithms become much slower when you take a bird's-eye view of overall throughput across the dispatch. It's why things like SSAO are way more expensive on iGPUs, despite running at a lower resolution.