top | item 26714809

gbl08ma | 4 years ago

All of that is bandwidth and clock speed, not latency


dragontamer | 4 years ago

Look, if CPUs were better at memory latency, the BVH-traversal of raytracing would still be done on CPUs.

BVH-tree traversals are done on the GPU now for a reason. GPUs are better than CPUs at hiding latency and at exploiting large amounts of bandwidth. Yes, even on things like pointer-chasing through a BVH-tree for AABB bounds checking.

GPUs have pushed latency down and latency hiding up to unimaginable levels. In terms of absolute latency, you're right: GPUs are still higher latency than CPUs. But in "practical" terms, once you account for the GPU's latency-hiding tricks, such as 8x occupancy (similar to hyperthreading) and some dedicated data structures / programming tricks (largely taking advantage of the millions of rays processed in parallel per frame), it turns out that you can convert many latency-bound problems into bandwidth-bound problems.

-----------

That's the funny thing about computer science. It turns out that with enough RAM and enough parallelism, you can convert ANY latency-bound problem into a bandwidth-bound problem. You just need enough cache to hold the results in the meantime, while you process other stuff in parallel.
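As a rough single-threaded sketch of the idea (hypothetical code, not from any particular engine): instead of chasing one pointer chain to completion (latency-bound), advance many independent chains round-robin. The loads for different chains are independent, so an out-of-order CPU (or a GPU with many threads in flight) can keep many of them in flight at once; with enough chains, the limit becomes memory bandwidth rather than any single load's latency.

```cpp
#include <cstddef>
#include <vector>

// One node of a hypothetical pointer-chained structure.
struct Node { int value; const Node* next; };

// Walk many chains "in parallel" (round-robin), one step per chain per
// pass. Each pass issues one independent load per live chain, so the
// loads can overlap in the memory system instead of serializing.
std::vector<int> sum_chains(std::vector<const Node*> heads) {
    std::vector<int> sums(heads.size(), 0);
    bool any_alive = true;
    while (any_alive) {
        any_alive = false;
        for (std::size_t i = 0; i < heads.size(); ++i) {
            if (heads[i]) {                 // advance chain i by one step
                sums[i] += heads[i]->value;
                heads[i] = heads[i]->next;
                any_alive = true;
            }
        }
    }
    return sums;
}
```

The per-chain work is identical to the naive loop; only the interleaving changes, which is exactly the "process other stuff while a load is pending" trick.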

Raytracing is an excellent example of this form of latency hiding. Bouncing a ray off of your global data structure of objects involves traversing pointers down the BVH tree: a ton of linked-list-like current_node = current_node->next operations (depending on which current_node->child the ray hit).
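That per-ray pointer chase looks roughly like this (a minimal sketch; names like BvhNode, Ray, and hit_aabb are illustrative, not from any real engine):

```cpp
#include <vector>

struct Aabb { float min[3], max[3]; };
struct BvhNode {
    Aabb bounds;
    const BvhNode* left  = nullptr;  // null on leaf nodes
    const BvhNode* right = nullptr;
    int leaf_id = -1;                // payload when this is a leaf
};
// Precomputed 1/direction keeps the slab test to multiplies.
struct Ray { float origin[3], inv_dir[3]; };

// Slab test: does the ray cross this AABB? (assumes finite inv_dir)
bool hit_aabb(const Ray& r, const Aabb& b) {
    float tmin = 0.0f, tmax = 1e30f;
    for (int a = 0; a < 3; ++a) {
        float t0 = (b.min[a] - r.origin[a]) * r.inv_dir[a];
        float t1 = (b.max[a] - r.origin[a]) * r.inv_dir[a];
        if (t0 > t1) { float tmp = t0; t0 = t1; t1 = tmp; }
        if (t0 > tmin) tmin = t0;
        if (t1 < tmax) tmax = t1;
    }
    return tmin <= tmax;
}

// Collect the leaves whose boxes the ray crosses. Every iteration is a
// dependent pointer load -- the chasing described above -- so a single
// ray is at the mercy of memory latency.
std::vector<int> traverse(const BvhNode* root, const Ray& r) {
    std::vector<int> hits;
    std::vector<const BvhNode*> stack{root};
    while (!stack.empty()) {
        const BvhNode* n = stack.back();
        stack.pop_back();
        if (!n || !hit_aabb(r, n->bounds)) continue;
        if (n->leaf_id >= 0) { hits.push_back(n->leaf_id); continue; }
        stack.push_back(n->left);
        stack.push_back(n->right);
    }
    return hits;
}
```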

From the perspective of any one ray, it looks like it's latency-bound. But from the perspective of processing 2.073 million rays across a 1920 x 1080 video game scene with realtime raytracing enabled, it's bandwidth-bound.