rys | 1 year ago

In practice, what gets labelled as the L1 cache in a GPU marketing diagram or third-party analysis might well not be the first level of a strict cache hierarchy. That makes it hard to do any kind of cross-vendor or cross-architecture comparison of what these caches are or how they work. They’re highly implementation-dependent.

In the GPUs I work on, there’s not really a blurred line between the actual L1 and the register file. There’s not even just one register file. Sometimes you also get an L3!

These kinds of implementation-specific details are where GPUs find a lot of their PPA (power, performance, area) today, but they’re (arguably sadly) usually quite opaque to the programmer or enthusiastic architecture analyst.
