top | item 32917145

(no title)

TomVDB | 3 years ago

Density, I can accept.

But what kind of latency are we talking about here?

CDNA has 16-wide SIMD units that retires 1 64-wide warp instruction every 4 clock cycles.

RDNA has a 32-wide SIMD unit that retires 1 32-wide warp every clock cycle. (It's uncanny how similar it to to Nvidia's Maxwell and Pascal architecture.)

Your 1/4 number makes me think that you're talking about a latency that has nothing to do with reads from memory, but with the rate at which instructions are retired? Or does it have to with the depth of the instruction pipeline? As long as there's sufficient occupancy, a latency difference of a few clock cycles shouldn't mean anything in the context of a thousand clock cycle latency for accessing DRAM?

discuss

dragontamer|3 years ago

> thousand clock cycle latency for accessing DRAM?

That's what's faster.

Vega64 accesses HBM in like 500 nanoseconds. (https://www.reddit.com/r/ROCm/comments/iy2rfw/752_clock_tick...)

RDNA2 accesses GDDR6 in like 200 nanoseconds. (https://www.techpowerup.com/281178/gpu-memory-latency-tested...)

EDIT: So it looks like my memory was bad. I could have sworn RDNA2 was faster (Maybe I was thinking of the faster L1/L2 caches of RDNA?) Either way, its clear that Vega/GCN has much, much worse memory latency. I've updated the numbers above and also edited this post a few times as I looked stuff up.

TomVDB|3 years ago

Thanks for that.

The weird part is that this latency difference has to be due to a terrible MC design by AMD, because there's not a huge difference in latency between any of the current DRAM technologies: the interface between HBM and GDDR (and regular DDR) is different, but the underlying method of accessing the data is similar enough for the access latency to be very similar as well.