The tensor core mostly accelerates matrix operations; it's the big block in the diagram, and there are 4 of them per SM. "CUDA core" refers to the per-thread execution lanes in an SM, which show up in the diagram as the FP32 or INT32 units, so there are 32*4 = 128 per SM on that diagram.
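For the arithmetic, a quick sketch of the per-SM counts as read off that diagram. The variable names are mine, the "one tensor core per partition" split is my reading of the 4-per-SM figure, and the numbers only apply to the architecture in that diagram:

```python
# Counts read off the SM diagram discussed above; other GPU generations differ.
PARTITIONS_PER_SM = 4           # processing blocks per SM
FP32_LANES_PER_PARTITION = 32   # FP32 units ("CUDA cores") per block
TENSOR_CORES_PER_PARTITION = 1  # assumption: 4 per SM spread evenly over 4 blocks

cuda_cores_per_sm = PARTITIONS_PER_SM * FP32_LANES_PER_PARTITION
tensor_cores_per_sm = PARTITIONS_PER_SM * TENSOR_CORES_PER_PARTITION

print(cuda_cores_per_sm)    # 128
print(tensor_cores_per_sm)  # 4
```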

Like you said, a tensor core is similar to a special-purpose ALU, and it sits at a lower level of abstraction than anything with its own instruction pointer.
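To make the ALU comparison a bit more concrete, here is a rough Python model of the kind of operation such a unit performs: a single fused multiply-accumulate over a small matrix tile, D = A*B + C. The 4x4 tile size and the name `tile_mma` are my own illustrative choices, not something taken from the diagram, and real hardware works on specific data types and tile shapes that this ignores.

```python
def tile_mma(a, b, c):
    """Return a @ b + c for small square tiles given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j] for j in range(n)]
        for i in range(n)
    ]

# Hypothetical 4x4 operands, chosen only so the result is easy to check by eye.
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
ones = [[1] * 4 for _ in range(4)]

result = tile_mma(identity, ones, ones)
print(result[0])  # [2, 2, 2, 2]
```

The point is that the operation is a pure function of its operands: it has no control flow or program counter of its own, and something else (the threads running on the SM) has to feed it inputs and issue the operation.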
