varunshenoy | 2 years ago
You can calculate the SRAM as follows: an A100 has 108 SMs, and each SM has 192 KB of on-chip SRAM, split between the L1 data cache and shared memory (up to 164 KB of it can be configured as shared memory) [1]. Multiplied out, 108 × 192 KB = 20,736 KB, or roughly 20 MB of total SRAM. This matches the ~20 MB SRAM figure in the diagram in the Flash Attention paper [2].
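For anyone who wants to redo the arithmetic, here is a minimal sketch. The constants are just the A100 numbers quoted above (108 SMs, 192 KB combined L1/shared memory per SM, up to 164 KB usable as shared memory), and "MB" is taken as 1024 KB.

```python
# A100 figures as quoted in the comment above.
NUM_SMS = 108
L1_SMEM_PER_SM_KB = 192   # combined L1 data cache + shared memory per SM
MAX_SMEM_PER_SM_KB = 164  # max portion configurable as shared memory per SM

def total_kb(num_sms: int, kb_per_sm: int) -> int:
    """Total on-chip capacity summed across all SMs, in KB."""
    return num_sms * kb_per_sm

all_sram_kb = total_kb(NUM_SMS, L1_SMEM_PER_SM_KB)
smem_only_kb = total_kb(NUM_SMS, MAX_SMEM_PER_SM_KB)

print(all_sram_kb, all_sram_kb / 1024)    # 20736 20.25
print(smem_only_kb, smem_only_kb / 1024)  # 17712 17.296875
```

So the ~20 MB figure counts the full L1/shared-memory SRAM. If you only count what a kernel can actually allocate as shared memory, it is closer to 17 MB.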
[1] https://developer.nvidia.com/blog/cuda-refresher-cuda-progra...