(no title)
suresk | 1 year ago
Their paper [1] only mentions using PTX in a few areas to optimize data transfer operations so they don't blow up the L2 cache. This makes intuitive sense to me, since the main limitation of the H800 vs H100 is reduced nvlink bandwidth, which would necessitate doing stuff like this that may not be a common thing for others who have access to H100s.
t55|1 year ago