My original question was meant to understand why this is considered a huge tolerance and what should be considered a low tolerance. I suspect the paper's intention is not to compare apples and oranges. They are trying to optimize an fp32 baseline by sometimes resorting to fp16, as long as the resulting solution's numerical accuracy stays within the tolerance level. They are going for a "low-hanging fruit" type of optimization.
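To give a feel for what "huge" versus "low" tolerance means when fp16 stands in for fp32, here is a minimal sketch of my own (not from the paper). It uses only the standard library: `struct` formats `'e'` and `'f'` round a Python float to IEEE 754 half and single precision. The helper names (`round_to_fp16`, `within_tolerance`, `accumulate`) are hypothetical. A single fp16 rounding already costs a relative error on the order of 1e-4, so any tolerance much tighter than roughly 1e-3 effectively rules fp16 out. Error also compounds in longer computations such as reductions.

```python
import struct

# Machine epsilon: gap between 1.0 and the next representable value.
FP16_EPS = 2.0 ** -10  # ~9.77e-4 (10 stored mantissa bits)
FP32_EPS = 2.0 ** -23  # ~1.19e-7 (23 stored mantissa bits)


def round_to_fp16(x: float) -> float:
    """Round a Python float (fp64) to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def relative_error(approx: float, reference: float) -> float:
    return abs(approx - reference) / abs(reference)


def within_tolerance(candidate: float, baseline: float, rtol: float) -> bool:
    """Accept the low-precision result if it stays within rtol of the fp32 baseline."""
    return relative_error(candidate, baseline) <= rtol


def accumulate(x: float, n: int) -> tuple[float, float]:
    """Sum x n times, rounding after every add to emulate fp16 and fp32 accumulators."""
    x16, x32 = round_to_fp16(x), round_to_fp32(x)
    s16 = s32 = 0.0
    for _ in range(n):
        s16 = round_to_fp16(s16 + x16)
        s32 = round_to_fp32(s32 + x32)
    return s16, s32


if __name__ == "__main__":
    base = round_to_fp32(0.1)
    half = round_to_fp16(0.1)
    err = relative_error(half, base)
    print(f"fp16 vs fp32 relative error for 0.1: {err:.2e}")
    print("passes rtol=1e-3:", within_tolerance(half, base, 1e-3))
    print("passes rtol=1e-6:", within_tolerance(half, base, 1e-6))
    s16, s32 = accumulate(0.1, 10_000)
    print(f"sum of 10,000 x 0.1: fp16={s16}, fp32={s32}")
```

In the accumulation case the fp16 running sum stalls at 256, because beyond that point the spacing between fp16 values (0.25) is more than twice the addend, so each add rounds back to the same value. The fp32 sum stays close to 1000. This is why mixed-precision schemes usually keep reductions and accumulators in fp32 even when the bulk arithmetic runs in fp16.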