frogblast | 5 months ago
An optimization that gives a non-negative speedup on every test in your suite is really hard to come by. Something is always going to get slower.
My experience is with non-Nvidia GPU systems, but this feels like a familiar situation. They probably found something that gives great results for one set of kernels and terrible results for another, with no known reliable heuristic or model they could use to choose between them automatically.
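To make the "wins on some kernels, loses on others" point concrete, here is a minimal sketch using made-up per-kernel timings (the kernel names and numbers are hypothetical, not measurements from any real GPU). Speedup is taken as the ratio baseline/optimized, so anything below 1.0 is a regression. The suite-wide geometric mean can still come out above 1.0 while individual kernels get slower. The `autotune` helper is an assumed fallback for when no heuristic exists: time both variants per kernel and keep the faster one. It is not a claim about what any vendor actually does.

```python
import math

# Hypothetical per-kernel timings in milliseconds (illustrative only, not real data).
baseline = {"gemm": 10.0, "conv": 8.0, "reduce": 2.0, "scan": 4.0}
optimized = {"gemm": 7.0, "conv": 6.0, "reduce": 2.5, "scan": 5.0}


def speedups(base, opt):
    """Per-kernel speedup ratio: > 1.0 means faster, < 1.0 means a regression."""
    return {k: base[k] / opt[k] for k in base}


def geomean(values):
    """Geometric mean, a common way to summarize speedup ratios across a suite."""
    vals = list(values)
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


def autotune(base, opt):
    """Hypothetical fallback with no heuristic: measure each kernel both ways
    and keep whichever variant ran faster."""
    return {k: ("optimized" if opt[k] < base[k] else "baseline") for k in base}


s = speedups(baseline, optimized)
regressions = sorted(k for k, v in s.items() if v < 1.0)
print(f"geomean speedup: {geomean(s.values()):.3f}")
print("regressions:", regressions)
print("chosen variants:", autotune(baseline, optimized))
```

With these numbers the overall geomean is slightly above 1.0 even though two of the four kernels regress. Per-kernel measurement avoids the regressions, but it only helps if you can afford to time every kernel (and, in practice, every relevant input shape) instead of predicting the outcome.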
Eridrus | 5 months ago
rcoveson | 5 months ago
godelski | 5 months ago
It's easy to oversimplify a problem and not even realize you have done so. There are always assumptions being made, and you should not let them be invisible.