top | item 39892862


treffer | 1 year ago

A nice example of this is FFTW, which has hundreds (if not thousands) of generated codelets to do the FFT math. The whole project is a code generator.

It can then, after compilation, benchmark these, generate a wisdom file for the hardware, and pick the right implementation.

Compared with that, "a few" implementations of the core math kernel seem like an easy thing to do.
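The generate-benchmark-pick strategy described above can be sketched in miniature. This is a toy illustration, not FFTW's actual planner: the "kernels" here are hypothetical stand-ins for generated codelets, and the cached choice plays the role of a wisdom file.

```python
import timeit

# Hypothetical candidate "kernels": two ways to compute a sum of squares.
# FFTW does this at much larger scale with thousands of generated FFT codelets.
def kernel_loop(xs):
    total = 0.0
    for x in xs:
        total += x * x
    return total

def kernel_builtin(xs):
    return sum(x * x for x in xs)

CANDIDATES = [kernel_loop, kernel_builtin]

def plan(xs, repeats=100):
    """Benchmark each candidate on this input and return the fastest,
    loosely analogous to FFTW's FFTW_MEASURE planning step."""
    timings = {f: timeit.timeit(lambda f=f: f(xs), number=repeats)
               for f in CANDIDATES}
    return min(timings, key=timings.get)

data = [float(i) for i in range(1000)]
best = plan(data)    # the "wisdom": remember this choice and reuse it
result = best(data)
```

Whichever candidate wins, the result is the same; only the speed differs, which is why the planning cost can be paid once and the decision cached.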



bee_rider | 1 year ago

ATLAS was an automatically tuned BLAS, but it has been mostly supplanted by libraries using the hand-tuned-kernel strategy.

touisteur | 1 year ago

Apache TVM does something similar for auto-optimization. Last time I checked, it wasn't always a win against OpenVINO (depending on the network and batch size), and it came with a lot of limitations (which may have been lifted since), such as no dynamic batch size.

I wish we had superoptimizers.

naasking | 1 year ago

Not exactly comparable: as you said, the FFTW implementations are auto-generated, but it doesn't sound like these few implementations will be.