top | item 39892862


treffer | 1 year ago

A nice example of this is FFTW, which has hundreds (if not thousands) of generated codelets to do the FFT math. The whole project is a code generator.

It can then, after compilation, benchmark these, generate a wisdom file for the hardware, and pick the right implementation.

Compared with that, "a few" implementations of the core math kernel seem like an easy thing to do.
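The generate-benchmark-pick strategy described above can be sketched in miniature. This is a toy illustration, not FFTW's actual planner: the "kernels" here are hypothetical stand-ins for generated codelets, and the cached choice plays the role of a wisdom file.

```python
import timeit

# Hypothetical candidate "kernels": two ways to compute a sum of squares.
# FFTW does this at much larger scale with thousands of generated FFT codelets.
def kernel_loop(xs):
    total = 0.0
    for x in xs:
        total += x * x
    return total

def kernel_builtin(xs):
    return sum(x * x for x in xs)

CANDIDATES = [kernel_loop, kernel_builtin]

def plan(xs, repeats=100):
    """Benchmark each candidate on this input and return the fastest,
    loosely analogous to FFTW's FFTW_MEASURE planning step."""
    timings = {f: timeit.timeit(lambda f=f: f(xs), number=repeats)
               for f in CANDIDATES}
    return min(timings, key=timings.get)

data = [float(i) for i in range(1000)]
best = plan(data)    # the "wisdom": remember this choice and reuse it
result = best(data)
```

Whichever candidate wins, the result is the same; only the speed differs, which is why the planning cost can be paid once and the decision cached.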



bee_rider | 1 year ago

ATLAS was an automatically tuned BLAS, but it has been mostly supplanted by libraries using the hand-tuned-kernel strategy.

touisteur | 1 year ago

Apache TVM does something similar for auto-optimization. Last time I checked, it wasn't always a win against OpenVINO (depending on the network and batch size), and it came with a lot of limitations (which may have been lifted since), such as no dynamic batch size.

I wish we had superoptimizers.

naasking | 1 year ago

Not exactly comparable: as you said, the FFTW implementations are auto-generated, but it doesn't sound like these few implementations will be.