jafioti | 6 months ago

yup! we build a search space by iteratively applying rewrite rules in every possible order (using e-graphs to do this efficiently). the rewrites include structural changes to looping / tiling, as well as algebraic rewrites like softmax to online softmax (and from there to flash attention).
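To make the softmax to online softmax rewrite concrete, here is a minimal Python sketch. It is not Luminal's code; the function names `softmax_two_pass` and `softmax_online` are made up for illustration. It only shows that the two forms compute the same result, which is the property that lets one be swapped for the other.

```python
import math

def softmax_two_pass(xs):
    # Standard numerically stable softmax: one pass for the max,
    # another pass for the sum of exponentials.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_online(xs):
    # Online softmax: track the running max and the running sum together
    # in a single streaming pass over the input.
    running_max = float("-inf")
    running_sum = 0.0
    for x in xs:
        new_max = max(running_max, x)
        # Rescale the sum accumulated so far to the new max, then add the new term.
        running_sum = running_sum * math.exp(running_max - new_max) + math.exp(x - new_max)
        running_max = new_max
    # Final normalization using the statistics gathered above.
    return [math.exp(x - running_max) / running_sum for x in xs]

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    print(softmax_two_pass(xs))
    print(softmax_online(xs))
```

The rescaling step is the part that matters: because the running sum can be corrected whenever a larger max shows up, the input can be consumed in tiles, which is the same trick flash attention builds on.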

yes, kernels optimized on one system will work on other systems with the same hardware. it's fine to take a long time compiling if you compile once and run many times.

_0ffh|6 months ago

Is/will it be possible to just write a model component with Luminal and then use that as a building block in e.g. Torch or JAX?

almostgotcaught|6 months ago

> take a long time compiling

Lol NP-hard is still NP-hard no matter how you slice it (especially given vague objective functions).

jafioti|6 months ago

NP-hard is still solvable with constraints. look at Go.