animaomnium | 1 year ago
I ask because a while back I was messing around with compiling interaction nets to C after reducing as much of the program as possible (without reducing the inputs), as a form of whole program optimization. Wouldn't be too much harder to target a shader language.
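To make the idea concrete with a toy stand-in (expression trees instead of real interaction nets; all names here are illustrative, not from any actual compiler): fold every subtree that does not depend on an input at compile time, and emit only the residual program.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for "reduce as much as possible without reducing the
 * inputs": a tiny expression language with literals, one opaque input,
 * and addition.  Real interaction-net reduction is far richer; this
 * only illustrates the whole-program-optimization shape of the idea. */

typedef enum { LIT, INPUT, ADD } Tag;
typedef struct Expr { Tag tag; int val; struct Expr *l, *r; } Expr;

static Expr *mk_lit(int v)  { Expr *e = calloc(1, sizeof *e); e->tag = LIT; e->val = v; return e; }
static Expr *mk_input(void) { Expr *e = calloc(1, sizeof *e); e->tag = INPUT; return e; }
static Expr *mk_add(Expr *l, Expr *r) {
    Expr *e = calloc(1, sizeof *e); e->tag = ADD; e->l = l; e->r = r; return e;
}

/* Reduce every input-independent subtree; leave the rest residual. */
static Expr *specialize(Expr *e) {
    if (e->tag != ADD) return e;
    e->l = specialize(e->l);
    e->r = specialize(e->r);
    if (e->l->tag == LIT && e->r->tag == LIT)
        return mk_lit(e->l->val + e->r->val);  /* constant-fold at compile time */
    return e;
}
```

Specializing `(1 + 2) + input` leaves the residual `3 + input`, which is what you would then lower to C (or a shader).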
Edit: Oh I see...
> This repository provides a low-level IR language for specifying the HVM2 nets, and a compiler from that language to C and CUDA
Will have to look at the code then!
https://github.com/HigherOrderCO/HVM
Edit: Wait, nvm. It looks like the HVM2 CUDA runtime is an interpreter that traverses an in-memory graph and applies reductions.
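For intuition, a reduction loop of that shape can be sketched in plain C. Everything below (the node layout, tags, and the single annihilation rule, in one wiring convention) is invented for illustration and is much simpler than HVM2's actual representation.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 64
#define NIL 0xFF

/* A net node: a tag plus three ports; port 0 is the principal port. */
typedef struct {
    uint8_t tag;
    uint8_t port[3];  /* index of the peer node on each port */
    uint8_t slot[3];  /* which port on that peer we connect to */
    int     alive;
} Node;

static Node net[MAX_NODES];

static uint8_t mk(uint8_t tag) {
    static uint8_t next = 0;
    net[next] = (Node){ tag, { NIL, NIL, NIL }, { 0, 0, 0 }, 1 };
    return next++;
}

/* Wire port (a, sa) to port (b, sb), in both directions. */
static void link_ports(uint8_t a, uint8_t sa, uint8_t b, uint8_t sb) {
    net[a].port[sa] = b; net[a].slot[sa] = sb;
    net[b].port[sb] = a; net[b].slot[sb] = sa;
}

/* Annihilation: two same-tag nodes meeting on their principal ports are
 * erased and their aux ports wired through.  (Commutation is omitted,
 * and aux ports are assumed not to point back at the redex itself.) */
static int annihilate(uint8_t a, uint8_t b) {
    if (net[a].tag != net[b].tag) return 0;
    for (int i = 1; i <= 2; i++)
        link_ports(net[a].port[i], net[a].slot[i],
                   net[b].port[i], net[b].slot[i]);
    net[a].alive = 0; net[b].alive = 0;
    return 1;
}

/* One pass of the interpreter: scan for principal-principal redexes. */
static int reduce_pass(void) {
    int fired = 0;
    for (uint8_t i = 0; i < MAX_NODES; i++) {
        if (!net[i].alive) continue;
        uint8_t j = net[i].port[0];
        if (j == NIL || j <= i) continue;   /* visit each pair once */
        if (net[i].slot[0] == 0 && net[j].alive)
            fired += annihilate(i, j);
    }
    return fired;
}
```

A real runtime replaces the scan with a redex stack (or, on CUDA, per-thread redex bags), but the fetch-pair, match-tags, rewrite-wires loop is the same.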
https://github.com/HigherOrderCO/HVM/blob/5de3e7ed8f1fcee6f2...
I was talking about traversing an interaction net to recover a lambda-calculus-like term, which can then be lowered to C in small pieces, à la Lisp, with minimal runtime overhead.
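As a sketch of what that lowering could look like (the `Val`/`Clo` representation and closure-per-lambda scheme are my own toy choices here, not HVM output): a recovered term like (\f.\x. f (f x)) (\y. y + 1) 0 becomes a handful of small C functions, with closures as the only runtime machinery.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative lowering of a lambda term to C "in small pieces".
 * Values are a union of machine int and closure pointer; each lambda
 * becomes one C function plus a one-slot environment. */

typedef struct Clo Clo;
typedef union { int n; Clo *c; } Val;
typedef Val (*Fn)(Clo *self, Val arg);
struct Clo { Fn fn; Val env; };

static Val apply(Val f, Val x) { return f.c->fn(f.c, x); }

/* \y. y + 1 */
static Val inc_fn(Clo *self, Val y) { (void)self; return (Val){ .n = y.n + 1 }; }

/* \x. f (f x), with f captured in the environment */
static Val twice_body(Clo *self, Val x) {
    Val f = self->env;
    return apply(f, apply(f, x));
}

/* \f. \x. f (f x) — allocates the inner closure capturing f */
static Val twice_fn(Clo *self, Val f) {
    (void)self;
    Clo *c = malloc(sizeof *c);
    c->fn = twice_body; c->env = f;
    return (Val){ .c = c };
}

static Clo twice = { twice_fn, { 0 } };
static Clo inc   = { inc_fn,   { 0 } };
```

The point is that after readback there is no graph left at runtime: a modern C compiler can inline `apply` and the closure bodies outright.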
Honestly, the motivation is: you are unlikely to outperform a hand-written GPU kernel for ML-style workloads using Bend. In theory, HVM could act as glue, stitching together and parallelizing the dispatch order of compute kernels, but you need a good FFI to do that, and interaction nets are hard to translate across FFI boundaries. But if you compile nets to C, keeping track of the FFI compute-kernel nodes embedded in the interaction net, you can recover a sensible FFI with no translation overhead.
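A minimal sketch of that last idea, with invented names throughout (`FfiNode`, and `saxpy_cpu` standing in for a real hand-written GPU kernel): the net carries opaque kernel nodes as plain function pointers, so compiled-net code can invoke them directly, with no marshalling at the boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical FFI-node scheme: a foreign kernel is embedded in the net
 * as an opaque node holding a plain function pointer.  The net compiler
 * never looks inside it; it just emits a direct call. */

typedef void (*KernelFn)(float *out, const float *in, size_t n);

typedef struct {
    KernelFn fn;   /* the foreign compute kernel */
    size_t   n;    /* problem size fixed at net-construction time */
} FfiNode;

/* Stand-in for a hand-written kernel: out[i] = 2*in[i] + 1. */
static void saxpy_cpu(float *out, const float *in, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = 2.0f * in[i] + 1.0f;
}

/* What compiled-net code would emit at an FFI node: a direct call,
 * with HVM's role reduced to sequencing and parallelizing dispatch. */
static void fire_ffi(const FfiNode *node, float *out, const float *in) {
    node->fn(out, in, node->n);
}
```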
The other option is implementing HVM in hardware, which I've been messing around with on a spare FPGA.
LightMachine | 1 year ago
animaomnium | 1 year ago
Edit: nvm, I read through the rest of the codebase. I see that HVM compiles the inet to a large static term and then links against the runtime.
https://github.com/HigherOrderCO/HVM/blob/5de3e7ed8f1fcee6f2...
Will have to play around with this and look at the generated assembly, to see how much of the runtime a modern C/CUDA compiler can inline.
Btw, nice code: very compact and clean, well-organized, easy to read. Rooting for you!