item 47079363


qrios | 11 days ago

Works on my computer: RTX 3090, CUDA 12.6

Interesting project! I haven't really worked with Vulkan myself yet, hence my question: how is the code compiled and then loaded onto the GPU cores?

Or is the entire program recompiled in the REPL and re-uploaded on every call, with only the data addresses being updated?


mr_octopus | 11 days ago

Thanks for trying it! :)

Each gpu_* call emits SPIR-V and dispatches via Vulkan compute. Data stays resident in VRAM between calls — no round-trips to CPU unless you need the result.
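To make the residency point concrete, here's a toy CPU-side model of it (all names here — GpuBuffer, read_back, the _device flag — are invented for illustration, not OctoFlow's actual API): uploads and downloads are counted, so you can see that chaining gpu_* calls costs zero transfers until you explicitly read a result back.

```python
# Toy model of VRAM residency. Transfers are counted so chained
# gpu_* calls can be seen to stay on-device. Invented names, not OctoFlow.

class GpuBuffer:
    transfers = 0  # class-wide count of CPU <-> "VRAM" copies

    def __init__(self, data, _device=False):
        if not _device:
            GpuBuffer.transfers += 1   # upload: CPU -> VRAM
        self._data = list(data)        # stands in for device memory

    def read_back(self):
        GpuBuffer.transfers += 1       # download: VRAM -> CPU
        return list(self._data)

def gpu_add(a, b):
    # On-device dispatch: the result is born resident, no transfer counted.
    return GpuBuffer([x + y for x, y in zip(a._data, b._data)], _device=True)

a = GpuBuffer([1.0, 2.0, 3.0])   # transfer 1 (upload)
b = GpuBuffer([4.0, 5.0, 6.0])   # transfer 2 (upload)
c = gpu_add(gpu_add(a, b), b)    # two dispatches, zero transfers
print(GpuBuffer.transfers)       # 2
print(c.read_back())             # transfer 3: [9.0, 12.0, 15.0]
```

The point of the model: the only synchronization points are explicit read-backs, which is why a long pipeline of gpu_* calls doesn't pay PCIe costs per step.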

No thread_id exposed. The runtime handles thread indexing internally — gpu_add(a, b) means "one thread per element, each does a[i] + b[i]." Workgroup sizing and dispatch dimensions are automatic.
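A CPU reference for that implicit-indexing contract (dispatch_elementwise is a made-up helper for illustration — just a sequential stand-in for what the GPU does in parallel):

```python
# "One thread per element": the runtime conceptually runs body(i) for every i;
# user code never sees a thread_id. On a real GPU the n threads would be
# packed into workgroups, e.g. ceil(n / 256) groups of local size 256.

def dispatch_elementwise(body, n):
    out = [None] * n
    for i in range(n):        # each iteration = one GPU thread
        out[i] = body(i)
    return out

def gpu_add(a, b):
    # What gpu_add(a, b) means: thread i computes a[i] + b[i].
    return dispatch_elementwise(lambda i: a[i] + b[i], len(a))

print(gpu_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```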

The tradeoff: you can't write custom kernels with shared memory or warp-level ops. OctoFlow targets the 80% of GPU work that's embarrassingly parallel. For the other 20% you still want CUDA/Vulkan directly.

Cheers