top | item 46692735

(no title)

fotcorn | 1 month ago

I used Claude Opus 4.5 inside Cursor to write RISC-V Vector/SIMD code. Specifically Depthwise Convolution and normal Convolution layers for a CNN.

I started out by letting it write a naive C version without intrinsic, and validated it against the PyTorch version.

Then I asked it (and two other models, Gemini 3.0 and GPT 5.1) to come up with some ideas on how to make it faster using SIMD vector instructions and write those down as markdown files.

Finally, I started the agent loop by giving Cursor those three markdown files, the naive C code and some more information on how to compile the code, and also an SSH command where it can upload the program and test it.

It then tested a few different variants, ran it on the target (RISC-V SBC, OrangePI RV2) to check if it improves runtime, and then continue from there. It did this 10 times, until it arrived at the final version.

The final code is very readable, and faster than any other library or compiler that I have found so far. I think the clear guardrails (output has to match exactly the reference output from PyTorch, performance must be better than before) makes this work very well.

discuss

sifar|1 month ago

I am really surprised by this. While I know it can generate correct SIMD code, getting a performant version is non trivial, especially for RVV, where the instruction choices and the underlying micro architecture would significantly impact the performance.

IIRC, Depthwise is memory bound so the bar might be lower. Perhaps you can try some thing with higher compute intensity like a matrix multiply. I have observed, it trips up with the columnar accesses for SIMD.

fotcorn|1 month ago

I think the ability to actually run the code on the target helped a lot with understanding and optimizing for the specific micro architecture. Quite a few of the ideas turned out to not to be optimal and were discarded.

Also important to have a few test cases the agent can quickly check against, it will often generate wrong code, but if that is easily detectable the agent can fix it and continue quickly.

camel-cdr|1 month ago

can you share the code?