sureglymop | 12 days ago

Here's an interesting thing. I decided to do Advent of Code in assembly last year. What I noticed is that AI training data must contain a lot of source code and binaries but not much of what sits in between. Whether it's LLVM IR, assembly, or some other intermediate representation, it all seems underrepresented. LLMs kept giving me code patterns that make sense in high-level code but not really in assembly, where writing by hand lets you find much more optimized solutions.

But this also seems like an easy win for generated training data. Take all your code and have a compiler emit the assembly as well as the binary. Now your LLM can not only act as a compiler but also make that output useful and understandable to humans.
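
To make that concrete, here is a rough sketch of what such a data generation step might look like. It assumes clang is the compiler (the `-S`, `-emit-llvm`, and `-c` flags are real clang options), while `build_commands`, `emit_training_triples`, the `-O2` default, and the output layout are names and choices I made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(source: Path, out_dir: Path,
                   compiler: str = "clang", opt: str = "-O2") -> dict[str, list[str]]:
    """Build one compiler invocation per representation of the same source file.

    Hypothetical helper: the keys and the output naming scheme are assumptions.
    """
    stem = source.stem
    return {
        # Target assembly
        "asm": [compiler, "-S", opt, str(source), "-o", str(out_dir / f"{stem}.s")],
        # Textual LLVM IR (clang-specific flag)
        "llvm_ir": [compiler, "-S", "-emit-llvm", opt, str(source),
                    "-o", str(out_dir / f"{stem}.ll")],
        # Object file, i.e. the binary form
        "object": [compiler, "-c", opt, str(source), "-o", str(out_dir / f"{stem}.o")],
    }


def emit_training_triples(source: Path, out_dir: Path,
                          compiler: str = "clang") -> dict[str, Path]:
    """Run the commands and return the paths of the outputs that compiled.

    Returns an empty dict if the compiler is not installed.
    """
    if shutil.which(compiler) is None:
        return {}
    out_dir.mkdir(parents=True, exist_ok=True)
    produced = {}
    for kind, cmd in build_commands(source, out_dir, compiler).items():
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            produced[kind] = Path(cmd[-1])  # last argument is the output path
    return produced
```

Calling `emit_training_triples(Path("day01.c"), Path("out"))` would then give you the source paired with its assembly, IR, and object file, assuming the file compiles. Running it at several optimization levels would probably give more varied pairs, but that is a guess on my part.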