cec | 2 years ago
We also train the model to generate what it thinks the optimized code will look like. We find that this helps the model choose better pass lists, but obviously the code cannot be trusted and semantics are not guaranteed; it only compiles in 91% of cases. "Perfectly emulating the output of the compiler" means the model spat out code that is character-for-character identical to what the compiler generates with the given pass list (even choosing the same variable names, etc.). IMO this is no mean feat, but there is still a long way to go before LLMs can be used for codegen. We provide a bunch of examples in the paper of things that LLMs can and cannot do.
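To make the "perfect emulation" criterion concrete: it is strict string equality between the model's predicted IR and the compiler's actual output for the chosen pass list. A minimal sketch (function names are illustrative, not from the paper):

```python
def exact_match(model_ir: str, compiler_ir: str) -> bool:
    # "Perfect emulation": the model's prediction is character-for-character
    # identical to what the compiler emits for the given pass list,
    # down to the choice of variable names.
    return model_ir == compiler_ir

def exact_match_rate(pairs: list[tuple[str, str]]) -> float:
    # Fraction of (model output, compiler output) pairs that match exactly.
    if not pairs:
        return 0.0
    return sum(exact_match(m, c) for m, c in pairs) / len(pairs)
```

Note how strict this is: IR that is semantically identical but uses different register names still counts as a miss, which is why hitting it at all is notable.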