(no title)
homarp|12 days ago
In December 2024, during the frenzied adoption of LLM coding assistants, we became aware that such tools tended—unsurprisingly—to produce Go code in a style similar to the mass of Go code used during training, even when there were newer, better ways to express the same idea. Less obviously, the same tools often refused to use the newer ways even when directed to do so in general terms such as “always use the latest idioms of Go 1.25.” In some cases, even when explicitly told to use a feature, the model would deny that it existed. [...] To ensure that future models are trained on the latest idioms, we need to ensure that these idioms are reflected in the training data, which is to say the global corpus of open-source Go code.
munk-a|12 days ago
miki123211|12 days ago
The way you should think of RL (both RLVR and RLHF) is the "elicitation hypothesis"[1]. In pretraining, models learn their capabilities by consuming large amounts of web text. Those capabilities include producing both low- and high-quality outputs (as both are present in their pretraining corpora). In post-training, RL doesn't teach them new skills (see e.g. the "Limits of RLVR" paper[2]). Instead, it "teaches" the models to produce the more desirable, higher-quality outputs while suppressing the undesirable, low-quality ones.
I'm pretty sure you could design an RL task that specifically teaches models to use modern idioms, either as an explicit dataset of chosen/rejected completions (where the chosen is the new way and the rejected is the old), or as a verifiable task where the reward goes down as the number of linter errors goes up.
I wouldn't be surprised if frontier labs have datasets for this for some of the major languages and packages.
[1] https://www.interconnects.ai/p/elicitation-theory-of-post-tr...
[2] https://limit-of-rlvr.github.io
jpalepu|12 days ago
[deleted]
Groxx|12 days ago
And then you point out issues in a review, so the author feeds it back into an LLM, and code that looks like it handles that case gets added... while also introducing a subtle data race and a rare deadlock.
Very nearly every single time. On all models.
unknown|12 days ago
[deleted]
Jyaif|12 days ago
That's a language problem that humans face as well, and one that golang could stop having (see C++'s Thread Safety annotations).
brightball|12 days ago
https://autocodebench.github.io/
robviren|12 days ago
HumblyTossed|12 days ago
munk-a|12 days ago
cedws|12 days ago
Maybe the best way is to do the scaffolding yourself and use LLMs to fill the blanks. That may lead to better structured code, but it doesn’t resolve the problem described above where it generates suboptimal or outdated code. Code is a form of communication and I think good code requires an understanding of how to communicate ideas clearly. LLMs have no concept of that, it’s just gluing tokens together. They litter code with useless comments while leaving the parts that need them most without.
bee_rider|12 days ago
shoo|12 days ago
saghm|12 days ago
awesome_dude|12 days ago
meowface|12 days ago
throwaway613746|12 days ago
[deleted]
dakolli|12 days ago
throw432196|12 days ago
whattheheckheck|12 days ago
BiraIgnacio|12 days ago
yawboakye|11 days ago