
homarp | 12 days ago

I really liked this part:

In December 2024, during the frenzied adoption of LLM coding assistants, we became aware that such tools tended—unsurprisingly—to produce Go code in a style similar to the mass of Go code used during training, even when there were newer, better ways to express the same idea. Less obviously, the same tools often refused to use the newer ways even when directed to do so in general terms such as “always use the latest idioms of Go 1.25.” In some cases, even when explicitly told to use a feature, the model would deny that it existed. [...] To ensure that future models are trained on the latest idioms, we need to ensure that these idioms are reflected in the training data, which is to say the global corpus of open-source Go code.
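
To make the drift concrete, here's a small sketch of my own (not from the article) contrasting the pattern models trained on older code tend to emit with the newer standard-library idiom:

    package main

    import (
        "fmt"
        "slices"
    )

    func main() {
        nums := []int{3, 1, 4}

        // Old style: classic index loop and a hand-rolled max.
        largest := nums[0]
        for i := 0; i < len(nums); i++ {
            if nums[i] > largest {
                largest = nums[i]
            }
        }
        fmt.Println(largest)

        // Newer idiom: slices.Max (Go 1.21+) does the same in one call.
        fmt.Println(slices.Max(nums))
    }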

munk-a|12 days ago

PHP went through a similar effort a while back to clear places like Stack Overflow of terrible, out-of-date advice (e.g. posts advocating magic_quotes). LLMs make this a slightly different problem because, for the most part, once the bad advice is in the model it's never going away. In theory there's an easier-to-test surface around how good the advice being given is, but figuring out how the model reached that conclusion and correcting it for future models is arcane. It's unlikely that model trainers will submit their RC models to various communities to make sure they aren't lying about those specific topics, so everything needs to happen in preparation for the next generation, relying on the hope that you've identified the bad source it originally trained on and that the model will actually prioritize training on that same, now corrected, source.

miki123211|12 days ago

This is one area where reinforcement learning can help.

The way you should think of RL (both RLVR and RLHF) is via the "elicitation hypothesis" [1]. In pretraining, models learn their capabilities by consuming large amounts of web text. Those capabilities include producing both low- and high-quality outputs (as both are present in the pretraining corpora). In post-training, RL doesn't teach them new skills (see, e.g., the "Limits of RLVR" paper [2]). Instead, it "teaches" the models to produce the more desirable, higher-quality outputs while suppressing the undesirable, low-quality ones.

I'm pretty sure you could design an RL task that specifically teaches models to use modern idioms, either as an explicit dataset of chosen/rejected completions (where the chosen is the new way and the rejected is the old), or as a verifiable task where the reward goes down as the number of linter errors goes up.
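
As a toy sketch of the verifiable variant (the reward shape and names here are mine, not anything a lab has published), you could imagine scoring a candidate completion by how clean it comes out of the toolchain:

    package main

    import (
        "fmt"
        "os/exec"
        "strings"
    )

    // lintReward is a toy verifiable reward: run `go vet` over a candidate
    // completion checked out in dir, shrinking the reward as the number of
    // diagnostics grows. A real pipeline would add tests, a style linter,
    // and a check that the code actually uses post-1.21 idioms.
    func lintReward(dir string) float64 {
        cmd := exec.Command("go", "vet", "./...")
        cmd.Dir = dir
        out, err := cmd.CombinedOutput()
        if err == nil {
            return 1.0 // no diagnostics: full reward
        }
        // Roughly one diagnostic per non-empty output line.
        n := 0
        for _, line := range strings.Split(string(out), "\n") {
            if strings.TrimSpace(line) != "" {
                n++
            }
        }
        return 1.0 / float64(1+n)
    }

    func main() {
        fmt.Printf("reward: %.3f\n", lintReward("."))
    }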

I wouldn't be surprised if frontier labs have datasets for this for some of the major languages and packages.

[1] https://www.interconnects.ai/p/elicitation-theory-of-post-tr...

[2] https://limit-of-rlvr.github.io

Groxx|12 days ago

They're particularly bad about concurrent go code, in my experience - it's almost always tutorial-like stuff, over-simplified and missing error and edge case handling to the point that it's downright dangerous to use... but it routinely slips past review because it seems simple and simple is correct, right? Go concurrency is so easy!

And then you point out issues in a review, so the author feeds it back into an LLM, and code that looks like it handles that case gets added... while also introducing a subtle data race and a rare deadlock.

Very nearly every single time. On all models.
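
A reconstructed sketch of the shape I mean (not code from any single review). It reads like textbook fan-out but carries both bugs:

    package main

    import (
        "fmt"
        "sync"
    )

    // fetch stands in for real work; the names here are hypothetical.
    func fetch(u string) (string, error) { return "body of " + u, nil }

    func fetchAll(urls []string) ([]string, error) {
        var wg sync.WaitGroup
        results := make([]string, 0, len(urls))
        errCh := make(chan error) // unbuffered

        for _, u := range urls {
            wg.Add(1)
            go func() {
                defer wg.Done()
                body, err := fetch(u)
                if err != nil {
                    errCh <- err // only the first error is ever received;
                    return       // any later failure blocks here forever
                }
                results = append(results, body) // data race: concurrent append
            }()
        }

        done := make(chan struct{})
        go func() { wg.Wait(); close(done) }()

        select {
        case err := <-errCh:
            return nil, err // leaks every goroutine still running
        case <-done:
            return results, nil
        }
    }

    func main() {
        out, err := fetchAll([]string{"a", "b"})
        fmt.Println(out, err)
    }

The race detector will usually catch the append; the blocked senders only bite when more than one fetch fails at once, which is exactly the "rare" part.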

Jyaif|12 days ago

> a subtle data race and a rare deadlock

That's a language problem that humans face as well, and one which Go could stop having (see C++'s thread safety annotations).

brightball|12 days ago

Good use case for Elixir. Apparently it performs best of all programming languages on LLM completion benchmarks, and its concurrency model is ideal too.

https://autocodebench.github.io/

robviren|12 days ago

I have run into that a lot, which is annoying. Even though all the code compiles, because Go is backwards compatible, it all looks so different from modern style. Same issue for Python, but there the API changes lead to actual breakage. For this reason I find Go fairly great for codegen: the stability of the language is hard to compete with, and the standard library is a powerful enough tool to support many, many use cases.

HumblyTossed|12 days ago

The use of LLMs will lead to homogeneous, middling code.

munk-a|12 days ago

Middling code should not exist. Boilerplate code should not exist. For some reason we're suddenly accepting code-gen as SOP instead of building a layer of abstraction on top of the too-onerous layer we're currently building at. Prior generations of software development would see a too-onerous layer and build tools to abstract to a higher level, this generation seems stuck in an idea that we just need tooling to generate all that junk but can continue to work at this level.

cedws|12 days ago

It does. I've been writing Go for long enough to say that the code LLMs output is pretty average. It's what I would expect a mid-level engineer to produce. I still write code manually for stuff I care about or where code structure matters.

Maybe the best way is to do the scaffolding yourself and use LLMs to fill in the blanks. That may lead to better-structured code, but it doesn't resolve the problem described above, where they generate suboptimal or outdated code. Code is a form of communication, and I think good code requires an understanding of how to communicate ideas clearly. LLMs have no concept of that; they're just gluing tokens together. They litter code with useless comments while leaving the parts that need them most uncommented.

bee_rider|12 days ago

Do LLMs generate code similar to the middling code of a given domain? Why not generate in a perfect language used only by cool and very handsome people, like Fortran, and then translate once the important stuff is done?

shoo|12 days ago

middling code, delivered within a tolerable time frame and budget without taking excessive risk, is good enough for many real-world commercial software projects. homogeneous middling code, written by humans or extruded by machines, is arguably even a positive for the organisation: lots of organisations care more about software delivery being predictable, or about having a high bus factor thanks to the fungibility of the folks (or machines) building and maintaining the code, than about depending upon excellence.

saghm|12 days ago

You might even say that LLMs are not capable of understanding a brilliant language, but we want to use them to build good software. So the language that we give them has to be easy for them to understand and easy to adopt.

awesome_dude|12 days ago

I'm not sure if that's a criticism or praise - I mean, most people strive for readable code.

meowface|12 days ago

For a few years, yeah. Eventually it will probably lead to the average quality of code being considerably higher than it was pre-LLMs.

dakolli|12 days ago

I'd prefer we start nuking the idea of using LLMs to write code, not helping it get better. Why don't you people listen to Rob Pike? This technology is not good for us. It's a stain on software and the world in general, but I get it, most of y'all yearn for slop. The masses yearn for slop.

throw432196|12 days ago

I totally agree. I read threads like this and I just can't believe people are wasting their time with LLMs.

whattheheckheck|12 days ago

The masses yearn to not have to fiddle with bs for rent and food

BiraIgnacio|12 days ago

I definitely see that with C++ code. Not so easy to "fix", though. Or so I think. But I still have hope, as more and more "modern" C++ code gets published.

yawboakye|11 days ago

Battle of my life. Several times I've had to update my agent instructions to prefer modern, and usually better, syntax over the old way of doing things. Largely it's worked well for me. I find that making the agents read release notes, and some official blog posts, helps them maintain healthy and reasonably up-to-date instructions for writing Go.
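
For what it's worth, the kind of block I keep in the instructions file looks roughly like this (the wording is mine; adjust for your own targets):

    ## Go style (target: Go 1.25)
    - Prefer `for i := range n` (Go 1.22+) over `for i := 0; i < n; i++`.
    - Use the min/max builtins and the slices/maps packages (Go 1.21+)
      instead of hand-rolled helpers.
    - Use `any`, not `interface{}`.
    - Loop variables are per-iteration since Go 1.22; don't add the old
      `u := u` copies.
    - If unsure whether a feature exists, read the release notes for the
      target version before claiming it doesn't.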