
rybosworld | 19 days ago

AIs can self-critique via mechanisms like chain of thought, or via user-specified guard rails such as a hook that requires the test suite to pass before a task can be considered complete and ready for human review. These can and do result in higher-quality code.

Agree that "good code" is vague - it probably always will be. But we can still agree that code quality is going up over time without having a complete specification for what defines "good".


cedws | 19 days ago

Unfortunately I can only give anecdotes, but in my experience the LLM's 'thinking' does not lead to code quality improvements in the same way that a programmer thinking for a while would.

In my experience having LLMs write Go, it tends to factor code poorly from the start, probably due to lacking the mental model of pieces composing together. Furthermore, once a structure is in place, there doesn't seem to be a trigger point that causes the LLM to step back and think about reorganising the code, or about how the code it wants to write could be better integrated into what's already there. It tends to be very biased by the structures that already exist and not really question them.

A programmer might write a function, notice it becoming too long or doing too much, and then decide to break it down into smaller subroutines. I've never seen an LLM really do this; they seem biased towards being additive.

I believe good code comes from an intuition which is very hard to convey. Imprinting hard rules into the LLM like 'refactor long functions' will probably just lead to overcorrection and poor results. It needs to build its own taste for good code, and I'm not sure if that's possible with current technology.

simonw | 19 days ago

> Furthermore, once a structure is in place, there doesn't seem to be a trigger point that causes the LLM to step back and think about reorganising the code, or how the code it wants to write could be better integrated into what's already there.

Older models did do this, and it sucked. You'd ask for a change to your codebase and they would refactor a chunk of it and make a bunch of other unrelated "improvements" at the same time.

This was frustrating and made for code that was harder to review.

The latest generation of models appear to have been trained not to do that. You ask for a feature, and they'll build that feature with the fewest possible changes to the code.

I much prefer this. If I want the code refactored I'll say to the model "look for opportunities to refactor this" and then it will start suggesting larger changes.

jmalicki | 19 days ago

> A programmer might write a function, notice it becoming too long or doing too much, and then decide to break it down into smaller subroutines. I've never seen an LLM really do this, they seem biased towards being additive.

The nice thing is that a programmer working with an LLM can just step in here and course-correct, still adding that value without spending all the time writing the boilerplate in between.

And in general, the cleaner your codebase, the cleaner the LLM's modifications will be; it does pick up on coding style.