top | item 46716534

coffee_am | 1 month ago

On the other side of the equation, I've been spending much more time on code review on an open-source project I maintain: developers are much more productive, but I still review at the same speed.

The real issue is that I can't trust the AI generated code, or trust the AI to code-review for me. Some repeated issues I see:

- In my experience the AI doesn't integrate well with the code that is already there: it often rewrites existing functionality and tends not to adhere to the project's conventions, instead using the patterns it was trained on.

- The AI often lacks depth on more complex issues. Because it doesn't see the broader implications of a change, it often doesn't write the tests that would cover them. Developers who wrote the PRs accept the AI-generated tests without much investigation of the code base. Since the changes pass the (also insufficient) tests, they send the PR to code review.

- With AI, I think (?) I'm more often the one carefully deep-diving into the project and re-designing the generated code during code review. In a way it's an indirect re-prompting.

I'm very happy with the increased number of PRs: they push the project forward with great ideas of what to implement, and I'm glad for the productivity AI brings. Also, with AI, developers are bolder in their contributions.

But this doesn't scale -- or I'll spend all my time code-reviewing :) I hope the AIs get better quickly.

discuss

order

bird0861 | 1 month ago

With respect to the first issue you raise, I would perhaps start including prompts in comments. This is a little sneaky, sure, and maybe explicitly putting them in a markdown file would be better, but there's the risk that the markdown won't be loaded. Perhaps it's possible to inject the file into context via a comment; I've never tried that, though, and I doubt every assistant would act in a consistent way. The comment method is probably the best bet IMO.
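As a minimal sketch of the comment approach (the file, helper, and guidance wording here are all hypothetical, not an established standard): put the project's conventions in a comment block at the top of the source file, so any assistant that reads the file gets them as context even when no separate markdown file is loaded:

```python
# conventions_example.py
#
# AI-ASSISTANT GUIDANCE (hypothetical convention, not a real standard):
# - Reuse the helpers in this module instead of rewriting equivalent
#   functionality from scratch.
# - Follow snake_case naming; signal errors with exceptions, not codes.
# - New public functions need a docstring and a unit test.

def normalize_username(raw: str) -> str:
    """Project convention: usernames are stored lower-case and stripped."""
    return raw.strip().lower()
```

The trade-off is that these comments live right next to the code they govern, so they can't silently fall out of context the way an external instructions file can.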

Forgive me, because this is a bit of a tangential rant on the second issue, but Gemini Pro 3 was absolutely heinous about this, so I cancelled my sub. I'm completely puzzled about what it's supposed to be good for.

To your third issue, you should maybe consider building a dataset from those interactions... you might be able to train a LoRA on them and use it as a first pass before you lift a finger to scroll through a PR.
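One rough sketch of collecting those review interactions as training data (the field names, prompt format, and `lora_dataset.jsonl` file are assumptions, not a standard; a real LoRA fine-tune would consume something like this JSONL):

```python
import json

def review_to_example(generated_code: str, reviewed_code: str, notes: str) -> dict:
    """Turn one review round-trip (AI output -> human revision) into a
    prompt/completion training example."""
    return {
        "prompt": f"Revise this code per the project's conventions:\n"
                  f"{generated_code}\nReviewer notes: {notes}",
        "completion": reviewed_code,
    }

def append_examples(path: str, examples: list) -> None:
    """Append examples to a JSONL file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

# Usage: after each review, log the before/after pair.
# append_examples("lora_dataset.jsonl",
#                 [review_to_example("x=1", "x = 1", "spacing per style guide")])
```

Even a few hundred such pairs might be enough for an adapter that flags the most repetitive issues before a human looks at the PR.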

I think a really big issue is the lack of consistency in the use of AI for SWE. There are a lot of models and poorly designed agents/assistants with really unforgivable performance, and people blindly using them without caring about the outputs amounts to something kind of Denial-of-Service-y. I keep seeing this issue raised over and over again.

At the risk of sounding elitist, the world might be a better place for project maintainers when the free money stops rolling into the frontier labs that offer anyone and everyone free use of the models... never give a baby power tools, and so on.

dragonwriter | 1 month ago

That basically matches, in broad outline, what I see from AI use in an enterprise environment. Absent a radical change, I think the near-term impact of AI on software development is going to be to increase velocity while shifting the workload toward less (but not zero) code writing and more code reviewing, plus knowing when you need to prompt-and-review vs. hand-code.