top | item 47042966

debarshri | 13 days ago

This weekend, I found an issue with Microsoft's new Golang version of sqlcmd. I ran Claude Code and fixed the issue, which I wouldn't have done if agent tooling did not exist. The fix was contributed back to the project.

I think it comes down to who is contributing, their intentions, and various other nuances. I would still say it is a net good for the ecosystem.

atomicnumber3 | 13 days ago

Did you actually fix the issue, or did you fix the issue and introduce new bugs?

The problem is the asymmetry of effort. You verified you fixed your issue. The maintainers verified literally everything else (or are the ones taking the hit if they're just LGTMing it).

Sorry, I am sure your specific change was just fine. But I'm speaking generally.

How many times at work have I looked at a PR and thought, "this is such a bad way to fix this that I could not have come up with something so comically bad if I tried"? And naturally I couldn't say this to my fine coworker whose zeal exceeded his programming skills (partly because someone else had already approved the PR after "reviewing" it...). No, I had to simply fast-follow with my own PR, which contained a squashed revert of his change along with the correct fix, so that it didn't introduce race conditions into parallel test runs.

And the submitter of course has no ability to gauge whether their PR is the obvious trivial solution, or comically incorrect. Therein lies the problem.

snovv_crash | 13 days ago

This is why open source projects need good architecture and high test coverage.

I'd even argue we need a new type of test coverage, something that traces back the asserts to see what parts of the code are actually constrained by the tests, sort of a differential mutation analysis.

rixed | 12 days ago

This could have happened before AI agents too, but yes, it's another step in that direction.

mysterydip | 13 days ago

I think the problem is that determining who is contributing, their intention, and those other nuances takes a human's time and effort. And at some point the number of contributions becomes too much to sort through.

debarshri | 13 days ago

I think building enough barriers, processes, and mechanisms might work. I don't think it needs to be human effort.

kermatt | 13 days ago

If you used Claude to fix the issue, built and tested your branch, and only then submitted the PR, the process is not much different from pre-LLM days.

I think the problem is where bug-bounty or reputation chasers are letting LLMs write the PRs, _without_ building and testing. They seek output, not outcomes.

softwaredoug | 13 days ago

That’s the positive case IMO - a human, you, remain responsible for the fix. It doesn’t matter if AI helped.

The negative case is free-running OpenClaw slop cannons that could even be malicious.

_joel | 13 days ago

I agree, but that's assuming the project accepts AI-generated code in the first place, especially given the legal questions around accepting commits written by an AI trained on who knows what dataset.

krater23 | 13 days ago

And are you sure that you fixed it without creating 20 new bugs? To the reader, this could mean that you never understood the bug, so how can you be sure that you've done anything right?

saghm | 13 days ago

How do you make sure you don't create bugs in the code you write without an LLM? I imagine for most people, the answer is a combination of self-review and testing. You can do those same things with code an LLM helps you write, and at that point you have the same level of confidence.

debarshri | 13 days ago

I'm pretty sure I did not create bugs, because I validated the change thoroughly; I had to deploy it into production in a fintech environment.

So I am quite confident in, as well as convinced of, the change. But then, I know what I know.

Aurornis | 13 days ago

Using an LLM as an assistant isn’t necessarily equivalent to not understanding the output. A common use case of LLMs is to quickly search codebases and pinpoint problems.

mycall | 13 days ago

Code complexity is often the cause of more bugs, and complexity naturally comes from more code. As they say, the best code I ever wrote was no code.

silverwind | 13 days ago

If the test coverage is good, it will most likely be fine.