top | item 44562228

antimora | 7 months ago

I'm one of the regular code reviewers for Burn (a deep learning framework in Rust). I recently had to close a PR because the submitter's bug fix was clearly written entirely by an AI agent. The "fix" simply muted an error instead of addressing the root cause. This is exactly what AI tends to do when it can't identify the actual problem. The code was unnecessarily verbose and even included tests for muting the error. Based on the person's profile, I suspect their motivation was just to get a commit on their record. This is becoming a troubling trend with AI tools.
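The error-muting pattern described here is common enough to sketch. This is a hypothetical illustration, not Burn's actual code; the function names and the parsing domain are invented:

```rust
use std::num::ParseIntError;

// Correct approach: the error propagates, so callers see the root cause.
fn parse_dim(s: &str) -> Result<usize, ParseIntError> {
    s.trim().parse::<usize>()
}

// The kind of "fix" described above: the error is silently swallowed and
// replaced with a default, hiding the underlying problem from callers.
fn parse_dim_muted(s: &str) -> usize {
    s.trim().parse::<usize>().unwrap_or(0)
}

fn main() {
    assert!(parse_dim("not a number").is_err()); // failure is visible
    assert_eq!(parse_dim_muted("not a number"), 0); // failure is hidden
    println!("ok");
}
```

The second version even "passes tests" if the tests only check the happy path, which is how such a change can look plausible in a diff.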

dawnerd|7 months ago

That's what I love about LLMs. You can spot it doesn't know the answer, tell it that it's wrong and it'll go, "You're absolutely right. Let me actually fix it"

It scares me how much code is being produced by people without enough experience to spot issues, or people who just gave up caring. We're going to be in for a wild ride when all the exploits start flowing.

cogman10|7 months ago

My favorite LLM moment. I wrote some code, asked the LLM "Find any bugs or problems with this code", and of course what it did was hyperfocus on an out-of-date comment (that I didn't write). Since the problem identified in the comment no longer existed, the LLM just spat out like 100 lines of garbage to refactor the code.

rectang|7 months ago

> "You're absolutely right."

I admit a tendency to anthropomorphize the LLM and get irritated by this quirk of language, although it's not bad enough to prevent me from leveraging the LLM to its fullest.

The key when acknowledging fault is to show your sincerity through actual effort. For technical problems, that means demonstrating that you have worked to analyze the issue, take corrective action, and verify the solution.

But of course current LLMs are weak at understanding, so they can't pull that off. I wish that the LLM could say, "I don't know", but apparently the current tech can't know that it doesn't know.

And so, as the LLM flails over and over, it shamelessly kisses ass and bullshits you about the work it's doing.

I figure that this quirk of LLMs will be minimized in the near future by tweaking the language to be slightly less obsequious. Improved modeling and acknowledging uncertainty will be a heavier lift.

colechristensen|7 months ago

I also get things like this from very experienced engineers working outside their area of expertise. It's obviously less of a completely boneheaded suggestion, but it's still doing exactly the wrong thing suggested by AI, and it required a person to step in and correct it.

daxfohl|7 months ago

It'd be nice if github had a feature that updated the issue with this context automatically too, so that if this agent gives up and closes the PR, the next agent doesn't go and do the exact same thing.

candiddevmike|7 months ago

> tell it that it's wrong and it'll go, "You're absolutely right. Let me actually fix it"

...and then it still doesn't actually fix it

Macha|7 months ago

I recently reviewed an MR from a coworker. There was a test that was clearly written by AI, except I guess however he prompted it, it gave some rather poor variable names like "thing1", "thing2", etc. in test cases. Basically, these were multiple permutations of data that all needed to be represented in the result set. So I asked for them to be named distinctively, maybe by what makes them special.

It's clear he just took that feedback and asked the AI to make the change, and it came up with a change that gave them all very long, very unique names that just listed all the unique properties in the test case, to the point that they became noise.

It's clear writing the PR was very fast for that developer, I'm sure they felt they were X times faster than writing it themselves. But this isn't a good outcome for the tool either. And I'm sure if they'd reviewed it to the extent I did, a lot of that gained time would have dissipated.
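The three naming stages described above might look like this in Rust (a hypothetical sketch; the actual MR's domain is unknown, and the struct and field names here are invented):

```rust
#[derive(Debug, PartialEq)]
struct Perm {
    read: bool,
    write: bool,
}

// Classify a permission set; each test case should cover one branch.
fn effective(p: &Perm) -> &'static str {
    match (p.read, p.write) {
        (true, true) => "rw",
        (true, false) => "ro",
        (false, true) => "wo",
        (false, false) => "none",
    }
}

fn main() {
    // Before review: names say nothing about what distinguishes the cases.
    let thing1 = Perm { read: true, write: false };
    let thing2 = Perm { read: false, write: true };

    // The AI's "fix": exhaustively list every property until the name is noise.
    // let read_only_permission_with_no_write_access_and_default_flags = ...;

    // What the reviewer asked for: name each case by what makes it special.
    let read_only = Perm { read: true, write: false };
    let write_only = Perm { read: false, write: true };

    assert_eq!(effective(&thing1), effective(&read_only));
    assert_eq!(effective(&thing2), effective(&write_only));
    println!("ok");
}
```

The middle stage is the over-correction: the name encodes every property instead of the one property that makes the case worth testing.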

meindnoch|7 months ago

>a deep learning framework in Rust [...] This is becoming a troubling trend with AI tools.

The serpent is devouring its own tail.

TeMPOraL|7 months ago

OTOH, when they start getting good AI contributions... it'll be too late for us all.

LoganDark|7 months ago

Deep learning can be incredibly cool and not just used for AI slop.

pennomi|7 months ago

This is the most frustrating thing LLMs do. They wrap broad try/catch blocks around the code, making it impossible to actually track down the source of a problem. I want my code to fail fast and HARD during development so I can solve every problem immediately.
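Rust has no try/catch, but the same smell shows up as blanket `Result` swallowing. A generic sketch (not from any real PR; the path and function names are invented) contrasting the two styles:

```rust
use std::fs;

// Error-swallowing style: any failure collapses into a vague message
// with no underlying cause and no indication of where it came from.
fn load_config_swallowed(path: &str) -> String {
    match fs::read_to_string(path) {
        Ok(s) => s,
        Err(_) => {
            eprintln!("something went wrong");
            String::new()
        }
    }
}

// Fail-fast style: the error surfaces immediately, with the OS-level cause.
fn load_config_fail_fast(path: &str) -> String {
    fs::read_to_string(path)
        .unwrap_or_else(|e| panic!("failed to read {path}: {e}"))
}

fn main() {
    let missing = "/no/such/config.toml"; // hypothetical nonexistent path
    // Swallowed: the caller gets an empty string and no idea why.
    assert_eq!(load_config_swallowed(missing), "");
    // Fail-fast: the failure is loud and points at the real cause.
    assert!(std::panic::catch_unwind(|| load_config_fail_fast(missing)).is_err());
    println!("ok");
}
```

During development the second style is what surfaces bugs immediately; in production you'd typically propagate the error with `?` instead of panicking.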

daxfohl|7 months ago

Seems like there's a need for GitHub to create a separate flow for AI-created PRs. Project maintainers should be able to stipulate rules like this in English, and an AI "pre-reviewer" would check that the AI has followed all these rules before the PR is created, and chat with the AI submitter to resolve any violations. For exceptional cases, a human submitter is required.

Granted, the compute required is probably more expensive than github would offer for free, and IDK whether it'd be within budget for many open-source projects.

Also granted, something like this may be useful for human-sourced PRs as well, though perhaps post-submission so that maintainers can see and provide some manual assistance if desired. (And also granted, in some cases maybe maintainers would want to provide manual assistance to AI submissions, but I expect the initial triaging based on whether it's a human or AI would be what makes sense in most cases).

kfajdsl|7 months ago

This is my number one complaint with LLM produced code too. The worst thing is when it swallows an error to print its own error message with far less info and no traceback.

In my rules I tell it that try/catches are completely banned unless I explicitly ask for one (an okay tradeoff, since usually my error boundaries are pretty wide and I know where I want them). I know the context length is getting too long when it starts ignoring that.
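In Rust, a house rule like this can also be enforced mechanically instead of relying on prompt rules the model may drift away from. These are real Clippy restriction lints, shown as one possible crate-level configuration:

```rust
// Deny patterns that discard or hide errors (enforced when running `cargo clippy`).
#![deny(clippy::unwrap_used)] // forbid .unwrap(), which panics with no context
#![deny(clippy::let_underscore_must_use)] // forbid `let _ = fallible();` discards

fn main() {
    // With the lints above, errors must be handled or propagated explicitly;
    // .expect() with a justification is still allowed under this configuration.
    let n: i32 = "42".parse().expect("hard-coded literal always parses");
    println!("{n}");
}
```

A lint the agent trips in CI is harder to ignore than an instruction buried deep in its context window.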

0xbadcafebee|7 months ago

> The "fix" simply muted an error instead of addressing the root cause.

FWIW, I have seen human developers do this countless times. In fact there are many people in engineering that will argue for these kinds of "fixes" by default. Usually it's in closed-source projects where the shittiness is hidden from the world, but trust me, it's common.

> I suspect their motivation was just to get a commit on their record. This is becoming a troubling trend with AI tools.

There was already a problem (pre-AI) with shitty PRs on GitHub made to try to game a system. Regardless of how they made the change, the underlying problem is a policy one: how to deal with people making shitty changes for ulterior motives. I expect the solution is actually more AI to detect shitty changes from suspicious submitters.

Another solution (that I know nobody's going to go for): stop using GitHub. Back in the "olden times", we just had CVS, mailing lists and patches. You had to perform some effort in order to get to the point of getting the change done and merged, and it was not necessarily obvious afterward that you had contributed. This would probably stop 99% of people who are hoping for a quick change to boost their profile.

nerdjon|7 months ago

I will never forget being in a code review for an upcoming release. There was a method that was... different. Like massively different, with no good reason why it was changed as much as it was for such a small addition.

We asked the person why they made the change, and... silence. They had no reason. It became painfully clear that all they did was copy and paste the method into an LLM, say "add this thing", and it spit out a completely redone method.

So now we had a change that no one in the company actually understood, just because the developer took a shortcut. (The change was rejected and reverted.)

The scariest thing to me is that no one actually knows what code is running anymore, with these models having a tendency to make changes for the sake of making changes (and likely not actually addressing the root cause, but taking a shortcut like you mentioned).

tomrod|7 months ago

As a side question: I work in AI, but mostly Python and theory work. How can I best jump into Burn? Rust has been intriguing to me for a long time.

lvl155|7 months ago

This is a real problem that’s only going to get worse. With the major model providers basically keeping all the data themselves, I frankly don’t like this trend long term.

doug_durham|7 months ago

You should be rejecting the PR because the fix was insufficient, not because it was AI agent written. Bad code is bad code regardless of the source. I think the fixation on how the code was generated is not productive.

glitchc|7 months ago

No, that's not how code review works. Getting inside the mind of the developer, understanding how they thought about the fix, is critical to the review process.

If an actual developer wrote this code and submitted it willingly, it would constitute either malice (an attempt to sabotage the codebase or inject a trojan) or stupidity (failing to understand the purpose of the error message). With an LLM we mostly have stupidity. Flagging it as such reveals the source of the stupidity, as LLMs do not actually understand anything.

RobinL|7 months ago

The problem is that code often takes as long to review as to write, and AI potentially lowers the quality bar for pull requests. So maintainers end up with lots of low-quality PRs that take time to reject.

rustyminnow|7 months ago

> You should be rejecting the PR because the fix was insufficient

I mean, they probably could've articulated it your way, but I think that's basically what they did... they point out the insufficient "fix" later, but the root cause of the "fix" was blind trust in AI output, so that's the part of the story they lead with.