Vuizur | 1 year ago

>executing large-scale changes in entire repositories in 3 years

You can look at SWE-Agent: it solved 12 percent of the GitHub issues in its test dataset. It probably depends on your definition of large-scale.

This will get much better: it is a new problem with lots of unexplored details, and we will likely get GPT-5 this year, which, according to Altman, is supposed to be a jump in performance similar to the one from 3.5 to 4.

krainboltgreene|1 year ago

This is a laughable definition of large-scale. It's also a misrepresentation of that situation: it was 12% of the issues in a dataset for the top 5000 PyPI package repositories. Further, "solves" is an incredibly generous definition, so I'm assuming you didn't read the source or any of the attempts to use this service. Here's one where it deletes half the code and replaces network handling with a comment to handle network handling: https://github.com/TBD54566975/tbdex-example-android/pull/14...

"This will get much better" is the statement I've been hearing for the past year and a half. I heard it 2 years ago about the metaverse. I heard it 3 years ago about DAOs. I heard it 5 years ago about blockchains...

What I do see is a lot more lies. Turns out things are zooming along at the speed of light if you only read headlines from sponsored posts.

rsynnott|1 year ago

> Here's one where it deletes half the code and replaces network handling with a comment to handle network handling

... Wait, that's not one that they considered a _success_, is it? Like, one of the 12%?