top | item 47022885


tjansen | 14 days ago

> 10 minutes is not the limit for current models. I can have them work for hours on a problem.

Admittedly, I have never tried to run it that long. If 10 minutes is not enough, I check what it is doing and tell it what to do differently or what to look at, or offer to run it with debug logs. Recently I also had a case where Opus worked on an issue seemingly forever: it fixed one issue and thereby introduced another, then fixed that one, only for the original issue to reappear. Then I tried Codex, and it fixed the problem at first sight. So changing models can certainly help.

But do you really get a good solution after running it for hours? To me, that sounds like it doesn't understand the issue completely.


charcircuit | 14 days ago

Sometimes it doesn't work, or it gives up early, but since these runs happen when I'm not working, that is not a big deal. When it does work, I would say that it has figured out the hard part of the solution. I may need another prompt to clean it up a bit, but it got the hard work out of the way.

> or offer to run it with debug logs.

Letting it add its own debug logs and use a debugger allows it to run these loops itself and understand where its current approach is going wrong.

tjansen | 14 days ago

That assumes it can easily reproduce the issues. But it's not good at interacting with a complex UI the way a human user does.