top | item 44064141

(no title)

tmpz22 | 9 months ago

And IMO it has a long way to go. There is a lot of nuance when orchestrating dependencies that can cause subtle errors in an application that are not easily remedied.

For example a lot of llms (I've seen it in Gemini 2.5, and Claude 3.7) will code non-existent methods in dynamic languages. While these runtime errors are often auto-fixable, sometimes they aren't, and breaking out of an agentic workflow to deep dive the problem is quite frustrating - if mostly because agentic coding entices us into being so lazy.

discuss

mikepurvis|9 months ago

"... and breaking out of an agentic workflow to deep dive the problem is quite frustrating"

Maybe that's the problem that needs solving then? The threshold doesn't have to be "bot capable of doing entire task end to end", like it could also be "bot does 90% of task, the worst and most boring part, human steps in at the end to help with the one bit that is more tricky".

Or better yet, the bot is able to recognize its own limitations and proactively surface these instances, be like hey human I'm not sure what to do in this case; based on the docs I think it should be A or B, but I also feel like C should be possible yet I can't get any of them to work, what do you think?

As humans, it's perfectly normal to put up a WIP PR and then solicit this type of feedback from our colleagues; why would a bot be any different?

dvfjsdhgfv|9 months ago

> Maybe that's the problem that needs solving then? The threshold doesn't have to be "bot capable of doing entire task end to end", like it could also be "bot does 90% of task, the worst and most boring part, human steps in at the end to help with the one bit that is more tricky".

Still, the big short-term danger being you're left with code that seems to work well but has subtle bugs in it, and the long-term danger is that you're left with a codebase you're not familiar with.

jasonthorsness|9 months ago

The agents will definitely need a way to evaluate their work just as well as a human would - whether that's a full test suite, tests + directions on some manual verification as well, or whatever. If they can't use the same tools as a human would they'll never be able to improve things safely.

soperj|9 months ago

> if mostly because agentic coding entices us into being so lazy.

Any coding I've done with Claude has been to ask it to build specific methods, if you don't understand what's actually happening, then you're building something that's unmaintainable. I feel like it's reducing typing and syntax errors, sometime it leads me down a wrong path.

weq|9 months ago

I can just imagine it now, you launch your AI coded first product and get a bug in production, and the only way the AI can fix the bug is to re-write and deploy the app with a different library. Your then proceed to show the changelog to the CCB for approval including explaining the fix to the client trying to explain its risk profile for their signoff.

"Yeh, we solved the duplicate name appearing the table issue by moving databases engines and UI frameworks to ones more suited to the task"