imatworkyo|10 months ago
There are many types of complexity, and many tasks that are complex for a human coder are trivial for an AI and its skill set.
gf000|10 months ago
A CRUD backend app for a business in a common sector? That's mostly just connecting stuff together (though I would argue that an experienced dev with a good stack can write it in less time than it takes to painstakingly explain it to an LLM in inexact human language).
Some R&D work, or debugging of any kind of code? There it's almost useless, because that requires deep reasoning, and these models absolutely break down at it.
simonw|10 months ago
I have been extremely impressed with o1, o3, o4-mini and Gemini 2.5 as debugging aids. The combination of long context input and their chain-of-thought means they can frequently help me figure out bugs that span several different layers of code.
I wrote about an early experiment with that here: https://simonwillison.net/2024/Sep/25/o1-preview-llm/
Here's a Gemini 2.5 Pro transcript from this afternoon where I'm trying to figure out a very tricky bug: https://gist.github.com/simonw/4e208ab9edb5e6a814d3d23d7570d...
tyre|10 months ago
It thinks of things that I don’t think of right away. It tries weird approaches that are frequently wrong but almost always yield some information and are sometimes spot on.
And sometimes there's an annoying problem where letting Claude bang its head against it for $1.25 in API calls is slower than doing it myself, but it frees me to spend my time and emotional bandwidth elsewhere.
expensive_news|10 months ago
But when I try to do more complicated math, it falls short. I do have to say, though, that Gemini 2.5 Pro is starting to get better in this area.