bangaroo | 4 months ago

"it's better at coding" is not useful information, sorry. i'd love to hear tangible ways it's actually better. does it still succumb to coding itself in circles, taking multiple dependencies to accomplish the same task, applying inconsistent, outdated, or non-idiomatic patterns for your codebase? has compliance with claude.md files and the like actually improved? what is the round trip time like on these improvements - do you have to have a long conversation to arrive at a simple result? does it still talk itself into loops where it keeps solving and unsolving the same problems? when you ask it to work through a complex refactor, does it still just randomly give up somewhere in the middle and decide there's nothing left to do? does it still sometimes attempt to run processes that aren't self-terminating to monitor their output and hang for upwards of ten minutes?

my experience with claude and its ilk is that they are insanely impressive in greenfield projects and collapse quickly in legacy codebases. they can be a force multiplier in the hands of someone who actually knows what they're doing, i think, but even the evidence for that is pretty shaky: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

the pitch that "if i describe the task perfectly in absolute detail it will accomplish it correctly 80% of the time" doesn't strike me as a particularly compelling justification for the level of investment we're seeing. actually writing the code is the simplest part of my job. if i've done all the thinking already, i can just write the code. there's very little need for me to then filter that through a computer with an overly verbose description of what i want.

as for your search results issue: i don't entirely disagree that google is unusable, but having switched to kagi... again, i'm not sure the order-of-magnitude increase in complexity of searching via an LLM is justified. maybe i'm just old, but i like a list of documents presented without much editorializing. google has been a user-hostile product for a long time, and its recent quality collapse in particular has been well-documented, but this seems a lot more a story of "a tool we relied on has gotten measurably worse" than a story of "this tool is meaningfully better at accomplishing the same task." i'll hand it to chatgpt/claude that they are about as effective as google was at directing me to the right thing circa a decade ago, when it was still a functional product - but that brings me back to the point that "man, this is a lot of investment and expense to arrive at the same result way more indirectly."

kasey_junk | 4 months ago

You asked for a single anecdote of LLMs getting better at daily tasks. I provided two. You dismissed them as not valuable _to you_.

It’s fine that your preferences are such that you don’t value the models or the improvements we’ve seen. It’s troubling that you use that to suggest there haven’t been improvements.

bangaroo | 4 months ago

you didn't provide an anecdote. you just said "it's better." an anecdote would be "claude 4 failed in x way, and claude 4.5 succeeds consistently." "it is better" is a statement presented as fact with literally no support.

the entire thrust of my statement was "i only hear nonspecific, vague vibes that it's better, with literally no information to support that concept" and you replied with two nonspecific, vague vibes. sorry, i don't find that compelling.

"troubling" is a wild word to use in this scenario.