tannedNerd|1 month ago
Yes, Opus 4.5 seems great, but most of the time it tries to vastly overcomplicate a solution. Its answer will be 10x harder to maintain and debug than the simpler solution a human would have created by thinking about the constraints of keeping code working.
structural|1 month ago
And it turns out the quality of output you get from both the humans and the models is highly correlated with the quality of the specification you write before you start coding.
Letting a model run amok within the constraints of your spec is actually great for specification development! You get instant feedback on what you specified wrongly or underspecified. On top of this, you learn how to write specifications where critical information that needs to be used together isn't spread across thousands of pages. Thinking about context windows when writing documentation is useful for both human and AI consumers.
sksishbs|1 month ago
I can’t get past the fact that by the time I write up an adequate spec and review the agent’s code, I probably could have done it myself by hand. It’s not like typing was even remotely close to the slow part.
AI, agents, etc. are insanely useful for enhancing my knowledge and getting me there faster.
ncruces|1 month ago
pseudosavant|1 month ago
Stuff that seems basic, but that I haven't always been able to count on in my teams' "production" code.
jonas21|1 month ago
tannedNerd|1 month ago
maherbeg|1 month ago
Over time, I imagine even cloud providers, app stores, etc. can start doing automated security scanning for these types of failure modes, or offer a more restricted version of the experience to ensure safety.
afavour|1 month ago
usefulposter|1 month ago
bgirard|1 month ago
I predict in 2026 we're going to see agents get better at running their own QA, and also get better at not just disabling failing tests. We'll continue to see advancements that will improve quality.
zamalek|1 month ago
cyberpunk|1 month ago
layer8|1 month ago
vbezhenar|1 month ago
LatencyKills|1 month ago
adriand|1 month ago
Maintained and debugged by whom? It's just going to be Opus 4.5 (and 4.6, and 5, etc.) maintaining and debugging it. And I don't think it minds, and I also think it will be quite good at it.
aschobel|1 month ago
Something like code-simplifier is surprisingly useful (as is /review):
https://x.com/bcherny/status/2007179850139000872
joelthelion|1 month ago
mikert89|1 month ago