Eight more months of agents

entropyneur|20 days ago

> I deeply appreciate hand-tool carpentry and mastery of the art, but people need houses and framing teams should obviously have skillsaws.

Where are all the new houses? I admit I am not a bleeding edge seeker when it comes to software consumption, but surely a 10x increase in the industry output would be noticeable to anyone?

amarble|19 days ago

This weekend I tried what I'd call a medium scale agentic coding project[0], following what Anthropic demonstrated last week autonomously building a C-compiler [1]. Bottom line is, it's possible to make demos that look good, but it really doesn't work well enough to build software you would actually use. This naturally lends itself to the "everybody is taking about how great it is but nobody is building anything real with it" construct we're in right now. It is great, but also not really useful.

[0] https://www.marble.onl/posts/this_cost_170.html

[1] https://www.anthropic.com/engineering/building-c-compiler

xyzzy123|20 days ago

Org processes have not changed. Lots of the devs I know are enjoying the speedup on mundane work, consuming it as a temporary lifestyle surplus until everything else catches up.

You can't saw faster than the wood arrives. Also the layout of the whole job site is now wrong and the council approvals were the actual bottleneck to how many houses could be built in the first place... :/

arrrg|20 days ago

To me the hard problem isn’t building things, it’s knowing what to build (finding the things that provide value) and how to build it (e.g. finding novel approaches to doing something that makes something possible that wasn’t possible before).

I don’t see AI helping with knowing what to build at all and I also don’t see AI finding novel approaches to anything.

Sure, I do think there is some unrealized potential somewhere in terms of relatively low value things nobody built before because it just wasn’t worth the time investment – but those things are necessarily relatively low value (or else it would have been worth it to build it) and as such also relatively limited.

Software has amazing economies of scale. So I don’t think the builder/tool analogy works at all. The economics don’t map. Since you only have to build software once and then it doesn’t matter how often you use it (yeah, a simplification) even pretty low value things have always been worth building. In other words: there is tons of software out there. That’s not the issue. The issue is: what it the right software and can it solve my problems?

wongarsu|19 days ago

At my $work this manifests as more backlog items being ticked off, more one-off internal tooling, features (and tools) getting more bells-and-whistles and much more elaborate UI. Also some long-standing bugs being fixed by claude code.

Headline features aren't much faster. You still need to gather requirements, design a good architecture, talk with stakeholders, test your implementation, gather feedback, etc. Speeding up the actual coding can only move the needle so much.

Cthulhu_|20 days ago

I'm sure there's plenty of new software being released and built by agents, but the same problem as handcrafted software remains - finding an audience. The easier and quicker it is to build software, or the more developers build software, the more stuff is thrown at a wall to see what sticks, but I don't think there's more capacity for sticktivity, if my analogy hasn't broken down by now.

bananaflag|20 days ago

I think, if there were to be a noticeable increase in software quantity due to agentic coding, we should test it by looking into indie games.

conartist6|19 days ago

It's AI features and 10x more bugs. Microsoft is leading the way.

dent9|18 days ago

It's ironic to me because I'm the Luddite who refuses to adopt agentic AI and still using only the Chat interface with Codex and Claude inside the VS Code extensions to help me with both work projects and personal projects. And I've had amazing results with only this. "Look at this codebase and tell me the best ways to integrate some new feature", "look at this source code file and tell me what's wrong with it", "show me how to implement this thing I want". Then I copy and adapt the results as needed and integrate it with the rest of my work. This has worked great and I've shipped a ton of projects much faster and easier. Clearly the AI could have written a lot of it itself but I'm not sure I'm really lacking in any benefits with this method. So this makes the whole agentic push especially seem like some kinda over hyped gimmick.

sdoering|20 days ago

Quite a few - and I know I am only speaking for myself - live on my different computers. I created a few CLI tools that make my life and that of my agent smoother sailing for information retrieval. I created, inspired by a blog post, a digital personal assistant, that really enables me to better juggle different work contexts as well as different projects within these work contexts.

I created a platform for a virtual pub quiz for my team at my day job, built multiple pandingpages for events, debugged dark table to recognize my new camera (it was to new to be included in the camera.xml file, but the specs were known). I debugged quite a few parts of a legacy shitshow of an application, did a lot of infrastructure optimization and I also created a massive ton of content as a centaur in dialog with the help of Claude Code.

But I don't do "Show HN" posts. And I don't advertise my builds - because other than those named, most are one off things, that I throw away after this one problem was solved.

To me code became way more ephemeral.

But YMMV - and that is a good thing. I also believe that way less people than the hype bubble implies are actually really into hard core usage like Pete Steinberger or Armin Ronacher and the likes.

csande17|20 days ago

It's not 10x, but https://www.ft.com/content/5ac2ee5f-f8bd-4f39-a759-3c5c50c8b... has some graphs suggesting a 1.5x increase in metrics like "number of new apps published in the iOS App Store" and "lines of code committed by US GitHub users".

People haven't noticed because the software industry was already mostly unoriginal slop, even prior to LLMs, and people are good at ignoring unoriginal slop.

arghwhat|19 days ago

The real outcome is mostly a change in workflow and a reasonable increase in throughput. There might be a 10x or even 100x increase in creation of tiny tools or apps (yay to another 1000 budget assistant/egg timer/etc. apps on the app/play store), but hardly something one would notice.

To be honest, I think the surrounding paragraph lumps together all anti-AI sentiments.

For example, there is a big difference between "all AI output is slop" (which is objectively false) and "AI enables sloppy people to do sloppy work" (which is objectively true), and there's a whole spectrum.

What bugs me personally is not at all my own usage of these tools, but the increase in workload caused by other people using these tools to drown me in nonsensical garbage. In recent months, the extra workload has far exceeded my own productivity gains.

For the non-technical, imagine a hypochondriac using chatgpt to generate hundreds of pages of "health analysis" that they then hand to their doctor and expect a thorough read and opinion of, vs. the doctor using chatgpt for sparring on a particular issue.

PlatoIsADisease|20 days ago

You aren't notcing it?

Small and mid sized companies are getting custom software now.

Small software is able to be packed with extra features instead of bare minimum.

8note|19 days ago

do you have the right expectations?

rather than new stuff for everyone to use, the future could easily be everyone building their own bespoke tools for their own problems.

nathias|19 days ago

yes, there is a very large increase in TUI tools

n4pw01f|19 days ago

I am seeing shit tons of chatbots for everything under the sun being onboarded at corporate

xyzsparetimexyz|20 days ago

> Pay through the nose for Opus or GPT-7.9-xhigh-with-cheese. Don't worry, it's only for a few years.

> You have to turn off the sandbox, which means you have to provide your own sandbox. I have tried just about everything and I highly recommend: use a fresh VM.

> I am extremely out of touch with anti-LLM arguments

'Just pay out the arse and run models without a sandbox or in some annoying VM just to see them fail. Wait, some people are against this?'

dude250711|20 days ago

Well, not 'against' per se, just watching LLM-enthusiasts tumble in the mud for now. Though I have heard that if I don't jump into the mud this instance, I will be left behind apparently for some reason. So you either get left behind or get a muddy behind, your choice.

skybrian|19 days ago

I'm doing web development in a VM (on exe.dev) and it works quite nicely.

CGamesPlay|19 days ago

In case you missed the several links, exe.dev is his startup which provides sandboxing for agents. So it makes sense he wants to get people used to paying for agents and in need of a good sandbox.

happytoexplain|20 days ago

I don't trust the idea of "not getting", "not understanding", or "being out of touch" with anti-LLM (or pro-LLM) sentiment. There is nothing complicated about this divide. The pros and cons are both as plain as anything has ever been. You can disagree - even strongly - with either side. You can't "not understand".

moron4hire|20 days ago

Yeah, "not understanding" means they aren't engaging with the issue honestly. They go on to compare to carpentry, which is a classic sign the speaker understands neither carpentry or software development.

The anti-LLM arguments aren't just "hand tools are more pure." I would even say that isn't even a majority argument. There are plenty more arguments to make about environmental and economic sustainability, correctness, safety, intellectual property rights, and whether there are actual productivity gains distinguishable from placebo.

It's one of the reasons why "I am enjoying programming again" is such a frustrating genre of blog post right now. Like, I'm soooo glad we could fire up some old coal plants so you could have a little treat, Brian from Middle Management.

slfnflctd|20 days ago

> There is nothing complicated about this divide [...] You can't "not understand"

I beg to differ. There are a whole lot of folks with astonishingly incomplete understanding about all the facts here who are going to continue to make things very, very complicated. Disagreement is meaningless when the relevant parties are not working from the same assumption of basic knowledge.

derefr|20 days ago

The negative impacts of generative AI are most sharply being felt by "creatives" (artists, writers, musicians, etc), and the consumers in those markets. If the OP here is 1. a programmer 2. who works solely with other programmers and 3. who is "on the grind", mostly just consuming non-fiction blog-post content related to software development these days, rather than paying much attention to what's currently happening to the world of movies/music/literature/etc... then it'd be pretty easy for them to not be exposed very much to anti-LLM sentiment, since that sentiment is entirely occurring in these other fields that might have no relevance to their (professional or personal) life.

"Anti-LLM sentiment" within software development is nearly non-existent. The biggest kind of push-back to LLMs that we see on HN and elsewhere, is effectively just pragmatic skepticism around the effectiveness/utility/ROI of LLMs when employed for specific use-cases. Which isn't "anti-LLM sentiment" any more than skepticism around the ability of junior programmers to complete complex projects is "anti-junior-programmer sentiment."

The difference between the perspectives you find in the creative professions vs in software dev, don't come down to "not getting" or "not understanding"; they really are a question of relative exposure to these pro-LLM vs anti-LLM ideas. Software dev and the creative professions are acting as entirely separate filter-bubbles of conversation here. You can end up entirely on the outside of one or the other of them by accident, and so end up entirely without exposure to one or the other set of ideas/beliefs/memes.

(If you're curious, my own SO actually has this filter-bubble effect from the opposite end, so I can describe what that looks like. She only hears the negative sentiment coming from the creatives she follows, while also having to dodge endless AI slop flooding all the marketplaces and recommendation feeds she previously used to discover new media to consume. And her job is one you do with your hands and specialized domain knowledge; so none of her coworkers use AI for literally anything. [Industry magazines in her field say "AI is revolutionizing her industry" — but they mean ML, not generative AI.] She has no questions that ChatGPT could answer for her. She doesn't have any friends who are productively co-working with AI. She is 100% out-of-touch with pro-LLM sentiment.)

unknown|20 days ago

[deleted]

joefourier|20 days ago

The author is correct in that agents are becoming more and more capable and that you don't need the IDE to the same extent, but I don't see that as good. I find that IDE-based agentic programming actually encourages you to read and understand your codebase as opposed to CLI-based workflows. It's so much easier to flip through files, review the changes it made, or highlight a specific function and give it to the agent, as opposed to through the CLI where you usually just give it an entire file by typing the name, and often you just pray that it manages to find the context by itself. My prompts in Cursor are generally a lot more specific and I get more surgical results than with Claude Code in the terminal purely because of the convenience of the UX.

But secondly, there's an entire field of LLM-assisted coding that's being almost entirely neglected and that's code autocomplete models. Fundamentally they're the same technology as agents and should be doing the same thing: indexing your code in the background, filtering the context, etc, but there's much less attention and it does feel like the models are stagnating.

I find that very unfortunate. Compare the two workflows:

With a normal coding agent, you write your prompt, then you have to at least a full minute for the result (generally more, depending on the task), breaking your flow and forcing you to task-switch. Then it gives you a giant mass of code and of course 99% of the time you just approve and test it because it's a slog to read through what it did. If it doesn't work as intended, you get angry at the model, retry your prompt, spending a larger amount of tokens the longer your chat history.

But with LLM-powered auto-complete, when you want, say, a function to do X, you write your comment describing it first, just like you should if you were writing it yourself. You instantly see a small section of code and if it's not what you want, you can alter your comment. Even if it's not 100% correct, multi-line autocomplete is great because you approve it line by line and can stop when it gets to the incorrect parts, and you're not forced to task switch and you don't lose your concentration, that great sense of "flow".

Fundamentally it's not that different from agentic coding - except instead of prompting in a chatbox, you write comments in the files directly. But I much prefer the quick feedback loop, the ability to ignore outputs you don't want, and the fact that I don't feel like I'm losing track of what my code is doing.

coffeefirst|20 days ago

The other thing about non-agent workflows is they’re much, much less compute intensive. This is going to matter.

wavemode|20 days ago

I agree with you wholeheartedly. It seems like a lot of the work on making AI autocomplete better (better indexing, context management, codebase awareness, etc) has stagnated in favor of full-on agentic development, which simply isn't suited for many kinds of tasks.

dagss|20 days ago

    But if you try some penny-saving cheap model like Sonnet [..bad things..]. [Better] pay through the nose for Opus.

After blowing $800 of my bootstrap startup funds for Cursor with Opus for myself in a very productive January I figured I had to try to change things up... so this month I'm jumping between Claude Code and Cursor, sometimes writing the plans and having the conversation in Cursor and dump the implementation plan into Claude.

Opus in Cursor is just so much more responsive and easy to talk to, compared to Opus in Claude.

Cursor has this "Auto" mode which feels like it has very liberal limits (amortized cost I guess) that I'm also trying to use more, but -- I don't really like to flip a coin and if it lands up head then waste half hour discovering the LLM made a mess the LLM and try again forcing the model.

Perhaps in March I'll bite the bullet and take this authors advice.

written-beyond|20 days ago

Just use Codex 5.3 in codex cli, the $20/mo plan is basically limitless at least for me and I keep reasoning efforts high.

You can enjoy it while it lasts, OpenAI is being very liberal with their limits because of CC eating their lunch rn.

dakolli|20 days ago

I promise you you're just going to continue to light money on fire. Don't fall for this token madness, the bigger your project gets, the less capable the llm will get and the more you spend per request on average. This is literally all marketing tricks by inference providers. Save your money and code it yourself, or use very inexpensive llm methods if you must.

I think we are going to start hearing stories of people going into thousands in CC debt because they were essentially gambling with token usage thinking they would hit some startup jackpot.

0xbadcafebee|20 days ago

Local models are decent now. Qwen3 coder is pretty good and decent speed. I use smaller models (qwen2.5:1.5b) with keyboard shortcuts and speech to text to ask for man page entries, and get 'em back faster than my internet connection and a "robust" frontier model does. And web search/RAG hides a multitude of sins.

"Using anything other than the frontier models is actively harmful" - so how come I'm getting solid results from Copilot and Haiku/Flash? Observe, Orient, Decide, Act, Review, Modify, Repeat. Loops with fancy heuristics, optimized prompts, and decent tools, have good results with most models released in the past year.

mattmanser|20 days ago

Have you used the frontier models recently? It's hard to communicate the difference the last 6 months has seen.

We're at the point where copilot is irrelevant. Your way of working is irrelevant. Because that's not how you interact with coding AIs anymore, you're chatting with them about the code outside the IDE.

symfrog|20 days ago

Any sufficiently complicated LLM generated program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of an open source project.

kakacik|20 days ago

We had an effort recently where one much more experienced dev from our company ran Claude on our oldish codebase for one system, with the goal of transforming it into newer structure, newer libraries etc. while preserving various built in functionalities. Not the first time this guy did such a thing and he is supposed to be an expert.

I took a look at the result and its maybe half of stuff missing completely, rest is cryptic. I know that codebase by heart since I created it. From my 20+ years of experience correcting all this would take way more effort than manual rewrite from scratch by a senior. Suffice to say thats not what upper management wants to hear, llm adoption often became one of their yearly targets to be evaluated against. So we have a hammer and looking for nails to bend and crook.

Suffice to say this effort led nowhere since we have other high priority goals, for now. Smaller things here & there, why not. Bigger efforts, so far sawed-off 2-barrel shotgun loaded with buckshot right into both feet.

cactusplant7374|20 days ago

All software before LLMs had a copious number of bugs, many of which were never fixed.

dirkc|20 days ago

> Using anything other than the frontier models is actively harmful

If that is true, why should one invest in learning now rather than waiting for 8 months to learn whatever is the frontier model then?

jonas21|20 days ago

So that you can be using the current frontier model for the next 8 months instead of twiddling your thumbs waiting for the next one to come out?

I think you (and others) might be misunderstanding his statement a bit. He's not saying that using an old model is harmful in the sense that it outputs bad code -- he's saying it's harmful because some of the lessons you learn will be out of date and not apply to the latest models.

So yes, if you use current frontier models, you'll need to recalibrate and unlearn a few things when the next generation comes out. But in the meantime, you will have gotten 8 months (or however long it takes) of value out of the current generation.

fusslo|20 days ago

snarky answer: so you can be that 'AI guy' at your office that everyone avoids in the snackroom

senko|20 days ago

Because you might want to use LLMs now. If not, it's definitely better to not chase the hype - ignore the whole shebang.

But if you do want to use LLMs for coding now, not using the best models just doesn't make sense.

ej88|20 days ago

It's not like you need to take a course. The frontier models are the best, just using them and their harnesses and figuring out what works for your use case is the 'investing in learning'.

unknown|20 days ago

[deleted]

recursive|20 days ago

How could it be actively harmful if it wasn't harmful last month when it was the frontier model?

RGamma|20 days ago

There's not that much learning involved. Modern SOTA models are much more intelligent than what they used to be not long ago. It's quite scary/amazing.

dmk|20 days ago

The real insight buried in here is "build what programmers love and everyone will follow." If every user has an agent that can write code against your product, your API docs become your actual product. That's a massive shift.

anthuswilliams|20 days ago

I'm very much looking forward to this shift. It is SO MUCH more pro-consumer than the existing SaaS model. Right now every app feels like a walled garden, with broken UX, constant redesigns, enormous amounts of telemetry and user manipulation. It feels like every time I ask for programmatic access to SaaS tools in order to simplify a workflow, I get stuck in endless meetings with product managers trying to "understand my use case", even for products explicitly marketed to programmers.

Using agents that interact with APIs represents people being able to own their user experience more. Why not craft a frontend that behaves exactly the the way YOU want it to, tailor made for YOUR work, abstracting the set of products you are using and focusing only on the actual relevant bits of the work you are doing? Maybe a downside might be that there is more explicit metering of use in these products instead of the per-user licensing that is common today. But the upside is there is so much less scope for engagement-hacking, dark patterns, useless upselling, and so on.

13pixels|19 days ago

This extends further than most people realize. If agents are the primary consumers of your product surface, then the entire discoverability layer shifts too. Right now Google indexes your marketing page -- soon the question is whether Claude or GPT can even find and correctly describe what your product does when a user asks.

We're already seeing this with search. Ask an LLM "what tools do X" and the answer depends heavily on structured data, citation patterns, and how well your docs/content map to the LLM's training. Companies with great API docs but zero presence in the training data just won't exist to these agents.

So it's not just "API docs = product" -- it's more like "machine-legible presence = existence." Which is a weird new SEO-like discipline that barely has a name yet.

post-it|20 days ago

> Agent harnesses have not improved much since then. There are things Sketch could do well six months ago that the most popular agents cannot do today.

I think this is a neglected area that will see a lot of development in the near future. I think that even if development on AI models stopped today - if no new model was ever trained again - there are still decades of innovation ahead of us in harnessing the models we already have.

Consider ChatGPT: the first release relied entirely on its training data to answer questions. Today, it typically does a few Google searches and summarizes the results. The model has improved, but so has the way we use it.

tiny-automates|20 days ago

agreed, and i'd go further - the harness is where evaluation actually happens, not in some separate benchmark suite. rhe model doesn't know if it succeeded at a web task. the harness has to verify DOM state, check that the right element was clicked, confirm the page transitioned correctly. right now most harnesses just check "did the model say it was done" which is why pass rates on benchmarks don't translate to production reliability. the interesting harness work is building verification into the loop itself, not as an afterthought.

kevmo314|20 days ago

Really? I hardly think it's neglected. The Claude Code harness is the only reason I come back to it. I've tried Claude via OpenCode or others and it doesn't work as well for me. If anything, I would even argue that prior to 4.6, the main reason Opus 4.5 felt like it improved over months was the harness.

dang|20 days ago

Related. Others?

How I program with agents - https://news.ycombinator.com/item?id=44221655 - June 2025 (295 comments)

hasperdi|20 days ago

> It sounds like someone saying power tools should be outlawed in carpentry.

I see this a lot here

sp33der89|20 days ago

All metaphors break down at a certain point, but power tools and generative AI/LLMs being compared feels like somebody is romanticizing the art of programming a bit too much.

Copyright law, education, just the sheer scale of things changing because of LLMs are some things off the top of my head why "power tools vs carpentry" is a bad analogy.

Keyframe|20 days ago

if that someone is clumsy, had an active war going on against basic tools before, and wandered into the carpentry from completely different area, then power tools might be a bad idea.

unknown|20 days ago

[deleted]

dagss|20 days ago

On HN lately? Haven't seen anything about outlawing. But I see a lot of "powertools don't work and make me slower"

wasmainiac|20 days ago

Yes because A tech-bro AIs dream is hundreds of thousands of developers being let go and replacing them with no code tools.

Sure, replace me with AI, but I better get royalties on my public contributions. I like many other developers have kids and other responsibilities to pay for.

We did not share our work publicly to be replaced. The same way I did not lend my neighbour my car so he could run me over, that was implicit.

webdevver|19 days ago

>I wish I could share this joy with the people who are fearful about the changes agents are bringing.

The 'fear' is about losing ones livelihood and getting locked out of homeownership and financial security. its not complicated. life is actually largely determined by your access to capital, despite whatever fresh coping strategy the afflicted (and the afflicting) like to peddle.

the quality of life versus capital availability is very non-linear. there is a step-change around the $500k mark where you reach 'orbital velocity', where as long as you dont suffer severe misfortune or make mistakes, you will start accelerating upwards (albeit very slowly.)

under that line, you are constantly having to fight 'gravity'.

basically everyone in tech is openly or quietly aiming to get there, and LLMs have made that trek ever more precarious than before.

monus|20 days ago

> Along the way I have developed a programming philosophy I now apply to everything: the best software for an agent is whatever is best for a programmer.

Not a plug but really that’s exactly why we’re building sandboxes for agents with local laptop quality. Starting with remote xcode+sim sandboxes for iOS, high mem sandbox with Android Emulator on GPU accel for Android.

No machine allocation but composable sandboxes that make up a developer persona’s laptop.

If interested, a quick demo here https://www.loom.com/share/c0c618ed756d46d39f0e20c7feec996d

muvaf[at]limrun[dot]com

jmull|19 days ago

Regarding the shift away from time spent on agriculture over the last century or so..

> That was a net benefit to the world, that we all don't have to work to eat.

I’m pretty sure most all of us are still working to have food to eat and shelter for ourselves and our families.

Also, while the on-going industrial and technological revolution has certainly brought benefits, it’s an open question as to whether it will turn out to be a net benefit. There’s a large-scale tragedy of the commons experiment playing out and it’s hard to say what the result will be.

uludag|20 days ago

> I am having more fun programming than I ever have, because so many more of the programs I wish I could find the time to write actually exist. I wish I could share this joy with the people who are fearful about the changes agents are bringing.

It might be just me but this reads as very tone deaf. From my perspective, CEOs are seething at the mouth to make as many developers redundant as possible, not being shy about this desire. (I don't see this at all as inevitable, but tech leaders have made their position clear)

Like, imagine the smugness of some 18th century "CEO" telling an artisan, despite the fact that he'l be resigned to working in horrific conditions at a factory, to not worry and think of all the mass produced consumer goods he may enjoy one day.

It's not at all a stretch of the imagination that current tech workers may be in a very precarious situation. All the slopware in the world wouldn't console them.

overgard|20 days ago

I bought Steve Yegge's "Vibe Coding" book. I think I'm about 1/4th of the way through it or so. One thing that surprised me is there's this naivete on display that workers are going to be the ones to reap the benefits of this. Like, Steve was using an example of being able to direct the agent while doing leisure activities (never mind that Steve is more of an executive/thought leader in this company, and, prior to LLMs, seemed to be out of the business of writing code). That's a nice snapshot of a reality that isn't going to persist..

While the idea of programmers working two hours a day and spending the rest of it with their family seems sunny, that's absolutely not how business is going to treat it.

Thought experiment... CEO has a team of 8 engineers. They do some experiments with AI, and they discover that their engineers are 2x more effective on average . What does the CEO do?

a) Change the workweek to 4 hours a day so that all the engineers have better work/life balance since the same amount of work is being done.

b) Fire half the engineers, make the 4 remaining guys pick up the slack, rinse and repeat until there's one guy left?

Like, come on. There's pushback on this stuff not because the technology is bad, (although it's overhyped), but because the no sane person trusts our current economic system to provide anything resembling humane treatment of workers. The super rich are perfectly fine seeing half the population become unemployed, as far as I can tell, as long as their stock numbers go up.

zerotolerance|16 days ago

There is a lot of "my" floating around in this article. I always love getting peeks into experiences with this sort of thing, but I think the "mys" highlight something I've seen every day. These agents are really great at bespoke personal flows that build up a TON of almost personal tribal knowledge about how things get done if there is any consistency to those flows at all. Doing this in larger theaters is much more difficult because tribal knowledge is death for larger teams. It drives up the cost of everything which is why individuals or extremely new small teams feel so much more productive. Everything is new here and consistency doesn't matter yet.

dsign|20 days ago

Look, I'm very negative about this AI thing. I think there is a great chance it will lead to something terrible and we will all die, or worse. But on the other hand, we are all going to die anyway. Some of us, the lucky ones, will die of a heart attack and will learn of our imminent demise in the second it happens, or not at all. The rest of us will have it worse. It has always been like that, and it has only gotten more devastating since we started wearing clothes and stopped being eaten alive by a savanna crocodile or freezing to death during the first snowfall of winter.

But if AI keeps getting better at code, it will produce entire in-silico simulation workflows to test new drugs or even to design synthetic life (which, again, could make us all die, or worse). Yet there is a tiny, tiny chance we will use it to fix some of the darkest aspects of human existence. I will take that.

xyzsparetimexyz|20 days ago

That's stupid. If you genuinely think that there's a great chance AI will kill us all, you wouldn't spin the wheel just for some small vague chance that it doesn't and something good (what exactly, nobody knows) will happen

habinero|19 days ago

A lot of very silly people have convinced themselves and you of this, but it is not true, was never true, and is never going to be true.

We have a lot of actual problems to deal with that aren't telling ghost stories about sand. Focus on those.

Havoc|20 days ago

Disagree with the point about anything less than opus being harmful to learning.

Much of my learning still requires experimentation - including lots of token volume so hitting limits is a problem.

And secondly I’m looking for workflows that build the thing without needing to be at the absolute edge of the LLM capability. Thats where fragility and unpredictability live. Where a new model with slightly different personality is released and it breaks everything. I’d rather have flow that is simple and idiot proof that doesn’t fall apart at the first sign of non-bleeding edge tokens. That means skipping the gains from something opus could one shot ofc but that’s acceptable to me

gip|20 days ago

> In 2026, I don't use an IDE any more.

I don't think it is the best way to look at it. I think that now every team has the power to build and maintain an internal agent (tool + UX) to manager software products. I don't necessarily think that chat-only is enough except for small projects, so teams will build agent that gives them access to the level of abstraction that works best.

It's a data point but this weekend (e.g. in 2 days) I build a desktop + web agent that is able to help me reason on system design and code. Built with Codex powered by the Codex SDK. It is high quality. I've been a software engineer and director of engineering for 10 years. I'm blown away.

dmos62|20 days ago

Curious what kind of agent did you build? I'm building a programming agent myself, it's intentionally archaic in that you run it by constantly copy-pasting from-to fresh ChatGPT sessions (: I'm finding it challenging to have it do good context management: I'm trying to solve this by declaring parts of code or spec as "collections" with an overview md file attached that acts like a map of why/where/what, but that can't scale indefinitely.

dude250711|20 days ago

> ...and director of engineering for 10 years. I'm blown away.

It's always the CTO types who get most enthusiastic.

sarchertech|20 days ago

I’m not saying this is definitely a bot. However, this is the 7th time I’ve read a post and thought it might be an OpenAI promotion bot, clicked on the username, and noticed that the account was created in 2011.

I have yet to do this and see any other year. Was there someone who bought a ton of accounts in 2011 to farm them out? A data breach? Was 2011 just a very big year for new users? (My own account is from 2011)

nickcw|20 days ago

> Some believe AI Super-intelligence is just around the corner (for good or evil). Others believe we're mistaking philosophical zombies for true intelligence, and speedrunning our own brainrot

Not sure which camp I'm in, but I enjoyed the imagery.

dent9|18 days ago

> I am extremely out of touch with anti-LLM arguments

Wow I know that feel.

I'm here using LLM for daily work and even hobbies in very conservative manners and didn't think much of it.

Now when I have casual discussions with other folks, especially non-tech people, the visceral hatred I get for even mentioning AI and the fact that I use it is insane. There's like an entire sub group of people who are so out of touch with these tools they think they're the devil like the anti-GMO crazies and the PETA psychos.

mtlynch|19 days ago

> Along the way I have developed a programming philosophy I now apply to everything: the best software for an agent is whatever is best for a programmer.

I agree with this and I think it's funny to see people publish best practices for working with AI that are like, "Write a clear spec. Have a style guide. Use automated tests."

I'm not convinced it's 100% true because I think there are code patterns that AI handles better than humans and vice versa. But I think it's true enough to use as a guiding philosophy.

dmos62|20 days ago

> the best software for an agent is whatever is best for a programmer

My conclusion as well. It feels paradoxical, maybe because on some level I still think of an LLM as some weird gadget, not a coworker. Context ephemerality is more or less the only veritable difference from a human programmer, I'd say. And, even then, context introduction with LLMs is a speedrun of how you'd do it with new human members of a project. Awesome times we live in.

conartist6|19 days ago

IDEs are going to come roaring back.

As the author says, there's nothing wrong with the idea of the IDE. Of course you want to be using the best, most powerful tools!

AI showed us that our current-gen text-editor-first IDEs are massively underserving the needs of the public, yes, but it didn't really solve that problem. We still need better IDEs! What has changed is that we now understand how badly we need them. (source: I am an IDE author)

jFriedensreich|20 days ago

Its funny how many variations of meaning people assign to agent related terms. Conflating agent with cli and as opposite spectrum of ide is a new one i did not encounter before. I run agents with vscode-server also in a vm and would not give up the ability to have a proper gui anytime i feel like and also being able to switch seamless between more autonomous operation and more interactive seems useful at any level.

piokoch|20 days ago

"In 2026, I don't use an IDE any more."

Just a question? What IDE feature is obsolete now? Ability to navigate the code? Integration with database, Docker, JIRA, Github (like having PR comments available, listed, etc), Git? Working with remote files? Building the project?

Yes, I can ask copilot to build my project and verify tests results, but it will eat a lot of tokens and added value is almost none.

Expurple|19 days ago

> I can ask copilot to build my project and verify tests results, but [..] added value is almost none.

The added value is that it can iterate autonomously and finish tasks that it can't one-shot in its first code edit. Which is basically all tasks that I assign to Copilot.

The added value is that I get to review fully-baked PRs that meet some bar of quality. Just like I don't review human PRs if they don't pass CI.

Fully agree on IDEs, though. I absolutely still need an IDE to iterate on PRs, review them, and tweak them manually. I find VSCode+Copilot to be very good for this workflow. I'm not into vibe coding.

imron|20 days ago

> By far the greatest IDE I have ever used was Visual Studio C++ 6.0 on Windows 2000

Visual C++ 6 was incredible! My favourite IDE of all time too.

EdwardDiego|20 days ago

Poor fellow has never used IntelliJ IDEA.

emmawirt|20 days ago

Curious what you mean by "agent harness" here... are you distinguishing between true autonomous agents (model decides next step) vs workflows that use LLMs at specific nodes? I've found the latter dramatically more reliable for anything beyond prototyping, which makes me wonder if the "model improvement" is partly better prompting and scaffolding.

rahimnathwani|20 days ago

An agent harness is what enables the user to seamlessly interact with both a model and tool calls. Claude Code is an agent harness.

  ┌────────────────────────────┐
  │           User             │
  └──────────────┬─────────────┘
                 │
                 ▼
  ┌────────────────────────────┐
  │       Agent Harness        │
  │   (software interface)     │
  └──────┬──────────────┬──────┘
         │              │
         ▼              ▼
  ┌────────────┐ ┌────────────┐
  │   Models   │ │   Tools    │
  └────────────┘ └────────────┘

Here's an example of a harness with less code: https://github.com/badlogic/pi-mono/blob/fdcd9ab783104285764...

crawshaw|20 days ago

Hi, author here. I mean the piece of code that calls the model and executes the tool calls. My colleague Philip calls it “9 lines of code”: https://sketch.dev/blog/agent-loop

We have built two of them now, and clearly the state of the art here can be improved. But it is hard to push too much on this while the models keep improving.

vedhant|19 days ago

The sandboxing pain is real. Sadly, a new VM seems like the most simple and viable solution. I don't think the masses are doing any sandboxing at all. We really need a sandbox solution that is sort of dynamic and doesn't pester the user with allow/deny requests. It has to be intelligent and keep up with the llm agents.

sandgrownun|19 days ago

I agree with his assessment up until this point in time, it is where we currently are. But it seems to me there is still a large chunk of engineers who don't extrapolate capability out to the engineer being taken out of the loop completely. Imo, it happens in fairly short order. 2-3 years.

symfrog|19 days ago

On what basis are you making that prediction?

63|20 days ago

I have no problem with experienced senior devs using agents to write good code faster. What I have a problem with is inexperienced "vibecoders" who don't care to learn and instead use agents to write awful buggy code that will make the product harder to build on even for the agents. It used to be that lack of a basic understanding of the system was a barrier for people, but now it's not, so we're flooded with code written by imperfect models conducted by people who don't know good from bad.

bdangubic|20 days ago

the number of experienced, senior programmers though, who are in “anti-LLM” camp, is still fairly staggering.

codebolt|20 days ago

Where are you encountering all this slop code? At my work we use LLMs heavily and I don't see this issue. Maybe I'm just lucky that my colleagues all have Uni degrees in CS and at least a few years experience.

remich|20 days ago

We're in a transition phase, but this will shake out in the near future. In the non-professional space, poorly built vibecoded apps simply won't last, for any number of reasons. When it comes to professional devs, this is a problem that is solved by a combination of tooling, process, and management:

(1) Tooling to enable better evaluation of generated code and its adherence to conventions and norms (2) Process to impose requirements on the creation/exposure of PRDs/prompts/traces (3) Management to guide devs in the use of the above and to implement concrete rewards and consequences

Some organizations will be exposed as being deficient in some or all of these areas, and they will struggle. Better organizations will adapt.

gurjeet|19 days ago

> By far the greatest IDE I have ever used was Visual Studio C++ 6.0 on Windows 2000. I have never felt like a toolchain was so complete and consistent with its environment as there.

+1. I've tried many times, and failed, to replicate the joy of using that toolchain.

only2people|20 days ago

>this is why I'm building

My clipart folder of that kid with the lolipop continues to stay relevant

Herring|20 days ago

> In 2000, less than one percent lived on farms and 1% of workers are in agriculture. That was a net benefit to the world, that we all don't have to work to eat.

The jury's still out on that one, because climate change is an existential risk.

gradus_ad|20 days ago

Existential? Maybe to beachfront property owners

thegrim000|19 days ago

"Eight more months of Bitcoin. It's usage continues to dramatically expand. The amount of transactions is increasing exponentially. Soon, fiat currencies will collapse, all replaced by Bitcoin transactions. If you haven't converted your assets over to Bitcoin you're going to be left behind and lose it all. I can't even understand people that don't see the obvious technical superiority of Bitcoin, such people are going to go through rough times."

jeffrallen|20 days ago

Listen to this guy. I've been using his code for a long time, and it works. I am a happy customer of his service, and it works. I listen to his advice and it works.

redkoala|19 days ago

What are those 3 sentences that the author typed to replicate Stripe for his situation?

ares623|20 days ago

guys it's an ad

indigodaddy|20 days ago

Nah, it's not, really

panny|20 days ago

I see a lot of people here saying things like:

>ah, they're so dumb, they don't get it, the anti-LLM people

This is one of the reasons I see AI failing in the short term. If I call you an idiot, are you more or less likely to be open minded and try what I'm selling? AI isn't making money, 95% of companies are failing with AI

https://fortune.com/2025/08/18/mit-report-95-percent-generat...

I mean, your AIs might be a lot more powerful if it was generating money, but that's not happening. I guess being condescending to the 95% of potential buyers isn't really working out.

MrSandingMan|20 days ago

> In 2000, less than one percent lived on farms and 1% of workers are in agriculture. That was a net benefit to the world, that we all don't have to work to eat.

Not obvious

> To me that statement is as obvious as "water is wet".

Well... is water *wet* or does it *wet things*? So not obvious either.

I'm really dubious when reading posts posing some things as obvious or trivial. In general they are not.

Krei-se|20 days ago

The author has a github.

hoistbypetard|20 days ago

> To me that statement is as obvious as "water is wet".

Water is not wet. Water makes things wet. Perhaps the inaccuracy of that statement should be taken as a hint that the other statements that you hold on the same level are worthy of reconsideration.

baq|20 days ago

The good old classic technically correct and completely besides the point observation.

almostdeadguy|20 days ago

In the past couple days I've become less skeptical of the capabilities of LLMs and now more alarmed by them, contra the author. I think if we as a society continue to accept the development of LLMs and the control of them by the major AI companies there will be massively negative repercussions. And I don't mean repercussions like "a rogue AI will destroy humanity" per se, but these things will potentially cause massive social upheaval, a large amount of negative impacts on mental health and cognition, etc. I think if you see LLMs as powerful but not dangerous you are not being honest.

NitpickLawyer|20 days ago

There are some good things here:

First, we currently have 4 frontier labs, and a bunch of 2nd tier ones following. The fact that we don't have just oAI or just Anthropic or just Google is good in the general sense, I would say. The 4 labs racing each other and trading SotA status for ~a few weeks is good for the end consumer. They keep each other honest and keep the prices down. Imagine if Anthropic could charge 60$ /MTok or oAI could charge 120$ /MTok for their gpt4 style models. They can't in good part because of the competition.

Second, there's a bunch of labs / companies that have released and are continuing to release open mdoels. That's as close to "intelligence on tap" as you can get. And those models are ~6-12 months behind the SotA models, depending on your usecase. Even though the labs have largely different incentives to do so, a lot of them are still releasing open models. Hopefully that continues to hold. So not all control will be in the hands of big tech, even if the "best" will still be theirs. At some point "good enough" is fine.

There's also the thing about geopolitics being involved in this. So far we've seen the EU jumping the gun on regulation, and we're kinda sorta paying for it. Everyone is still confused about what can or cannot be done in the EU. The US seems to be waiting to see what happens, and China will do whatever they do. The worst thing that can happen is that at some point the big players (Anthropic is the main driver) push for regulatory capture. That would really suck. Thankfully atm there's this lingering thinking that "if we do it, the others won't so we'll be on the back foot". Hopefully this holds, at least until the "good enough" from above is out :)

gadflyinyoureye|20 days ago

I see them as powerful and dangerous. The goal for decades now is to reduce the human population to 500 million. All human technology was pushed to this end, covertly. If we suddenly have a technology that renders white collar workers useless, we will get to that number faster than expected.

simondoubleu|19 days ago

[deleted]

241 comments