This is such a lovely balanced thoughtful refreshingly hype-free post to read. 2025 really was the year when things shifted and many first-rate developers (often previously AI skeptics, as Mitchell was) found the tools had actually got good enough that they could incorporate AI agents into their workflows.
It's a shame that AI coding tools have become such a polarizing issue among developers. I understand the reasons, but I wish there had been a smoother path to this future. The early LLMs like GPT-3 could sort of code enough for it to look like there was a lot of potential, and so there was a lot of hype to drum up investment and a lot of promises made that weren't really viable with the tech as it was then. This created a large number of AI skeptics (of whom I was one, for a while) and a whole bunch of cynicism and suspicion and resistance amongst a large swathe of developers. But could it have been different? It seems a lot of transformative new tech is fated to evolve this way. Early aircraft were extremely unreliable and dangerous and not yet worthy of the promises being made about them, but eventually with enough evolution and lessons learned we got the Douglas DC-3, and then in the end the 747.
If you're a developer who still doesn't believe that AI tools are useful, I would recommend you go read Mitchell's post, and give Claude Code a trial run like he did. Try and forget about the annoying hype and the vibe-coding influencers and the noise and just treat it like any new tool you might put through its paces. There are many important conversations about AI to be had, it has plenty of downsides, but a proper discussion begins with close engagement with the tools.
Architects went from drawing everything on paper, to using CAD products over a generation. That's a lot of years! They're still called architects.
Our tooling just had a refresh in less than 3 years and it leaves heads spinning. People are confused, fighting for or against it. Torn even between 2025 to 2026. I know I was.
People need a way to describe it from 'agentic coding' to 'vibe coding' to 'modern AI assisted stack'.
We don't call architects 'vibe architects' even though they copy-paste 4/5th of your next house and use a library of things in their work!
We don't call builders 'vibe builders' for using earth-moving machines instead of a shovel...
When was the last time you reviewed the machine code produced by a compiler? ...
The real issue this industry is facing, is the phenomenal speed of change. But what are we really doing? That's right, programming.
I skimmed over it, and didn’t find any discussion of:
- Pull requests
- Merge requests
- Code review
I feel like I’m taking crazy pills. Are SWE supposed to move away from code review, one of the core activities for the profession? Code review is as fundamental for SWE as double entry is for accounting.
Yes, we know that functional code can get generated at incredible speeds. Yes, we know that apps and what not can be bootstrapped from nothing by “agentic coding”.
We need to read this code, right? How can I deliver code to my company without security and reliability guarantees that, at their core, come from me knowing what I’m delivering line-by-line?
let me ask a stupid/still-ignorant question - about repeatability.
If one asks this generator/assistant same request/thing, within same initial contexts, 10 times, would it generate same result ? in different sessions and all that.
because.. if not, then it's for once-off things only..
Your sentiment resonates with me a lot. I wonder what we’ll consider the inflection point 10 years from now. It seemed like the zeitgeist was screaming about scaling limits and running out of training data, then we got Claude code, sonnet 4.5, then Opus 4.5 and no ones looked back since.
I will give Claude Code a trial run if I can run it locally without an internet connection. AI companies have procured so much training data through illegal means you have to be insane to trust them in even the smallest amount.
Should AI tools use memory safe tabs or spaces for indentation? :)
It is a shame it's become such a polarized topic. Things which actually work fine get immediately bashed by large crowds at the same time things that are really not there get voted to the moon by extremely eager folks. A few years from now I expect I'll be thinking "man, there was some really good stuff I missed out on because the discussions about it were so polarized at the time. I'm glad that has cleared up significantly!"
GPT-4 showed the potential but the automated workflows (context management, loops, test-running) and pure execution speed to handle all that "reasoning"/workflows (remember watching characters pop in slowly in GPT-4 streaming API response calls) are gamechangers.
The workflow automation and better (and model-directed) context management are all obvious in retrospect but a lot of people (like myself) were instead focused on IDE integration and such vs `grep` and the like. Maybe multi-agent with task boards is the next thing, but it feels like that might also start to outrun the ability to sensibly design and test new features for non-greenfield/non-port projects. Who knows yet.
I think it's still very valuable for someone to dig in to the underlying models periodically (insomuch as the APIs even expose the same level of raw stuff anymore) to get a feeling for what's reliable to one-shot vs what's easily correctable by a "ran the tests, saw it was wrong, fixed it" loop. If you don't have a good sense of that, it's easy to get overambitious and end up with something you don't like if you're the sort of person who cares at all about what the code looks like.
It is perfectly valid that this issue is polarizing. on the one hand we have blind cargo culters and on the other hand we have "luddites". Being in one or the other "tribe" is cause for getting insulted or called out. Because the cargo culters want everyone to do what they are doing, just like the RTO crowd. The skeptics want to take a more reasonable pace. One side is we are done this is the future and the other side doesn't see the same results happening to them but the cargo culters think in absolutes and 100% only. It is all or nothing. All these other posts waxing and waning and insulting the skeptics are frankly insulting
I'll try it out when it's something I can run locally. I do not pay for subscriptions for software; I do not pay for services as software substitutes; and, I do not rely on things that run on computers I don't control for anything important.
Whether or not LLM coding AIs are useful, they certainly qualifies as "important," because adopting one is disruptive enough that getting rid of it after adopting it would be disruptive as well. I'm not signing on to that if I need to pay a recurring fee for it and/or need to rely on some company deciding to continue maintaining a cloud server running it in perpetuity.
I think for a lot of people the turn off is the constant churn and the hype cycle. For a lot of people, they just want to get things done and not have to constantly keep on top of what's new or SOTA. Are we still using MCPs or are we using Skills now? Not long ago you had to know MCP or you'd be left behind and you definitely need to know MCP UI or you'll be left behind. I think. It just becomes really tiring, especially with all the FUD.
I'm embracing LLMs but I think I've had to just pick a happy medium and stick with Claude Code with MCPs until somebody figures out a legitimate way to use the Claude subscription with open source tools like OpenCode, then I'll move over to that. Or if a company provides a model that's as good value that can be used with OpenCode.
Isn’t there something off about calling predictions about the future, that aren’t possible with current tech, hype? Like people predicted AI agents would be this huge change, they were called hype since earlier models were so unreliable, and now they are mostly right as ai agents work like a mid level engineer. And clearly super human in some areas.
Is there any reason to use Claude Code specifically over Codex or Gemini? I’ve found the both Codex and Gemini similar in results, but I never tried Claude because of I keep hearing usage runs out so fast on pro plans and there’s no free trial for the CLI.
but annoying hype is exactly the issue with AI in my eyes. I get it's a useful tool in moderation and all, but I also experience that management values speed and quantity of delivery above all else, and hype-driven as they are I fear they will run this industry to the ground and we as users and customers will have to deal with the world where software is permanently broken as a giant pile of unmaintainable vibe code and no experienced junior developers to boot.
The Death of the "Stare": Why AI’s "Confident Stupidity" is a Threat to Human Genius
OPINION | THE REALITY CHECK
In the gleaming offices of Silicon Valley and the boardrooms of the Fortune 500, a new religion has taken hold. Its deity is the Large Language Model, and its disciples—the AI Evangelists—speak in a dialect of "disruption," "optimization," and "seamless integration." But outside the vacuum of the digital world, a dangerous friction is building between AI’s statistical hallucinations and the unyielding laws of physics.
The danger of Artificial Intelligence isn't that it will become our overlord; the danger is that it is fundamentally, confidently, and authoritatively stupid.
The Paradox of the Wind-Powered Car
The divide between AI hype and reality is best illustrated by a recent technical "solution" suggested by a popular AI model: an electric vehicle equipped with wind generators on the front to recharge the battery while driving. To the AI, this was a brilliant synergy. It even claimed the added weight and wind resistance amounted to "zero."
To any human who has ever held a wrench or understood the First Law of Thermodynamics, this is a joke—a perpetual motion fallacy that ignores the reality of drag and energy loss. But to the AI, it was just a series of words that sounded "correct" based on patterns. The machine doesn't know what wind is; it only knows how to predict the next syllable.
The Erosion of the "Human Spark"
The true threat lies in what we are sacrificing to adopt this "shortcut" culture. There is a specific human process—call it The Stare. It is that thirty-minute window where a person looks at a broken machine, a flawed blueprint, or a complex problem and simply observes.
In that half-hour, the human brain runs millions of mental simulations. It feels the tension of the metal, the heat of the circuit, and the logic of the physical universe. It is a "Black Box" of consciousness that develops solutions from absolutely nothing—no forums, no books, and no Google.
However, the new generation of AI-dependent thinkers views this "Stare" as an inefficiency. By outsourcing our thinking to models that cannot feel the consequences of being wrong, we are witnessing a form of evolutionary regression. We are trading hard-earned competence for a "Yes-Man" in a box.
The Gaslighting of the Realist
Perhaps most chilling is the social cost. Those who still rely on their intuition and physical experience are increasingly being marginalized. In a world where the screen is king, the person pointing out that "the Emperor has no clothes" is labeled as erratic, uneducated, or naive.
When a master craftsman or a practical thinker challenges an AI’s "hallucination," they aren't met with logic; they are met with a robotic refusal to acknowledge reality. The "AI Evangelists" have begun to walk, talk, and act like the models they worship—confidently wrong, devoid of nuance, and completely detached from the ground beneath their feet.
The High Cost of Being "Authoritatively Wrong"
We are building a world on a foundation of digital sand. If we continue to trust AI to design our structures and manage our logic, we will eventually hit a wall that no "prompt" can fix.
The human brain runs on 20 watts and can solve a problem by looking at it. The AI runs on megawatts and can’t understand why a wind-powered car won't run forever. If we lose the ability to tell the difference, we aren't just losing our jobs—we're losing our grip on reality itself.
> It's a shame that AI coding tools have become such a polarizing issue among developers.
Frankly I'm so tired of the usual "I don't find myself more productive", "It writes soup". Especially when some of the best software developers (and engineers) find many utility in those tools, there should be some doubt growing in that crowd.
I have come to the conclusion that software developers, those only focusing on the craft of writing code are the naysayers.
Software engineers immediately recognize the many automation/exploration/etc boosts, recognize the tools limits and work on improving them.
Hell, AI is an insane boost to productivity, even if you don't have it write a single line of code ever.
But people that focus on the craft (the kind of crowd that doesn't even process the concept of throwaway code or budgets or money) will keep laying in their "I don't see the benefits because X" forever, nonsensically confusing any tool use with vibe coding.
I'm also convinced that since this crowd never had any notion of what engineering is (there is very little of it in our industry sadly, technology and code is the focus and rarely the business, budget and problems to solve) and confused it with architectural, technological or best practices they are genuinely insecure about their jobs because once their very valued craft and skills are diminished they pay the price of never having invested in understanding the business, the domain, processes or soft skills.
> Break down sessions into separate clear, actionable tasks. Don't try to "draw the owl" in one mega session.
This is the key one I think. At one extreme you can tell an agent "write a for loop that iterates over the variable `numbers` and computes the sum" and they'll do this successfully, but the scope is so small there's not much point in using an LLM. On the other extreme you can tell an agent "make me an app that's Facebook for dogs" and it'll make so many assumptions about the architecture, code and product that there's no chance it produces anything useful beyond a cool prototype to show mom and dad.
A lot of successful LLM adoption for code is finding this sweet spot. Overly specific instructions don't make you feel productive, and overly broad instructions you end up redoing too much of the work.
This is actually an aspect of using AI tools I really enjoy: Forming an educated intuition about what the tool is good at, and tastefully framing and scoping the tasks I give it to get better results.
It cognitively feels very similar to other classic programming activities, like modularization at any level from architecture to code units/functions, thoughtfully choosing how to lay out and chunk things. It's always been one of the things that make programming pleasurable for me, and some of that feeling returns when slicing up tasks for agents.
> Break down sessions into separate clear, actionable tasks.
What this misses, of course, is that you can just have the agent do this too. Agent's are great at making project plans, especially if you give them a template to follow.
I actually enjoy writing specifications. So much so that I made it a large part of my consulting work for a huge part of my career. SO it makes sense that working with Gen-AI that way is enjoyable for me.
The more detailed I am in breaking down chunks, the easier it is for me to verify and the more likely I am going to get output that isn't 30% wrong.
> On the other extreme you can tell an agent "make me an app that's Facebook for dogs" and it'll make so many assumptions about the architecture, code and product that there's no chance it produces anything useful beyond a cool prototype to show mom and dad.
Amusingly, this was my experience in giving Lovable a shot. The onboarding process was literally just setting me up for failure by asking me to describe the detailed app I was attempting to build.
Taking it piece by piece in Claude Code has been significantly more successful.
> the scope is so small there's not much point in using an LLM
Actually that's how I did most of my work last year. I was annoyed by existing tools so I made one that can be used interactively.
It has full context (I usually work on small codebases), and can make an arbitrary number of edits to an arbitrary number of files in a single LLM round trip.
For such "mechanical" changes, you can use the cheapest/fastest model available. This allows you to work interactively and stay in flow.
(In contrast to my previous obsession with the biggest, slowest, most expensive models! You actually want the dumbest one that can do the job.)
I call it "power coding", akin to power armor, or perhaps "coding at the speed of thought". I found that staying actively involved in this way (letting LLM only handle the function level) helped keep my mental model synchronized, whereas if I let it work independently, I'd have to spend more time catching up on what it had done.
I do use both approaches though, just depends on the project, task or mood!
This matches my experience, especially "don’t draw the owl" and the harness-engineering idea.
The failure mode I kept hitting wasn’t just "it makes mistakes", it was drift: it can stay locally plausible while slowly walking away from the real constraints of the repo. The output still sounds confident, so you don’t notice until you run into reality (tests, runtime behaviour, perf, ops, UX).
What ended up working for me was treating chat as where I shape the plan (tradeoffs, invariants, failure modes) and treating the agent as something that does narrow, reviewable diffs against that plan. The human job stays very boring: run it, verify it, and decide what’s actually acceptable. That separation is what made it click for me.
Once I got that loop stable, it stopped being a toy and started being a lever. I’ve shipped real features this way across a few projects (a git like tool for heavy media projects, a ticketing/payment flow with real users, a local-first genealogy tool, and a small CMS/publishing pipeline). The common thread is the same: small diffs, fast verification, and continuously tightening the harness so the agent can’t drift unnoticed.
>The failure mode I kept hitting wasn’t just "it makes mistakes", it was drift: it can stay locally plausible while slowly walking away from the real constraints of the repo. The output still sounds confident, so you don’t notice until you run into reality (tests, runtime behaviour, perf, ops, UX).
Yeah I would get patterns where, initial prototypes were promising, then we developed something that was 90% close to design goals, and then as we try to push in the last 10%, drift would start breaking down, or even just forgetting, the 90%.
So I would start getting to 90% and basically starting a new project with that as the baseline to add to.
No harm meant, but your writing is very reminiscent of an LLM. It is great actually, there is just something about it - "it wasn't.. it was", "it stopped being.. and started". Claude and ChatGPT seem to love these juxtapositions. The triplets on every other sentence. I think you are a couple em-dashes away from being accused of being a bot.
These patterns seem to be picking up speed in the general population; makes the human race seem quite easily hackable.
This is the most common answer from people that are rocking and rolling with AI tools but I cannot help but wonder how is this different from how we should have built software all along. I know I have been (after 10+ years…)
1. Write a generic prompts about the project and software versions and keep it in the folder. (I think this getting pushed as SKIILS.md now)
2. In the prompt add instructions to add comments on changes, since our main job is to validate and fix any issues, it makes it easier.
3. Find the best model for the specific workflow. For example, these days I find that Gemini Pro is good for HTML UI stuff, while Claude Sonnet is good for python code. (This is why subagents are getting popluar)
This was a great post, one of the best I've seen on this topic at HN.
But why is the cost never discussed or disclosed in these conversations? I feel like I'm going crazy, there is so much written extolling the virtues of these tools but with no mention of what it costs to run them now. It will surely only get more expensive from here!
> But why is the cost never discussed or disclosed in these conversations?
And not just the monetary cost of accessing the tools, but the amount of time it takes to actually get good results out. I strongly suspect that even though it feels more productive, in many cases things just take longer than they would if done manually.
I think there are really good uses for LLMs, but I also think that people are likely using them in ways that feel useful, but end up being more costly than not.
Indeed, most of us are probably limited with what our companies let us use and also not to mention not everyone can afford to use AI tooling in their own time without thinking about the cost assuming you want to build something your company doesn't claim as their own IP.
The current realistic lower bound for actual work is the $100/€90/month Claude Max ("5x") plan. It allows roughly enough usage for a typical working month (4.25 x 40-50h). "Single-threaded", interactive usage with normal human breaks, sort of.
There are two usage quota windows to be aware of: 5h and 7d. I use https://github.com/richhickson/claudecodeusage (Mac) to keep track of the status. It shows green/yellow/red and a percentage in the menu bar.
the first time I did work as the article suggests I used my monthly allowance in a day.
Apparently out of 3-5k people with access to our AI tools, there's fewer than a handful of us REALLY using it. Most are asking questions in the chatbot style.
Anyway, I had to ask my manager, the AI architect, and the Tooling Manager for approval to increase my quota.
I asked everyone in the chain how much equivalent dollars I am allocated, and how much the increase was and no one could tell me.
Honestly, the costs are so minimal and vary wildly relative to the cost of a developer that it's frankly not worth the discussion...yet. The reality is the standard deviation of cost is going to oscillate until there is a common agreed upon way to use these tools.
I still use the chatbot but like to do it outside-in. Provide what I need, and instruct it to not write any code except the api (signatures of classes, interfaces, hierarchy, essential methods etc). We keep iterating about this until it looks good - still no real code. Then I ask it to do a fresh review of the broad outline, any issues it foresees etc. Then I ask it to write some demonstrator test cases to see how ergonomic and testable the code is - we fine tune the apis but nothing is fleshed out yet. Once this is done, we are done with the most time consuming phase.
After that is basically just asking it to flesh out the layers starting from zero dependencies to arriving at the top of the castle. Even if we have any complexities within the pieces or the implementation is not exactly as per my liking, the issues are localised - I can dive in and handle it myself (most of the time, I don't need to).
I feel like this approach works very well for me having a mental model of how things are connected because the most of the time I spent was spent on that model.
Finally, a step-by-step guide for even the skeptics to try to see what spot the LLM tools have in their workflows, without hype or magic like I vibe-coded an entire OS, and you can too!.
With so much noise in the AI world and constant model updates (just today GPT-5.3-Codex and Claude Opus 4.6 were announced), this was a really refreshing read. It’s easy to relate to his phased approach to finding real value in tooling and not just hype. There are solid insights and practical tips here. I’m increasingly convinced that the best way not to get overwhelmed is to set clear expectations for what you want to achieve with AI and tailor how you use it to work for you, rather than trying to chase every new headline. Very refreshing.
How much does it cost per day to have all these agents running on your computer?
Is your company paying for it or you?
What is your process of the agent writes a piece of code, let's say a really complex recursive function, and you aren't confident you could have come up with the same solution? Do you still submit it?
It's amusing how everyone seems to be going through the same journey.
I do run multiple models at once now. On different parts of the code base.
I focus solely on the less boring tasks for myself and outsource all of the slam dunk and then review. Often use another model to validate the previous models work while doing so myself.
I do git reset still quite often but I find more ways to not get to that point by knowing the tools better and better.
I've been thinking about this as three maturity levels.
Level 1 is what Mitchell describes — AGENTS.md, a static harness. Prevents known mistakes. But it rots. Nobody updates the checklist when the environment changes.
Level 2 is treating each agent failure as an inoculation. Agent duplicates a util function? Don't just fix it — write a rule file: "grep existing helpers before writing new ones." Agent tries to build a feature while the build is broken? Rule: "fix blockers first." After a few months you have 30+ of these. Each one is an antibody against a specific failure class. The harness becomes an immune system that compounds.
Level 3 is what I haven't seen discussed much: specs need to push, not just be read. If a requirement in auth-spec.md changes, every linked in-progress task should get flagged automatically. The spec shouldn't wait to be consulted.
The real bottleneck isn't agent capability — it's supervision cost. Every type of drift (requirements change, environments diverge, docs rot) inflates the cost of checking the agent's work.
i'd bet that above some number there will be contradictions. Things that apply to different semantic contexts, but look same on syntax level (and maybe with various levels of "syntax" and "semantic"). And debugging those is going to be nightmare - same as debugging requirements spec / verification of that
Very much the same experience. But it does not talk much about the project setup and the influence of it on the session success. In the narrow scoped projects it works really well, especially when tests are easy to execute. I found that this approach melts down when facing enterprise software with large repositories and unconventional layouts. Then you need to do a bunch of context management upfront, and verbose instructions for evaluations. But we know what it needs is a refactor thats all.
And the post touches on a next type of a problem, how to plan far ahead of time to utilise agents when you are away. It is a difficult problem but IMO we’re going in a direction of having some sort of shared “templated plans”/workflows and budgeted/throttled task execution to achieve that. It is like you want to give a little world to explore so that it does not stop early, like a little game to play, then you come back in the morning and check how far it went.
I don't understand how Agents make you feel productive. Single/Multiple agents reading specs, specs often produced with agents itself and iterated over time with human in the loop, a lot of reviewing of giant gibberish specs. Never had a clear spec in my life. Then all the dancing for this apperantly new paradigm, of not reviewing code but verifying behaviour, and so many other things. All of this to me is a total UNproductive mess. I use Cursor autocomplete from day one till to this day, I was super productive before LLMs, I'm more productive now, I'm capable, I have experience, product is hard to maintain but customers are happy, management is happy. So I can't really relate anymore to many of the programmers out there, that's sad, I can count on my hands devs that I can talk to that have hard skills and know-how to share instead of astroturfing about AI Agents
To me part of our job has always been about translating garbage/missing specs in something actionnable.
Working with agents don't change this and that's why until PM/business people are able to come up with actual specs, they'll still need their translators.
Furthermore, it's not because the global spec is garbage that you, as a dev, won't come up with clear specs to solve technical issues related to the overall feature asked by stakeholders.
One funny thing I see though, is in the AI presentations done to non-technical people, the advice: "be as thorough as possible when describing what you except the agent to solve!".
And I'm like: "yeah, that's what devs have been asking for since forever...".
In my real life bubble, AI isn't a big deal either, at least for programmers. They tend to be very sceptical about it for many reasons, perceived productivity being only one of them. So, I guess it's much less of a thing than you would expect from media coverage and certain internet communities.
Just because you haven't or you work in a particular way, doesn't mean everyone does things the same way.
Likewise, on your last point, just because someone is using AI in their work, doesn't mean they don't have hard skills and know-how. Author of this article Mitchell is a great example of that - someone who proved to be able to produce great software and, when talking about individuals who made a dent in the industry, definitely had/has an impactful career.
Very much like my experience with Claude. First, gave it some simple tasks (Getting this error on my site, etc) The results were surprisingly good. Then I started giving Claude broader tasks, and learning how to write the prompts. Now I have come to the point where I haven't written a line of code in several weeks (quite a change for someone who learned to program on a Burroughs b5500 back in the 70's) So I guess I am a convert.
There are so many stories about how people use agentic AI but they rarely post how much they spend. Before I can even consider it, I need to know how it will cost me per month. I'm currently using one pro subscription and it's already quite expensive for me. What are people doing, burning hundreds of dollars per month? Do they also evaluate how much value they get out of it?
I quickly run out of the JetBrains AI 35 monthly credits for $300/yr and spending an additional $5-10/day on top of that, mostly for Claude.
I just recently added in Codex, since it comes with my $20/mo subscription to GPT and that's lowering my Claude credit usage significantly... until I hit those limits at some point.
2012 + 300 + 5~200... so about $1500-$1600/year.
It is 100% worth it for what I'm building right now, but my fear is that I'll take a break from coding and then I'm paying for something I'm not using with the subscriptions.
I'd prefer to move to a model where I'm paying for compute time as I use it, instead of worrying about tokens/credits.
I'm a huge believer in AI agent use and even I think this is wrong. It's like saying "always have something compiling" or "make sure your Internet is always downloading something".
The most important work happens when an agent is not running, and if you spend most of your time looking for ways to run more agents you're going to streetlight-effect your way into solving the wrong problems https://en.wikipedia.org/wiki/Streetlight_effect
AI chat for research is great and really helps me.
I just don't need the AI writing code for me, don't see the point. Once I know from the ai chat research what my solution is I can code it myself with the benefit I then understand more what I am doing.
And yes I've tried the latest models! Tried agent mode in copilot! Don't need it!
I will say one thing Claude does is it doesn't run a command until you approve it, and you can choose between a one-time approval and always allowing a command's pattern. I usually approve the simple commands like `zig build test`, since I'm not particularly worried about the test harness. I believe it also scopes file reading by default to the current directory.
I find it interesting that this thread is full of pragmatic posts that seem to honestly reflect the real limits of current Gen-Ai.
Versus other threads (here on HN, and especially on places like LinkedIn) where it's "I set up a pipeline and some agents and now I type two sentences and amazing technology comes out in 5 minutes that would have taken 3 devs 6 months to do".
The comment by user senko [1] links to a post from this same author with an example for a specific coding session that costs $15.98 for 8 hours of work. The example in this post talks about leaving agents running overnight, in which case I'd guess "twice that amount" would be a reasonable approximation.
Or if we assume that the OP can only do 4 hours per sitting (mentioned in the other post) and 8 hours of overnight agents then it would come down to $15.98 * 1.5 * 20 = $497,40 a month (without weekends).
I’ve gone through a similar journey, not in big tech, but in practical business work. We started with quick experiments: generative prompts in internal tooling, a couple of proof-of-concept bots, and integration of recommendations in mobile apps.
What shifted for us was when we stopped experimenting for novelty and started embedding AI where routine work slowed people down. For example, we built an intake assistant for hospitals: guided questions that organize structured history before a doctor sees the patient. At first it felt promising, but adoption only happened when clinic staff saw that it saved them time and didn’t replace their judgment. That forced us to rethink how we framed the feature. It became about support, not replacement.
The real adoption turning point came when non-technical team members began using the tools without hesitation. That’s when it stopped being AI and just became part of workflow.
This perspective is why I think this article is so refreshing.
Craftsmen approach tools differently. They don't expect tools to work for them out-of-the-box. They customize the tool to their liking and reexamine their workflow in light of the tool. Either that or they have such idiosyncratic workflows they have to build their own tools.
They know their tools are custom to _them_. It would be silly to impose that everyone else use their tools-- they build different things!
It's so sad that we're the ones who have to tell the agent how to improve by extending agent.md or whatever. I constantly have to tell it what I don't like or what can be improved or need to request clarifications or alternative solutions.
This is what's so annoying about it. It's like a child that does the same errors again and again.
But couldn't it adjust itself with the goal of reducing the error bit by bit? Wouldn't this lead to the ultimate agent who can read your mind? That would be awesome.
> It's so sad that we're the ones who have to tell the agent how to improve by extending agent.md or whatever.
Your improvement is someone else's code smell. There's no absolute right or wrong way to write code, and that's coming from someone who definitely thinks there's a right way. But it's my right way.
Anyway, I don't know why you'd expect it to write code the way you like after it's been trained on the whole of the Internet & the the RLHF labelers' preferences and the reward model.
Putting some words in AGENTS.md hardly seems like the most annoying thing.
tip: Add a /fix command that tells it to fix $1 and then update AGENTS.md with the text that'd stop it from making that mistake in the future. Use your nearest LLM to tweak that prompt. It's a good timesaver.
While this may be the end goal, I do think humanity needs to take the trip along with AI to this point.
A mind reading ultimate agent sounds more like a deity, and there are more than enough fables warning one not to create gods because things tend to go bad. Pumping out ASI too quickly will cause massive destabilization and horrific war. Not sure who against really either. Could be us humans against the ASI, could be the rich humans with ASI against us. Anyway about it, it would represent a massive change in the world order.
I know I'm in the minority here, but I've been finding AI to be increasingly useless.
I'd already abandoned it for generating code, for all the reasons everyone knows, that don't need to be rehashed.
I was still in the camp of "It's a better google" and can save me time with research.
The issue it, at this point in my career (30+ years) the questions I have are a bit more nuanced and complex. They aren't things like "how do I make a form with React".
I'm working on developing a very high performance peer server that will need to scale up to hundreds of thousands to a million concurrent web socket connections to work as a signaling server for WebRTC connection negotiation.
I wanted to start as simple as possible, so peerjs is attractive. I asked the AI if peerjs peer-server would work with NodeJS's cluster server. It enthusiastically told me it would work just fine and was, in fact, designed for that.
I took a look at the source code, and it looked to me like that was dead wrong. The AI kept arguing with me before finally admitting it was completely wrong. A total waste of time.
Same results asking it how to remove Sophos from a Mac.
Same with legal questions about HOA laws, it just totally hallucinates things thay don't exist.
My wife and I used to use it to try to settle disagreements (i.e
a better google) but amusingly we've both reached a place where we distrust anything it says so much, we're back to sending each other web articles :-)
I'm still pretty excited about the potential use of AI in elementary education, maybe through high school in some cases, but for my personal use, I've been reaching for it less and less.
I can relate as far as asking AI for advice on complex design tasks. The fundamental problem is that it is still basically a pattern matching technology that "speaks before thinking". For shallow problems this is fine, but where it fails is when it a useful response would require it to have analyzed the consequences of what it is suggesting, although (not that it helps) many people might respond in the same way - with whatever "comes to mind".
I used to joke that programming is not a career - it's a disease - since practiced long enough it fundamentally changes the way you think and talk, always thinking multiple steps ahead and the implications of what you, or anyone else, is saying. Asking advice from another seasoned developer you'll get advice that has also been "pre-analyzed", but not from an LLM.
The author makes a point that you should redo every manual commit with AI to align you mental model of actions with how models work. This is something that I’m going to need to try. It’s related to my desire to reduce things like “discovery tax” (the phenomenon whereby a 5 minute agent task is 4 minutes of environment exploration and 1 minute of execution) and makes sure that models get things right the first time around, however, my AI improvement plan didn’t really account for how to improve the model in cases where I ended up manually resolving issues or implementing features.
Some arguments are made about retaining focus and single-mindedness while working on AI. I think these points are important. It’s related to the article on cutting out over-eager orchestration and focusing on validation work (https://sibylline.dev/articles/2026-01-27-stop-orchestrating...). There are a few sides to this covered in the article. You should always have high value task to switch to when the agent is working (instead of scrolling tiktok, instagram,X, youtube, facebook, hackernews .etc). In my case I might try start to read some books that I have on the backburner like Ghost in the Wires. You should disable agent notifications and take control of when you return to check the model context to be less ADHD ridden when programming with agents and actually make meaningful progress on the side task since you only context switch when you are satisfied. The final one is to always have at least one agent and preferably only one agent running in the background. The idea is that always having an agent results in a slow burn of productivity improvements and a process where you can slowly improve the background agent performance. Generally, always having some agent running is a good way to stay on top of what current model capabilities are.
I also really liked the idea of overnight agents for library research, redevelopment of projects to test out new skills, tests and AGENTS.md modifications.
I've been building systems like what the OP is using since gpt3 came out.
This is the honeymoon phase. You're learning the ins and outs of the specific model you're using and becoming more productive. It's magical. Nothing can stop you. Then you might not be improving as fast as you did at the start, but things are getting better every day. Or maybe every week. But it's heaps better than doing it by hand because you have so much mental capacity left.
Then a new release comes up. An arbitrary fraction of your hard earned intuition is not only useless but actively harmful to getting good results with the new models. Worse you will never know which part it is without unlearning everything you learned and starting over again.
I've had to learn the quirks of three generations of frontier families now. It's not worth the hassle. I've gone back to managing the context window in Emacs because I can't be bothered to learn how to deal with another model family that will be thrown out in six months. Copy and paste is the universal interface and being able to do surgery on the chat history is still better than whatever tooling is out there.
Unironically learning vim or Emacs and the standard Unix code tools is still the best thing you can do to level up your llm usage.
First off, appreciate you sharing your perspective. I just have a few questions.
> I've gone back to managing the context window in Emacs because I can't be bothered to learn how to deal with another model family that will be thrown out in six months.
Can you expand more on what you mean by that? I'm a bit of a noob on llm enabled dev work. Do you mean that you will kick off new sessions and provide a context that you manage yourself instead of relying on a longer running session to keep relevant information?
> Unironically learning vim or Emacs and the standard Unix code tools is still the best thing you can do to level up your llm usage.
I appreciate your insight but I'm failing to understand how exactly knowing these tools increases performance of llms. Is it because you can more precisely direct them via prompts?
> This blog post was fully written by hand, in my own words.
This reminded me of back when wysiwyg web editors started becoming a thing, and coders started adding those "Created in notepad" stickers to their webpages, to point out they were 'real' web developers. Fun times.
I recently also reflected on the evolution of my use of ai in programming. Same evolution, other path. If anyone is interested: https://www.asfaload.com/blog/ai_use/
LLMs are not for me. My position is that the advantage we humans have over the
rest of the natural world, is our minds. Our ability to think, create and express ideas
is what separates us from the rest of the animal kingdom. Once we give that over to
"thinking" machines, we weaken ourselves, both individually and as a species.
That said, I've given it a go. I used zed, which I think is a pretty great tool. I
bought a pro subscription and used the built in agent with Claude Sonnet 4.x and Opus.
I'm a Rails developer in my day job, and, like MitchellH and many others, found out
fairly quickly that tasks for the LLM need to be quite specific and discrete. The agent
is great a renames and minor refactors, but my preferred use of the agent was to get it
to write RSpec tests once I'd written something like a controller or service object.
And generally, the LLM agent does a pretty great job of this.
But here's the rub: I found that I was losing the ability to write rspec.
I went to do it manually and found myself trying to remember API calls and approaches
required to write some specs. The feeling of skill leaving me was quite sobering and
marked my abandonment of LLMs and Zed, and my return to neovim, agent-free.
The thing is, this is a common experience generally. If you don't use it, you lose it.
It applies to all things: fitness, language (natural or otherwise), skills of all kinds.
Why should it not apply to thinking itself.
Now you may write me and my experience off as that of a lesser mind, and that you won't
have such a problem. You've been doing it so long that it's "hard-wired in" by now.
Perhaps.
It's in our nature to take the path of least resistance, to seek ease and convenience at
every turn. We've certainly given away our privacy and anonymity so that we can pay for
things with our phones and send email for "free".
LLMs are the ultimate convenience. A peer or slave mind that we can use to do our
thinking and our work for us. Some believe that the LLM represents a local maxima, that
the approach can't get much better. I dunno, but as AI improves, we will hand over more
and more thinking and work to it. To do otherwise would be to go against our very nature
and every other choice we've made so far.
But it's not for me. I'm no MitchellH, and I'm probably better off performing the
mundane activities of my work, as well as the creative ones, so as to preserve my
hard-won knowledge and skills.
YMMV
I'll leave off with the quote that resonates the most with me as I contemplate AI:-
"I say your civilization, because as soon as we started thinking for you,
it really became our civilization, which is, of course, what this is all about."
-- Agent Smith "The Matrix"
I was using it the same way you just described but for C# and Angular and you're spot on. It feels amazing not having to memorize APIs and just let the AI even do code coverage near to 100%, however at some point I began noticing 2 things:
- When tests didn't work I had to check what was going on and the LLMs do cheat a lot with Volkswagen tests, so that began to make me skeptic even of what is being written by the agents
- When things were broken, spaghetti and awful code tends to be written in an obnoxius way it's beyond repairable and made me wish I had done it from scratch.
Thankfully I just tried using agents for tests and not for the actual code, but it makes me think a lot if "vibe coding" really produces quality work.
AI adoption is being heavily pushed at my work and personally I do use it, but only for the really "boilerplate-y" kinds of code I've already written hundreds of times before. I see it as a way to offload the more "typing-intensive" parts of coding (where the bottleneck is literally just my WPM on the keyboard) so I have more time to spend on the trickier "thinking-intensive" parts.
Just wanted to say that was a nice and very grounded write up; and as a result very informative. Thank you. More stuff like this is a breath of fresh air in a landscape that has veered into hyperbole territory both in the for and against ai sides
This gave me a physical flinch. Perhaps this is unfounded, but all this makes me think of is this becoming the norm, millions of people doing this, and us cooking our planet out much faster than predicted.
I think the sweet spot is ai-assisted chat with manual review: readily available, not as costly
agents jump ahead to the point of the user and project being out of control and more expensive
I think a lot of us still hesitate to make that jump; or at least I am not sure of a cost-effective agent approach (I guess I could manually review their output, but I could see it going off track quickly)
I guess I'd like to see more of an exact breakdown of what prompts and tools and AI are used to get ideas on if I'd use that for myself more
Suspect the sweet spot also depends on the objective. If it’s a personal tool where you are the primary user then vibe coding all the way. You can describe requirements precisely and if it breaks there are no angry customers.
Good article! I especially liked the approach to replicate manual commits with the agent. I did not do that when learning but I suspect I'd have been much better off if I had.
I'd be interested to know what agents you're using. You mentioned Claude and GPT in passing, but don't actually talk about which you're using or for which tasks.
> Immediately cease trying to perform meaningful work via a chatbot.
That depends on your budget. To work within my pro plan's codex limits, I attach the codebase as a single file to various chat windows (GPT 5.2 Thinking - Heavy) and ask it to find bugs/plan a feature/etc. Then I copy the dense tasklist from chat to codex for implementation. This reduces the tokens that codex burns.
Also don't sleep on GPT 5.2 Pro. That model is a beast for planning.
Really appreciated this perspective. Much more tempered and less hype than a lot of other articles I see.
I thought the "dont let agents finishing interrupt you" workflow was an interesting point. I've set up chime hooks to basically do the opposite, and this gives me pause to wonder if I'm underestimating the cost of context switching. I'll give it a go.
What a lovely read. Thank you for sharing your experience.
The human-agent relationship described in the article made me wonder: are natural, or experienced, managers having more success with AI as subordinates than people without managerial skill? Are AI agents enormously different than arbitrary contractors half a world away where the only communication is daily text exchanges?
This is yet one more indication to me that the winds have shifted with regards to the utility of the “agent” paradigm of coding with an LLM. With all the talk around Opus 4.5 I decided to finally make the jump there myself and haven’t yet been disappointed (though admittedly I’m starting it on some pretty straightforward stuff).
You mentioned "harness engineering". How do you approach building "actual programmed tools" (like screenshot scripts) specifically for an LLM's consumption rather than a human's? Are there specific output formats or constraints you’ve found most effective?
So does everyone just run with giving full permissions on Claude code these days? It seems like I’m constantly coming back to CC to validate that it’s not running some bash that’s going to nuke my system. I would love to be able to fully step away but it feels like I can’t.
I run my agents with full permissions in containers. Feels like a reasonable tradeoff. Bonus is I can set up each container with exactly the stack needed.
I'm kind of on the same journey, a bit less far along. One thing I have observed is that I am constantly running out of tokens in claude. I guess this is not an issue for a wealthy person like Mitchell but it does significantly hamper my ability to experiment.
Now that the Nasdaq crashes, people switch from the stick to the carrot:
"Please let us sit down and have a reasonable conversation! I was a skeptic, too, but if all skeptics did what I did, they would come to Jesus as well! Oh, and pay the monthly Anthropic tithe!"
> Context switching is very expensive. In order to remain efficient, I found that it was my job as a human to be in control of when I interrupt the agent, not the other way around. Don't let the agent notify you.
Do you have any ideas on how to harness AI to only change specific parts of a system or workpiece? Like "I consider this part 80/100 done and only make 'meaningful' or 'new contributions' here" ...?
For those of working on large proprietary, in fringe languages as well, what can we do? Upload all the source code to the cloud model? I am really wary of giving it a million lines of code it’s never seen.
I've found mostly for context reasons its better to just have a grand overview of the systems and how they work together and feed that to the agent as context, it will use the additional files it touches to expand its understanding if you prompt well.
not quite as technically rich as i came to expect from previous posts from op, but very insightful regardless.
not ashamed to say that i am between steps 2 and 3 in my personal workflow.
>Adopting a tool feels like work, and I do not want to put in the effort
all the different approaches floating online feel ephemeral to me. this, just like for different tools for the op, seem like a chore to adopt. i like the fomo mongering from the community does not help here, but in the end it is a matter of personal discovery to stick with what works for you.
AI is getting to the game-changing point. We need more hand-written reflections on how individuals are managing to get productivity gains for real (not a vibe coded app) software engineering.
I respect Hashimoto for his contributions in the field, but to be honest, I am fed up with posts talking about using AI in ways that are impossible for most people due to high costs. I want to see more posts on cost-effective techniques, rather than just another guy showing off how he turned a creative 'burning-time' hobby into a 'burning-money' one.
> I'm not [yet?] running multiple agents, and currently don't really want to
This is the main reason to use AI agents, though: multitasking. If I'm working on some Terraform changes and I fire off an agent loop, I know it's going to take a while for it to produce something working. In the meantime I'm waiting for it to come back and pretend it's finished (really I'll have to fix it), so I start another agent on something else. I flip back and forth between the finished runs as they notify me. At the end of the day I have 5 things finished rather than two.
The "agent" doesn't have to be anything special either. Anything you can run in a VM or container (vscode w/copilot chat, any cli tool, etc) so you can enable YOLO mode.
If the author is here, please could you also confirm you’ve never been paid by any AI company, marketing representative, community programme, in any shape or form?
He explicitly said "I don't work for, invest in, or advise any AI companies." in the article.
But yes, Hashimoto is a high profile CEO/CTO who may well have an indirect, or near-future interest in talking up AI. HN articles extoling the productivity gains of Claude on HN do generally tend to be from older, managerial types (make of that what you will).
This are all valid points and a hype-free pragmatic take, I've been wondering about the same things even when I'm still in the skeptics side. I think there are other things that should be added since Mitchell's reality won't apply to everyone:
- What about non opensource work that's not on Github?
- Costs! I would think "an agent always running" would add up quickly
- In open source work, how does it amplify others. Are you seeing AI Slop as PRs? Can you tell the difference?
How much electricity (and associated materials like water) must this use?
It makes me profoundly sad to think of the huge number of AI agents running endlessly to produce vibe-coded slop. The environmental impact must be massive.
Keep in mind that these are estimates, but you could attempt to extrapolate from here. Programming prompts probably take more because I assume the average context is a good bit higher than the average ChatGPT question, plus additional agents.
All in, I'm not sure if the energy usage long term is going to be overblown by media or if it'll be accurate. I'm personally not sure yet.
The Death of the "Stare": Why AI’s "Confident Stupidity" is a Threat to Human Genius
OPINION | THE REALITY CHECK
In the gleaming offices of Silicon Valley and the boardrooms of the Fortune 500, a new religion has taken hold. Its deity is the Large Language Model, and its disciples—the AI Evangelists—speak in a dialect of "disruption," "optimization," and "seamless integration." But outside the vacuum of the digital world, a dangerous friction is building between AI’s statistical hallucinations and the unyielding laws of physics.
The danger of Artificial Intelligence isn't that it will become our overlord; the danger is that it is fundamentally, confidently, and authoritatively stupid.
The Paradox of the Wind-Powered Car
The divide between AI hype and reality is best illustrated by a recent technical "solution" suggested by a popular AI model: an electric vehicle equipped with wind generators on the front to recharge the battery while driving. To the AI, this was a brilliant synergy. It even claimed the added weight and wind resistance amounted to "zero."
To any human who has ever held a wrench or understood the First Law of Thermodynamics, this is a joke—a perpetual motion fallacy that ignores the reality of drag and energy loss. But to the AI, it was just a series of words that sounded "correct" based on patterns. The machine doesn't know what wind is; it only knows how to predict the next syllable.
The Erosion of the "Human Spark"
The true threat lies in what we are sacrificing to adopt this "shortcut" culture. There is a specific human process—call it The Stare. It is that thirty-minute window where a person looks at a broken machine, a flawed blueprint, or a complex problem and simply observes.
In that half-hour, the human brain runs millions of mental simulations. It feels the tension of the metal, the heat of the circuit, and the logic of the physical universe. It is a "Black Box" of consciousness that develops solutions from absolutely nothing—no forums, no books, and no Google.
However, the new generation of AI-dependent thinkers views this "Stare" as an inefficiency. By outsourcing our thinking to models that cannot feel the consequences of being wrong, we are witnessing a form of evolutionary regression. We are trading hard-earned competence for a "Yes-Man" in a box.
The Gaslighting of the Realist
Perhaps most chilling is the social cost. Those who still rely on their intuition and physical experience are increasingly being marginalized. In a world where the screen is king, the person pointing out that "the Emperor has no clothes" is labeled as erratic, uneducated, or naive.
When a master craftsman or a practical thinker challenges an AI’s "hallucination," they aren't met with logic; they are met with a robotic refusal to acknowledge reality. The "AI Evangelists" have begun to walk, talk, and act like the models they worship—confidently wrong, devoid of nuance, and completely detached from the ground beneath their feet.
The High Cost of Being "Authoritatively Wrong"
We are building a world on a foundation of digital sand. If we continue to trust AI to design our structures and manage our logic, we will eventually hit a wall that no "prompt" can fix.
The human brain runs on 20 watts and can solve a problem by looking at it. The AI runs on megawatts and can’t understand why a wind-powered car won't run forever. If we lose the ability to tell the difference, we aren't just losing our jobs—we're losing our grip on reality itself.
For the AI skeptics reading this, there is an overwhelming probability that Mitchell is a better developer than you. If he gets value out of these tools you should think about why you can't.
The value Mitchell describes aligns well with the lack of value I'm getting. He feels that guiding an agent through a task is neither faster nor slower than doing it himself, and there's some tasks he doesn't even try to do with an agent because he knows it won't work, but it's easier to parallelize reviewing agentic work than it is to parallelize direct coding work. That's just not a usage pattern that's valuable to me personally - I rarely find myself in a situation where I have large number of well-scoped programming tasks I need to complete, and it's a fun treat to do myself when I do.
Perhaps that's the reason. Maybe I'm just not a good enough developer. But that's still not actionable. It's not like I never considered being a better developer.
Don't get it. What's the relation between Mitchell being a "better" developer than most of us (and better is always relative, but that's another story) and getting value out of AI? That's like saying Bezos is a way better businessman than you, so you should really hear his tips about becoming a billionaire. No sense (because what works for him probably doesn't work for you)
Tons of respect for Mitchell. I think you are doing him a disservice with these kinds of comments.
> babysitting my kind of stupid and yet mysteriously productive robot friend
LOL, been there, done that. It is much less frustrating and demoralizing than babysitting your kind of stupid colleague though. (Thankfully, I don't have any of those anymore. But at previous big companies? Oh man, if only their commits were ONLY as bad as a bad AI commit.)
I think this is something people ignore, and is significant. The only way to get good at coding with LLMs is actually trying to do it. Even if it's inefficient or slower at first. It's just another skill to develop [0].
And it's not really about using all the plugins and features available. In fact, many plugins and features are counter-productive. Just learn how to prompt and steer the LLM better.
Some comments were deferred for faster rendering.
libraryofbabel|24 days ago
It's a shame that AI coding tools have become such a polarizing issue among developers. I understand the reasons, but I wish there had been a smoother path to this future. The early LLMs like GPT-3 could sort of code enough for it to look like there was a lot of potential, and so there was a lot of hype to drum up investment and a lot of promises made that weren't really viable with the tech as it was then. This created a large number of AI skeptics (of whom I was one, for a while) and a whole bunch of cynicism and suspicion and resistance amongst a large swathe of developers. But could it have been different? It seems a lot of transformative new tech is fated to evolve this way. Early aircraft were extremely unreliable and dangerous and not yet worthy of the promises being made about them, but eventually with enough evolution and lessons learned we got the Douglas DC-3, and then in the end the 747.
If you're a developer who still doesn't believe that AI tools are useful, I would recommend you go read Mitchell's post, and give Claude Code a trial run like he did. Try and forget about the annoying hype and the vibe-coding influencers and the noise and just treat it like any new tool you might put through its paces. There are many important conversations about AI to be had, it has plenty of downsides, but a proper discussion begins with close engagement with the tools.
keyle|24 days ago
Our tooling just had a refresh in less than 3 years and it leaves heads spinning. People are confused, fighting for or against it. Torn even between 2025 to 2026. I know I was.
People need a way to describe it from 'agentic coding' to 'vibe coding' to 'modern AI assisted stack'.
We don't call architects 'vibe architects' even though they copy-paste 4/5th of your next house and use a library of things in their work!
We don't call builders 'vibe builders' for using earth-moving machines instead of a shovel...
When was the last time you reviewed the machine code produced by a compiler? ...
The real issue this industry is facing, is the phenomenal speed of change. But what are we really doing? That's right, programming.
datsci_est_2015|24 days ago
Yes, we know that functional code can get generated at incredible speeds. Yes, we know that apps and what not can be bootstrapped from nothing by “agentic coding”.
We need to read this code, right? How can I deliver code to my company without security and reliability guarantees that, at their core, come from me knowing what I’m delivering line-by-line?
svilen_dobrev|24 days ago
If one asks this generator/assistant same request/thing, within same initial contexts, 10 times, would it generate same result ? in different sessions and all that.
because.. if not, then it's for once-off things only..
beoberha|24 days ago
tmtvl|24 days ago
zamadatix|24 days ago
It is a shame it's become such a polarized topic. Things which actually work fine get immediately bashed by large crowds at the same time things that are really not there get voted to the moon by extremely eager folks. A few years from now I expect I'll be thinking "man, there was some really good stuff I missed out on because the discussions about it were so polarized at the time. I'm glad that has cleared up significantly!"
majormajor|24 days ago
The workflow automation and better (and model-directed) context management are all obvious in retrospect but a lot of people (like myself) were instead focused on IDE integration and such vs `grep` and the like. Maybe multi-agent with task boards is the next thing, but it feels like that might also start to outrun the ability to sensibly design and test new features for non-greenfield/non-port projects. Who knows yet.
I think it's still very valuable for someone to dig in to the underlying models periodically (insomuch as the APIs even expose the same level of raw stuff anymore) to get a feeling for what's reliable to one-shot vs what's easily correctable by a "ran the tests, saw it was wrong, fixed it" loop. If you don't have a good sense of that, it's easy to get overambitious and end up with something you don't like if you're the sort of person who cares at all about what the code looks like.
a456463|23 days ago
linuxrocks123|24 days ago
Whether or not LLM coding AIs are useful, they certainly qualifies as "important," because adopting one is disruptive enough that getting rid of it after adopting it would be disruptive as well. I'm not signing on to that if I need to pay a recurring fee for it and/or need to rely on some company deciding to continue maintaining a cloud server running it in perpetuity.
chrysoprace|24 days ago
I'm embracing LLMs but I think I've had to just pick a happy medium and stick with Claude Code with MCPs until somebody figures out a legitimate way to use the Claude subscription with open source tools like OpenCode, then I'll move over to that. Or if a company provides a model that's as good value that can be used with OpenCode.
ianm218|24 days ago
Lich|24 days ago
arcxi|24 days ago
bullshitsite|24 days ago
OPINION | THE REALITY CHECK In the gleaming offices of Silicon Valley and the boardrooms of the Fortune 500, a new religion has taken hold. Its deity is the Large Language Model, and its disciples—the AI Evangelists—speak in a dialect of "disruption," "optimization," and "seamless integration." But outside the vacuum of the digital world, a dangerous friction is building between AI’s statistical hallucinations and the unyielding laws of physics.
The danger of Artificial Intelligence isn't that it will become our overlord; the danger is that it is fundamentally, confidently, and authoritatively stupid.
The Paradox of the Wind-Powered Car The divide between AI hype and reality is best illustrated by a recent technical "solution" suggested by a popular AI model: an electric vehicle equipped with wind generators on the front to recharge the battery while driving. To the AI, this was a brilliant synergy. It even claimed the added weight and wind resistance amounted to "zero."
To any human who has ever held a wrench or understood the First Law of Thermodynamics, this is a joke—a perpetual motion fallacy that ignores the reality of drag and energy loss. But to the AI, it was just a series of words that sounded "correct" based on patterns. The machine doesn't know what wind is; it only knows how to predict the next syllable.
The Erosion of the "Human Spark" The true threat lies in what we are sacrificing to adopt this "shortcut" culture. There is a specific human process—call it The Stare. It is that thirty-minute window where a person looks at a broken machine, a flawed blueprint, or a complex problem and simply observes.
In that half-hour, the human brain runs millions of mental simulations. It feels the tension of the metal, the heat of the circuit, and the logic of the physical universe. It is a "Black Box" of consciousness that develops solutions from absolutely nothing—no forums, no books, and no Google.
However, the new generation of AI-dependent thinkers views this "Stare" as an inefficiency. By outsourcing our thinking to models that cannot feel the consequences of being wrong, we are witnessing a form of evolutionary regression. We are trading hard-earned competence for a "Yes-Man" in a box.
The Gaslighting of the Realist Perhaps most chilling is the social cost. Those who still rely on their intuition and physical experience are increasingly being marginalized. In a world where the screen is king, the person pointing out that "the Emperor has no clothes" is labeled as erratic, uneducated, or naive.
When a master craftsman or a practical thinker challenges an AI’s "hallucination," they aren't met with logic; they are met with a robotic refusal to acknowledge reality. The "AI Evangelists" have begun to walk, talk, and act like the models they worship—confidently wrong, devoid of nuance, and completely detached from the ground beneath their feet.
The High Cost of Being "Authoritatively Wrong" We are building a world on a foundation of digital sand. If we continue to trust AI to design our structures and manage our logic, we will eventually hit a wall that no "prompt" can fix.
The human brain runs on 20 watts and can solve a problem by looking at it. The AI runs on megawatts and can’t understand why a wind-powered car won't run forever. If we lose the ability to tell the difference, we aren't just losing our jobs—we're losing our grip on reality itself.
whatifnomoney|24 days ago
[deleted]
epolanski|24 days ago
Frankly I'm so tired of the usual "I don't find myself more productive", "It writes soup". Especially when some of the best software developers (and engineers) find many utility in those tools, there should be some doubt growing in that crowd.
I have come to the conclusion that software developers, those only focusing on the craft of writing code are the naysayers.
Software engineers immediately recognize the many automation/exploration/etc boosts, recognize the tools limits and work on improving them.
Hell, AI is an insane boost to productivity, even if you don't have it write a single line of code ever.
But people that focus on the craft (the kind of crowd that doesn't even process the concept of throwaway code or budgets or money) will keep laying in their "I don't see the benefits because X" forever, nonsensically confusing any tool use with vibe coding.
I'm also convinced that since this crowd never had any notion of what engineering is (there is very little of it in our industry sadly, technology and code is the focus and rarely the business, budget and problems to solve) and confused it with architectural, technological or best practices they are genuinely insecure about their jobs because once their very valued craft and skills are diminished they pay the price of never having invested in understanding the business, the domain, processes or soft skills.
mjr00|24 days ago
This is the key one I think. At one extreme you can tell an agent "write a for loop that iterates over the variable `numbers` and computes the sum" and they'll do this successfully, but the scope is so small there's not much point in using an LLM. On the other extreme you can tell an agent "make me an app that's Facebook for dogs" and it'll make so many assumptions about the architecture, code and product that there's no chance it produces anything useful beyond a cool prototype to show mom and dad.
A lot of successful LLM adoption for code is finding this sweet spot. Overly specific instructions don't make you feel productive, and overly broad instructions you end up redoing too much of the work.
sho_hn|24 days ago
It cognitively feels very similar to other classic programming activities, like modularization at any level from architecture to code units/functions, thoughtfully choosing how to lay out and chunk things. It's always been one of the things that make programming pleasurable for me, and some of that feeling returns when slicing up tasks for agents.
mapontosevenths|24 days ago
What this misses, of course, is that you can just have the agent do this too. Agent's are great at making project plans, especially if you give them a template to follow.
apercu|24 days ago
The more detailed I am in breaking down chunks, the easier it is for me to verify and the more likely I am going to get output that isn't 30% wrong.
iamacyborg|24 days ago
Amusingly, this was my experience in giving Lovable a shot. The onboarding process was literally just setting me up for failure by asking me to describe the detailed app I was attempting to build.
Taking it piece by piece in Claude Code has been significantly more successful.
oulipo2|24 days ago
But not so good at making (robust) new features out of the blue
jedbrooke|24 days ago
Maybe there’s something about not having to context switch between natural language and code just makes it _feel_ easier sometimes
unknown|24 days ago
[deleted]
kcorbitt|24 days ago
andai|24 days ago
Actually that's how I did most of my work last year. I was annoyed by existing tools so I made one that can be used interactively.
It has full context (I usually work on small codebases), and can make an arbitrary number of edits to an arbitrary number of files in a single LLM round trip.
For such "mechanical" changes, you can use the cheapest/fastest model available. This allows you to work interactively and stay in flow.
(In contrast to my previous obsession with the biggest, slowest, most expensive models! You actually want the dumbest one that can do the job.)
I call it "power coding", akin to power armor, or perhaps "coding at the speed of thought". I found that staying actively involved in this way (letting LLM only handle the function level) helped keep my mental model synchronized, whereas if I let it work independently, I'd have to spend more time catching up on what it had done.
I do use both approaches though, just depends on the project, task or mood!
EastLondonCoder|24 days ago
The failure mode I kept hitting wasn’t just "it makes mistakes", it was drift: it can stay locally plausible while slowly walking away from the real constraints of the repo. The output still sounds confident, so you don’t notice until you run into reality (tests, runtime behaviour, perf, ops, UX).
What ended up working for me was treating chat as where I shape the plan (tradeoffs, invariants, failure modes) and treating the agent as something that does narrow, reviewable diffs against that plan. The human job stays very boring: run it, verify it, and decide what’s actually acceptable. That separation is what made it click for me.
Once I got that loop stable, it stopped being a toy and started being a lever. I’ve shipped real features this way across a few projects (a git like tool for heavy media projects, a ticketing/payment flow with real users, a local-first genealogy tool, and a small CMS/publishing pipeline). The common thread is the same: small diffs, fast verification, and continuously tightening the harness so the agent can’t drift unnoticed.
protocolture|24 days ago
Yeah I would get patterns where, initial prototypes were promising, then we developed something that was 90% close to design goals, and then as we try to push in the last 10%, drift would start breaking down, or even just forgetting, the 90%.
So I would start getting to 90% and basically starting a new project with that as the baseline to add to.
ricardobeat|24 days ago
These patterns seem to be picking up speed in the general population; makes the human race seem quite easily hackable.
bdangubic|24 days ago
miyuru|24 days ago
these are some ticks I use now.
1. Write a generic prompts about the project and software versions and keep it in the folder. (I think this getting pushed as SKIILS.md now)
2. In the prompt add instructions to add comments on changes, since our main job is to validate and fix any issues, it makes it easier.
3. Find the best model for the specific workflow. For example, these days I find that Gemini Pro is good for HTML UI stuff, while Claude Sonnet is good for python code. (This is why subagents are getting popluar)
apitman|24 days ago
kyoji|24 days ago
But why is the cost never discussed or disclosed in these conversations? I feel like I'm going crazy, there is so much written extolling the virtues of these tools but with no mention of what it costs to run them now. It will surely only get more expensive from here!
wtetzner|24 days ago
And not just the monetary cost of accessing the tools, but the amount of time it takes to actually get good results out. I strongly suspect that even though it feels more productive, in many cases things just take longer than they would if done manually.
I think there are really good uses for LLMs, but I also think that people are likely using them in ways that feel useful, but end up being more costly than not.
quarkz14|24 days ago
lysace|24 days ago
There are two usage quota windows to be aware of: 5h and 7d. I use https://github.com/richhickson/claudecodeusage (Mac) to keep track of the status. It shows green/yellow/red and a percentage in the menu bar.
fusslo|24 days ago
Apparently out of 3-5k people with access to our AI tools, there's fewer than a handful of us REALLY using it. Most are asking questions in the chatbot style.
Anyway, I had to ask my manager, the AI architect, and the Tooling Manager for approval to increase my quota.
I asked everyone in the chain how much equivalent dollars I am allocated, and how much the increase was and no one could tell me.
mbesto|24 days ago
senko|24 days ago
noisy_boy|24 days ago
After that is basically just asking it to flesh out the layers starting from zero dependencies to arriving at the top of the castle. Even if we have any complexities within the pieces or the implementation is not exactly as per my liking, the issues are localised - I can dive in and handle it myself (most of the time, I don't need to).
I feel like this approach works very well for me having a mental model of how things are connected because the most of the time I spent was spent on that model.
sho_hn|24 days ago
alterom|24 days ago
scarrilho|24 days ago
i_love_retros|24 days ago
Is your company paying for it or you?
What is your process of the agent writes a piece of code, let's say a really complex recursive function, and you aren't confident you could have come up with the same solution? Do you still submit it?
paracyst|24 days ago
keyle|24 days ago
I do run multiple models at once now. On different parts of the code base.
I focus solely on the less boring tasks for myself and outsource all of the slam dunk and then review. Often use another model to validate the previous models work while doing so myself.
I do git reset still quite often but I find more ways to not get to that point by knowing the tools better and better.
Autocompleting our brains! What a crazy time.
anupamchugh|24 days ago
Level 1 is what Mitchell describes — AGENTS.md, a static harness. Prevents known mistakes. But it rots. Nobody updates the checklist when the environment changes.
Level 2 is treating each agent failure as an inoculation. Agent duplicates a util function? Don't just fix it — write a rule file: "grep existing helpers before writing new ones." Agent tries to build a feature while the build is broken? Rule: "fix blockers first." After a few months you have 30+ of these. Each one is an antibody against a specific failure class. The harness becomes an immune system that compounds.
Level 3 is what I haven't seen discussed much: specs need to push, not just be read. If a requirement in auth-spec.md changes, every linked in-progress task should get flagged automatically. The spec shouldn't wait to be consulted.
The real bottleneck isn't agent capability — it's supervision cost. Every type of drift (requirements change, environments diverge, docs rot) inflates the cost of checking the agent's work.
Crush that cost and adoption follows.
svilen_dobrev|24 days ago
i'd bet that above some number there will be contradictions. Things that apply to different semantic contexts, but look same on syntax level (and maybe with various levels of "syntax" and "semantic"). And debugging those is going to be nightmare - same as debugging requirements spec / verification of that
sublimefire|24 days ago
And the post touches on a next type of a problem, how to plan far ahead of time to utilise agents when you are away. It is a difficult problem but IMO we’re going in a direction of having some sort of shared “templated plans”/workflows and budgeted/throttled task execution to achieve that. It is like you want to give a little world to explore so that it does not stop early, like a little game to play, then you come back in the morning and check how far it went.
hollowturtle|24 days ago
wiether|24 days ago
To me part of our job has always been about translating garbage/missing specs in something actionnable.
Working with agents don't change this and that's why until PM/business people are able to come up with actual specs, they'll still need their translators.
Furthermore, it's not because the global spec is garbage that you, as a dev, won't come up with clear specs to solve technical issues related to the overall feature asked by stakeholders.
One funny thing I see though, is in the AI presentations done to non-technical people, the advice: "be as thorough as possible when describing what you except the agent to solve!". And I'm like: "yeah, that's what devs have been asking for since forever...".
maqnius|24 days ago
elAhmo|24 days ago
Just because you haven't or you work in a particular way, doesn't mean everyone does things the same way.
Likewise, on your last point, just because someone is using AI in their work, doesn't mean they don't have hard skills and know-how. Author of this article Mitchell is a great example of that - someone who proved to be able to produce great software and, when talking about individuals who made a dent in the industry, definitely had/has an impactful career.
novatrope|13 days ago
jonathanstrange|24 days ago
latchkey|24 days ago
I just recently added in Codex, since it comes with my $20/mo subscription to GPT and that's lowering my Claude credit usage significantly... until I hit those limits at some point.
2012 + 300 + 5~200... so about $1500-$1600/year.
It is 100% worth it for what I'm building right now, but my fear is that I'll take a break from coding and then I'm paying for something I'm not using with the subscriptions.
I'd prefer to move to a model where I'm paying for compute time as I use it, instead of worrying about tokens/credits.
JoshuaDavid|24 days ago
causal|24 days ago
I'm a huge believer in AI agent use and even I think this is wrong. It's like saying "always have something compiling" or "make sure your Internet is always downloading something".
The most important work happens when an agent is not running, and if you spend most of your time looking for ways to run more agents you're going to streetlight-effect your way into solving the wrong problems https://en.wikipedia.org/wiki/Streetlight_effect
i_love_retros|24 days ago
I just don't need the AI writing code for me, don't see the point. Once I know from the ai chat research what my solution is I can code it myself with the benefit I then understand more what I am doing.
And yes I've tried the latest models! Tried agent mode in copilot! Don't need it!
underdeserver|24 days ago
That's one very short step removed from Simon Willison's lethal trifecta.
smj-edison|24 days ago
glitchcrab|24 days ago
recursive|24 days ago
apercu|24 days ago
Versus other threads (here on HN, and especially on places like LinkedIn) where it's "I set up a pipeline and some agents and now I type two sentences and amazing technology comes out in 5 minutes that would have taken 3 devs 6 months to do".
munksbeer|24 days ago
simgt|24 days ago
For more on the "harness engineering", see what Armin Ronacher and Mario Zechner are doing with pi: https://lucumr.pocoo.org/2026/1/31/pi/ https://mariozechner.at/posts/2025-11-30-pi-coding-agent/
> I really don't care one way or the other if AI is here to stay3, I'm a software craftsman that just wants to build stuff for the love of the game.
I suspect having three comma on one's bank account helps being very relaxed about the outcome ;)
davidw|24 days ago
I wonder how much all this costs on a monthly basis?
tptacek|24 days ago
probably_wrong|24 days ago
Or if we assume that the OP can only do 4 hours per sitting (mentioned in the other post) and 8 hours of overnight agents then it would come down to $15.98 * 1.5 * 20 = $497,40 a month (without weekends).
[1] https://news.ycombinator.com/item?id=46905872
kajolshah_bt|19 days ago
What shifted for us was when we stopped experimenting for novelty and started embedding AI where routine work slowed people down. For example, we built an intake assistant for hospitals: guided questions that organize structured history before a doctor sees the patient. At first it felt promising, but adoption only happened when clinic staff saw that it saved them time and didn’t replace their judgment. That forced us to rethink how we framed the feature. It became about support, not replacement.
The real adoption turning point came when non-technical team members began using the tools without hesitation. That’s when it stopped being AI and just became part of workflow.
randusername|23 days ago
This perspective is why I think this article is so refreshing.
Craftsmen approach tools differently. They don't expect tools to work for them out-of-the-box. They customize the tool to their liking and reexamine their workflow in light of the tool. Either that or they have such idiosyncratic workflows they have to build their own tools.
They know their tools are custom to _them_. It would be silly to impose that everyone else use their tools-- they build different things!
zubspace|24 days ago
This is what's so annoying about it. It's like a child that does the same errors again and again.
But couldn't it adjust itself with the goal of reducing the error bit by bit? Wouldn't this lead to the ultimate agent who can read your mind? That would be awesome.
audience_mem|24 days ago
Your improvement is someone else's code smell. There's no absolute right or wrong way to write code, and that's coming from someone who definitely thinks there's a right way. But it's my right way.
Anyway, I don't know why you'd expect it to write code the way you like after it's been trained on the whole of the Internet & the the RLHF labelers' preferences and the reward model.
Putting some words in AGENTS.md hardly seems like the most annoying thing.
tip: Add a /fix command that tells it to fix $1 and then update AGENTS.md with the text that'd stop it from making that mistake in the future. Use your nearest LLM to tweak that prompt. It's a good timesaver.
pixl97|24 days ago
A mind reading ultimate agent sounds more like a deity, and there are more than enough fables warning one not to create gods because things tend to go bad. Pumping out ASI too quickly will cause massive destabilization and horrific war. Not sure who against really either. Could be us humans against the ASI, could be the rich humans with ASI against us. Anyway about it, it would represent a massive change in the world order.
cactusplant7374|24 days ago
I also love using it for research for upcoming features. Research + pick a solution + implement. It happens so fast.
claytongulick|24 days ago
I'd already abandoned it for generating code, for all the reasons everyone knows, that don't need to be rehashed.
I was still in the camp of "It's a better google" and can save me time with research.
The issue it, at this point in my career (30+ years) the questions I have are a bit more nuanced and complex. They aren't things like "how do I make a form with React".
I'm working on developing a very high performance peer server that will need to scale up to hundreds of thousands to a million concurrent web socket connections to work as a signaling server for WebRTC connection negotiation.
I wanted to start as simple as possible, so peerjs is attractive. I asked the AI if peerjs peer-server would work with NodeJS's cluster server. It enthusiastically told me it would work just fine and was, in fact, designed for that.
I took a look at the source code, and it looked to me like that was dead wrong. The AI kept arguing with me before finally admitting it was completely wrong. A total waste of time.
Same results asking it how to remove Sophos from a Mac.
Same with legal questions about HOA laws, it just totally hallucinates things thay don't exist.
My wife and I used to use it to try to settle disagreements (i.e a better google) but amusingly we've both reached a place where we distrust anything it says so much, we're back to sending each other web articles :-)
I'm still pretty excited about the potential use of AI in elementary education, maybe through high school in some cases, but for my personal use, I've been reaching for it less and less.
HarHarVeryFunny|23 days ago
I used to joke that programming is not a career - it's a disease - since practiced long enough it fundamentally changes the way you think and talk, always thinking multiple steps ahead and the implications of what you, or anyone else, is saying. Asking advice from another seasoned developer you'll get advice that has also been "pre-analyzed", but not from an LLM.
4corners4sides|23 days ago
Some arguments are made about retaining focus and single-mindedness while working on AI. I think these points are important. It’s related to the article on cutting out over-eager orchestration and focusing on validation work (https://sibylline.dev/articles/2026-01-27-stop-orchestrating...). There are a few sides to this covered in the article. You should always have high value task to switch to when the agent is working (instead of scrolling tiktok, instagram,X, youtube, facebook, hackernews .etc). In my case I might try start to read some books that I have on the backburner like Ghost in the Wires. You should disable agent notifications and take control of when you return to check the model context to be less ADHD ridden when programming with agents and actually make meaningful progress on the side task since you only context switch when you are satisfied. The final one is to always have at least one agent and preferably only one agent running in the background. The idea is that always having an agent results in a slow burn of productivity improvements and a process where you can slowly improve the background agent performance. Generally, always having some agent running is a good way to stay on top of what current model capabilities are.
I also really liked the idea of overnight agents for library research, redevelopment of projects to test out new skills, tests and AGENTS.md modifications.
noosphr|24 days ago
This is the honeymoon phase. You're learning the ins and outs of the specific model you're using and becoming more productive. It's magical. Nothing can stop you. Then you might not be improving as fast as you did at the start, but things are getting better every day. Or maybe every week. But it's heaps better than doing it by hand because you have so much mental capacity left.
Then a new release comes up. An arbitrary fraction of your hard earned intuition is not only useless but actively harmful to getting good results with the new models. Worse you will never know which part it is without unlearning everything you learned and starting over again.
I've had to learn the quirks of three generations of frontier families now. It's not worth the hassle. I've gone back to managing the context window in Emacs because I can't be bothered to learn how to deal with another model family that will be thrown out in six months. Copy and paste is the universal interface and being able to do surgery on the chat history is still better than whatever tooling is out there.
Unironically learning vim or Emacs and the standard Unix code tools is still the best thing you can do to level up your llm usage.
tudelo|24 days ago
> I've gone back to managing the context window in Emacs because I can't be bothered to learn how to deal with another model family that will be thrown out in six months.
Can you expand more on what you mean by that? I'm a bit of a noob on llm enabled dev work. Do you mean that you will kick off new sessions and provide a context that you manage yourself instead of relying on a longer running session to keep relevant information?
> Unironically learning vim or Emacs and the standard Unix code tools is still the best thing you can do to level up your llm usage.
I appreciate your insight but I'm failing to understand how exactly knowing these tools increases performance of llms. Is it because you can more precisely direct them via prompts?
sunshinekitty|24 days ago
OP is also a founder of Hashicorp, so.. lol.
> This is the honeymoon phase.
No offense but you come across as if you didn’t read the article.
unknown|24 days ago
[deleted]
tpoacher|24 days ago
This reminded me of back when wysiwyg web editors started becoming a thing, and coders started adding those "Created in notepad" stickers to their webpages, to point out they were 'real' web developers. Fun times.
raphinou|24 days ago
henry_bone|24 days ago
That said, I've given it a go. I used zed, which I think is a pretty great tool. I bought a pro subscription and used the built in agent with Claude Sonnet 4.x and Opus. I'm a Rails developer in my day job, and, like MitchellH and many others, found out fairly quickly that tasks for the LLM need to be quite specific and discrete. The agent is great a renames and minor refactors, but my preferred use of the agent was to get it to write RSpec tests once I'd written something like a controller or service object.
And generally, the LLM agent does a pretty great job of this.
But here's the rub: I found that I was losing the ability to write rspec.
I went to do it manually and found myself trying to remember API calls and approaches required to write some specs. The feeling of skill leaving me was quite sobering and marked my abandonment of LLMs and Zed, and my return to neovim, agent-free.
The thing is, this is a common experience generally. If you don't use it, you lose it. It applies to all things: fitness, language (natural or otherwise), skills of all kinds. Why should it not apply to thinking itself.
Now you may write me and my experience off as that of a lesser mind, and that you won't have such a problem. You've been doing it so long that it's "hard-wired in" by now. Perhaps.
It's in our nature to take the path of least resistance, to seek ease and convenience at every turn. We've certainly given away our privacy and anonymity so that we can pay for things with our phones and send email for "free".
LLMs are the ultimate convenience. A peer or slave mind that we can use to do our thinking and our work for us. Some believe that the LLM represents a local maxima, that the approach can't get much better. I dunno, but as AI improves, we will hand over more and more thinking and work to it. To do otherwise would be to go against our very nature and every other choice we've made so far.
But it's not for me. I'm no MitchellH, and I'm probably better off performing the mundane activities of my work, as well as the creative ones, so as to preserve my hard-won knowledge and skills.
YMMV
I'll leave off with the quote that resonates the most with me as I contemplate AI:-
"I say your civilization, because as soon as we started thinking for you, it really became our civilization, which is, of course, what this is all about." -- Agent Smith "The Matrix"
luisgvv|24 days ago
- When tests didn't work I had to check what was going on and the LLMs do cheat a lot with Volkswagen tests, so that began to make me skeptic even of what is being written by the agents
- When things were broken, spaghetti and awful code tends to be written in an obnoxius way it's beyond repairable and made me wish I had done it from scratch.
Thankfully I just tried using agents for tests and not for the actual code, but it makes me think a lot if "vibe coding" really produces quality work.
FeteCommuniste|24 days ago
cal_dent|24 days ago
butterNaN|24 days ago
This gave me a physical flinch. Perhaps this is unfounded, but all this makes me think of is this becoming the norm, millions of people doing this, and us cooking our planet out much faster than predicted.
erelong|24 days ago
agents jump ahead to the point of the user and project being out of control and more expensive
I think a lot of us still hesitate to make that jump; or at least I am not sure of a cost-effective agent approach (I guess I could manually review their output, but I could see it going off track quickly)
I guess I'd like to see more of an exact breakdown of what prompts and tools and AI are used to get ideas on if I'd use that for myself more
Havoc|24 days ago
Something with actual users needs a bit more care
tigerlily|24 days ago
Flowers for Algernon.
Or at least the first half. I don't wanna see what it looks like when AI capabilities start going in reverse.
But I want to know.
mwigdahl|24 days ago
butler14|24 days ago
energy123|24 days ago
That depends on your budget. To work within my pro plan's codex limits, I attach the codebase as a single file to various chat windows (GPT 5.2 Thinking - Heavy) and ask it to find bugs/plan a feature/etc. Then I copy the dense tasklist from chat to codex for implementation. This reduces the tokens that codex burns.
Also don't sleep on GPT 5.2 Pro. That model is a beast for planning.
amterp|23 days ago
I thought the "dont let agents finishing interrupt you" workflow was an interesting point. I've set up chime hooks to basically do the opposite, and this gives me pause to wonder if I'm underestimating the cost of context switching. I'll give it a go.
seemaze|24 days ago
The human-agent relationship described in the article made me wonder: are natural, or experienced, managers having more success with AI as subordinates than people without managerial skill? Are AI agents enormously different than arbitrary contractors half a world away where the only communication is daily text exchanges?
unknown|24 days ago
[deleted]
josh-sematic|24 days ago
fix4fun|24 days ago
You mentioned "harness engineering". How do you approach building "actual programmed tools" (like screenshot scripts) specifically for an LLM's consumption rather than a human's? Are there specific output formats or constraints you’ve found most effective?
tppts|24 days ago
apitman|24 days ago
apetresc|24 days ago
glitchcrab|24 days ago
That way the blast radius is vastly reduced.
awesan|24 days ago
rthak|24 days ago
"Please let us sit down and have a reasonable conversation! I was a skeptic, too, but if all skeptics did what I did, they would come to Jesus as well! Oh, and pay the monthly Anthropic tithe!"
kaffekaka|24 days ago
This I have found to be important too.
taikahessu|24 days ago
e40|24 days ago
swordsith|24 days ago
rldjbpin|24 days ago
not ashamed to say that i am between steps 2 and 3 in my personal workflow.
>Adopting a tool feels like work, and I do not want to put in the effort
all the different approaches floating online feel ephemeral to me. this, just like for different tools for the op, seem like a chore to adopt. i like the fomo mongering from the community does not help here, but in the end it is a matter of personal discovery to stick with what works for you.
bthornbury|24 days ago
simianparrot|24 days ago
Solution-looking-for-a-problem mentality is a curse.
unknown|24 days ago
[deleted]
dudewhocodes|24 days ago
vazma|23 days ago
0xbadcafebee|24 days ago
This is the main reason to use AI agents, though: multitasking. If I'm working on some Terraform changes and I fire off an agent loop, I know it's going to take a while for it to produce something working. In the meantime I'm waiting for it to come back and pretend it's finished (really I'll have to fix it), so I start another agent on something else. I flip back and forth between the finished runs as they notify me. At the end of the day I have 5 things finished rather than two.
The "agent" doesn't have to be anything special either. Anything you can run in a VM or container (vscode w/copilot chat, any cli tool, etc) so you can enable YOLO mode.
rhubarbtree|24 days ago
skrebbel|24 days ago
fergie|24 days ago
But yes, Hashimoto is a high profile CEO/CTO who may well have an indirect, or near-future interest in talking up AI. HN articles extoling the productivity gains of Claude on HN do generally tend to be from older, managerial types (make of that what you will).
simianwords|24 days ago
jvillasante|24 days ago
- What about non opensource work that's not on Github?
- Costs! I would think "an agent always running" would add up quickly
- In open source work, how does it amplify others. Are you seeing AI Slop as PRs? Can you tell the difference?
jon_north|24 days ago
I mean, what is the point of change if not to improve? I don't mean "I felt I was more efficient." Feelings aren't measurements. Numbers!
wackget|24 days ago
It makes me profoundly sad to think of the huge number of AI agents running endlessly to produce vibe-coded slop. The environmental impact must be massive.
jjice|24 days ago
Keep in mind that these are estimates, but you could attempt to extrapolate from here. Programming prompts probably take more because I assume the average context is a good bit higher than the average ChatGPT question, plus additional agents.
All in, I'm not sure if the energy usage long term is going to be overblown by media or if it'll be accurate. I'm personally not sure yet.
bullshitsite|24 days ago
OPINION | THE REALITY CHECK In the gleaming offices of Silicon Valley and the boardrooms of the Fortune 500, a new religion has taken hold. Its deity is the Large Language Model, and its disciples—the AI Evangelists—speak in a dialect of "disruption," "optimization," and "seamless integration." But outside the vacuum of the digital world, a dangerous friction is building between AI’s statistical hallucinations and the unyielding laws of physics.
The danger of Artificial Intelligence isn't that it will become our overlord; the danger is that it is fundamentally, confidently, and authoritatively stupid.
The Paradox of the Wind-Powered Car The divide between AI hype and reality is best illustrated by a recent technical "solution" suggested by a popular AI model: an electric vehicle equipped with wind generators on the front to recharge the battery while driving. To the AI, this was a brilliant synergy. It even claimed the added weight and wind resistance amounted to "zero."
To any human who has ever held a wrench or understood the First Law of Thermodynamics, this is a joke—a perpetual motion fallacy that ignores the reality of drag and energy loss. But to the AI, it was just a series of words that sounded "correct" based on patterns. The machine doesn't know what wind is; it only knows how to predict the next syllable.
The Erosion of the "Human Spark" The true threat lies in what we are sacrificing to adopt this "shortcut" culture. There is a specific human process—call it The Stare. It is that thirty-minute window where a person looks at a broken machine, a flawed blueprint, or a complex problem and simply observes.
In that half-hour, the human brain runs millions of mental simulations. It feels the tension of the metal, the heat of the circuit, and the logic of the physical universe. It is a "Black Box" of consciousness that develops solutions from absolutely nothing—no forums, no books, and no Google.
However, the new generation of AI-dependent thinkers views this "Stare" as an inefficiency. By outsourcing our thinking to models that cannot feel the consequences of being wrong, we are witnessing a form of evolutionary regression. We are trading hard-earned competence for a "Yes-Man" in a box.
The Gaslighting of the Realist Perhaps most chilling is the social cost. Those who still rely on their intuition and physical experience are increasingly being marginalized. In a world where the screen is king, the person pointing out that "the Emperor has no clothes" is labeled as erratic, uneducated, or naive.
When a master craftsman or a practical thinker challenges an AI’s "hallucination," they aren't met with logic; they are met with a robotic refusal to acknowledge reality. The "AI Evangelists" have begun to walk, talk, and act like the models they worship—confidently wrong, devoid of nuance, and completely detached from the ground beneath their feet.
The High Cost of Being "Authoritatively Wrong" We are building a world on a foundation of digital sand. If we continue to trust AI to design our structures and manage our logic, we will eventually hit a wall that no "prompt" can fix.
The human brain runs on 20 watts and can solve a problem by looking at it. The AI runs on megawatts and can’t understand why a wind-powered car won't run forever. If we lose the ability to tell the difference, we aren't just losing our jobs—we're losing our grip on reality itself.
MORPHOICES|24 days ago
[deleted]
whatifnomoney|24 days ago
[deleted]
vonneumannstan|24 days ago
SpicyLemonZest|24 days ago
jorvi|24 days ago
recursive|24 days ago
z0r|24 days ago
dakiol|24 days ago
Tons of respect for Mitchell. I think you are doing him a disservice with these kinds of comments.
unknown|24 days ago
[deleted]
mold_aid|24 days ago
silisili|24 days ago
If by company success, then Zuckerberg and Musk are better than all of us.
If by millions made, as he likes to joke/brag about... Fabrice Bellard is an utter failure.
If by install base, the geniuses that made MS Teams are among the best.
None of this is to take away from the successes of the man, but this kind of statement is rather silly.
jeffrallen|24 days ago
LOL, been there, done that. It is much less frustrating and demoralizing than babysitting your kind of stupid colleague though. (Thankfully, I don't have any of those anymore. But at previous big companies? Oh man, if only their commits were ONLY as bad as a bad AI commit.)
xyst|24 days ago
dang|24 days ago
"Don't be snarky."
"Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative."
https://news.ycombinator.com/newsguidelines.html
therein|24 days ago
dang|24 days ago
alterom|24 days ago
Which is why I like this article. It's realistic in terms of describing the value-propositio of LLM-based coding assist tools (aka, AI agents).
The fact that it's underwhelming compared to the hype we see every day is a very, very good sign that it's practical.
stronglikedan|24 days ago
polyrand|24 days ago
I think this is something people ignore, and is significant. The only way to get good at coding with LLMs is actually trying to do it. Even if it's inefficient or slower at first. It's just another skill to develop [0].
And it's not really about using all the plugins and features available. In fact, many plugins and features are counter-productive. Just learn how to prompt and steer the LLM better.
[0]: https://ricardoanderegg.com/posts/getting-better-coding-llms...