I have been a developer for twenty years now. For me to trust code, I want to understand every single line. I learned long ago, working on projects with a team, that that becomes impossible for a single person on large projects. So I learned to trust that someone understands the code, and that between git blame and Slack I can almost always hunt that person down.
More and more often while doing code review, I find I don't understand something, I ask, and the "author" clearly has no idea what it is doing either.
I find it quite troubling how little actual human thought is going into things. The AI's context window is not nearly large enough to fully understand the entire scope of any decently sized application's ecosystem. It just takes small peeks at bits and makes decisions based on a tiny slice of the world.
It's a powerful tool and as such needs to be guided with care.
I have seen so many projects where the people who understood all of it are just gone. They moved on, did something else, etc.
As soon as this happens, you no longer have anyone 'getting it'. You have to handle so many people adding/changing very thin lines across all components, and you can only hope that the original people had enough foresight to add unit tests for the core decisions. So I really don't mind AI here anymore.
We might have to give up on trust and understanding in complex domains. To draw an analogy from another field, pharmaceutical researchers often don't understand the exact mechanism of action for drugs they develop. Biological systems are too complex and much of the basic research hasn't been done yet. So they rely on rigorous testing to verify that new drugs are safe and effective. It isn't a perfect system — sometimes drugs get recalled or have warnings added later — but it works well enough.
Can humans, though? There's a reason we don't just lump everything into one giant file and a singleton class named DoIt(). Who hasn't come back around to some bit of code in a project and wondered what dumbass wrote this, only for the logs to tell you that it was you, years ago? If AI results in code that's more modular, in smaller, digestible, understandable chunks, I'm not hearing that as a bad thing!
It's nice to see a wide array of discussions under this! Glad I didn't give up on this thought and ended up writing it down.
I want to stress that the main point of my article is not really about AI coding, it's about letting AI perform any arbitrary tasks reliably. Coding is an interesting one because it seems like it's a place where we can exploit structure and abstraction and approaches (like TDD) to make verification simpler - it's like spot-checking in places with a very low soundness error.
I'm encouraging people to look for tasks other than coding to see if we can find similar patterns. The more of these cost asymmetries we can find (where verifying is easier than doing), the more we can harness AI's real potential.
Note that in the case of coding, there is an entire branch of computer science dedicated to verification.
All the type systems (and model-checkers) for Rust, Ada, OCaml, Haskell, TypeScript, Python, C#, Java, ... are based on such research, and these are all rather weak in comparison to what research has created in the last ~30 years (see Rocq, Idris, Lean).
This goes beyond that, as some of these mechanisms have been applied to mathematics, but also to some aspects of finance and law (I know of at least a few mechanisms for formally proving implementations of banking contracts and tax management).
So there is lots to do in the domain. Sadly, as every branch of CS other than AI (and in fact pretty much every branch of science other than AI), this branch of computer science is underfunded. But that can change!
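For a taste of what that research looks like in practice, here is a toy Lean 4 theorem (a minimal sketch, not tied to any of the systems named above): the proof checker refuses to compile the file unless the proof term is actually valid, which is the same mechanism used to verify real software.

-- The file only type-checks if this proof is correct; a wrong or missing
-- proof is a compile error, not a runtime surprise.
theorem sum_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b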
All these engineers who claim to write most of their code through AI - I wonder what kind of codebase that is. I keep trying, but it always ends up producing superficially okay-looking code that gets the nuances wrong. It also fails to fix them (it just changes random stuff) when pointed at said nuances.
I work on a large product with two decades of accumulated legacy, maybe that's the problem. I can see though how generating and editing a simple greenfield web frontend project could work much better, as long as actual complexity is low.
I have my best successes by keeping things constrained to method-level generation. Most of the things I dump into ChatGPT look like this:
public static double ScoreItem(Span<byte> candidate, Span<byte> target)
{
//TODO: Return the normalized Levenshtein distance between the 2 byte sequences.
//... any additional edge cases here ...
}
I think generating more than one method at a time is playing with fire. Individual methods can be generated by the LLM and tested in isolation. You can incrementally build up and trust your understanding of the problem space by going a little bit slower. If the LLM is operating over a whole set of methods at once, it is like starting over each time you have to iterate.
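For reference, this is a minimal sketch of the kind of body a model typically fills in for that stub (one plausible version, wrapped in a hypothetical Scoring class to make it self-contained):

using System;

public static class Scoring
{
    // Levenshtein distance between the two byte sequences, normalized to
    // [0, 1] by the length of the longer one (0 = identical).
    public static double ScoreItem(Span<byte> candidate, Span<byte> target)
    {
        if (candidate.Length == 0 && target.Length == 0) return 0.0;
        if (candidate.Length == 0 || target.Length == 0) return 1.0;

        // Two-row dynamic programming over the edit-distance matrix.
        var previous = new int[target.Length + 1];
        var current = new int[target.Length + 1];
        for (int j = 0; j <= target.Length; j++) previous[j] = j;

        for (int i = 1; i <= candidate.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= target.Length; j++)
            {
                int cost = candidate[i - 1] == target[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1,     // deletion
                             current[j - 1] + 1), // insertion
                    previous[j - 1] + cost);      // substitution
            }
            (previous, current) = (current, previous);
        }

        return (double)previous[target.Length]
             / Math.Max(candidate.Length, target.Length);
    }
}

Exactly the kind of method that can be generated and tested in isolation, as described above.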
It's architecture dependent. A fairly functional modular monolith with good documentation can be accessible to LLMs at the million line scale, but a coupled monolith or poorly instrumented microservices can drive agents into the ground at 100k.
I think your intuition matches mine. When I try to apply Claude Code to a large code base, it spends a long time looking through the code and then it suggests something incorrect or unhelpful. It's rarely worth the trouble.
When I give AI a smaller or more focused project, it's magical. I've been using Claude Code to write code for ESP32 projects and it's really impressive. OTOH, it failed to tell me about a standard device driver I could be using instead of a community device driver I found. I think any human who works on ESP-IDF projects would have pointed that out. AI's failings are always a little weird.
I've tried it extensively, and have the same experience as you. AI is also incredibly stubborn when it wants to go down a path I reject. It constantly tries to do it anyway and will slip things in.
I've tried vibe coding and usually end up with something subtly or horribly broken, with excessive levels of complexity. Once it digs itself a hole, it's very difficult to extricate it even with explicit instruction.
* My 5-year-old personal project: a monorepo with a backend, 2 front-ends and 2 libraries
* A 10+ year old company project: about 20 assorted packages in a monorepo
In both cases I successfully give Claude Code or OpenCode instructions either at package level or monorepo level. Usually I prefer package level.
E.g. just now I gave instructions in my personal project: "Invoice styles in /app/settings/invoice should be localized". It figured out that the unlocalized strings come from a library package, added the strings to the code and messages files (including missing translations), but did not clean up the hardcoded strings from the library. Since I know the code, I added an extra prompt, "Maybe INVOICE_STYLE_CONFIGS can be cleaned up in that case", and it cleaned up exactly what I expected, then ran tests and linting.
I've generally had better luck when using it on new projects/repos. When working on a large existing repo it's very important to give it good context/links/pointers to how things currently work/how they should work in that repo.
Also - Claude (~the best coding agent currently, imo) will make mistakes, sometimes many of them - tell it to test the code it writes and make sure it's working. I've generally found it's pretty good at debugging/testing and fixing its own mistakes.
So far I have found that AI is very good at writing code, in the sense of translating English into computer code.
Instead of dealing with the intricacies of directly writing the code, I explain to the AI what we are trying to achieve next and what approach I prefer. This way I am still on top of it: I am able to judge the quality of the code it generated, and I'm the one who integrates everything.
So far I have found the tools that are supposed to edit the whole codebase at once to be useless. I instantly lose perspective when the AI IDE fiddles with multiple code blocks and does some magic. The chatbot interface is superior for me, as control stays with me and I still follow the code writing step by step.
> I work on a large product with two decades of accumulated legacy, maybe that's the problem.
I'm in a similar situation, and for the first time ever I'm actually considering if a rewrite to microservices would make sense, with a microservice being something small enough an AI could actually deal with - and maybe even build largely on its own.
Yes, unfortunately those who jumped on the microservices hype train over the past 15 years or so are now getting the benefits of Claude Code, since their entire codebases fits into the context window of Sonnet/Opus and can be "understood" by the LLM to generate useful code.
This is not the case for most monoliths, unless they are structured into LLM-friendly components that resemble patterns the models have seen millions of times in their training data, such as React components.
Can you prove in a blog post, and share it here, that you write better code snippets than AI? If you're asking "what kind of codebase", you should be able to use some codebase from GitHub to prove it.
The proliferation of nondeterministically generated code is here to stay. Part of our response must be more dynamic, more comprehensive and more realistic workload simulation and testing frameworks.
I disagree. I think we're testing it, and we haven't seen the worst of it yet.
And I think it's less about non-deterministic code (the code is actually still deterministic) and more about this new-fangled tool out there that finally allows non-coders to generate something that looks like it works. And in many cases it does.
Like a movie set. Viewed from the right angle it looks just right. Peek behind the curtain and it's all wood, thinly painted, and it's usually easier to rebuild from scratch than to add a layer on top.
Code has always been nondeterministic. Which engineer wrote it? What was their past experience? This just feels like we are accepting subpar quality because we have no good way to ensure the code we generate is reasonable and won't mayyyybe rm -rf our server as a fun easter egg.
> A very good example of the first category is image (and video) generation. Drawing/rendering a realistic looking image is a crazily hard task. Have you tried to make a slide look nicer? It will take me literally hours to center the text boxes to make it look “good”. However, you really just need to take a look at the output of Nano Banana and you can tell if it’s a good render or a bad one based on how you feel.
The writer could be very accomplished when it comes to developing - I don't know - but they clearly don't understand a single thing about visual arts or culture. I probably could center those text boxes after fiddling with them for maybe ten seconds - I have studied art since I was a kid. My bf could do it instantly without thinking a second; he is a graphic designer. You might think that you are able to see what « looks good » since, hey, you have eyes, but no you can't. There are a million details you will miss, or maybe you'll feel something is off but cannot quite say why. This is why you have graphic designers, who are trained to do that. They can also use generative tools to make something genuinely stunning, unlike most of us. Why? Skills.
This is the same reason why the guy in the story who can't code still can't code even with an LLM, whereas the guy who can is able to code even faster with these new tools. If you use LLMs for basically auto-completion (what transformer models really are for), you can work with a familiar codebase very quickly, I'm sure. I've used one to generate SQL call statements, which I can't be bothered to type myself, and it was perfect. If I try to generate something I don't really understand or know how to do, I'm lost staring at some horrible gobbledygook that is never going to work. Why? Skills.
There is no verification engineering. There are just people who know how to do things, who have studied their whole life to get those skills. And no, you will not replace a real hardcore professional with an LLM. LLMs are just tools, nothing else. A tractor replaced a horse in turning the field, but you still need a farmer to drive it.
> You might think that you are able to see what « looks good » since, hey you have eyes, but no you can’t.
I'm sure lots of people will reply to you stating the opposite, but for what it's worth, I agree. I am not a visual artist... well, not any more, I was really into it as a kid and had it beaten out of me by terrible art teachers, but I digress... I am creative (music), and have a semblance of understanding of the creative process.
I ran a SaaS company for 20 years and would be constantly amazed at how bad the choices of software engineers would be when it came to visual design. I could never quite understand whether they just didn't care or just couldn't see. I always believed (hoped) it was the latter. Even when I explained basic concepts like consistent borders, grid systems, consistent fonts and font-sizing, less visual clutter, etc. they would still make the same mistakes over and over.
The trained eye immediately sees what's right and what's wrong. And that's why we still need experts. It doesn't matter what is being generated: if you don't have the expertise to know whether it's good or not, chances are glaring errors will be missed (in code and in visual design).
> A tractor replaced a horse in turning the field, but you still need a farmer to drive it.
Before mechanisation, something like 50x more people worked in the agricultural sector compared to today. So tractors certainly left a huge number of people without work. Our society adapted to this change and absorbed those people into the industrial sector.
If LLMs worked like a tractor, they would force 49 out of 50 programmers (or, more generally, blue-collar workers) to leave their industry. Is there a place for them to work instead? I don't know.
I have learned a little bit of Photoshop, and 10 years ago Maya too.
But I'm a software engineer by trade, and I do not struggle with telling you that this thing has to move left for reason XY; I would struggle with the random tools capable of doing that particular thing for me.
And it does not matter here how I did it, if the result is the same result.
In software engineering this is just not always the case, because often enough you need to verify that what you get is the thing you expect (did the report actually take the right numbers?), or security. Security is the biggest risk in all AI coding out there. Security is already so hard because people don't see it; they ignore it because they don't know.
You have so many non-functional requirements in software that just don't exist in art. If I need that image, that's it. The most complex things here? Perhaps color calibration, color profiles, resolution.
If we talk about 3D it gets a little more complicated again, because now we're talking about the right 3D model, the right way to rig, etc.
Also, if someone says "I need a picture for X" and isn't happy with the result, the risk is a few lost customers. But if someone needs a new feature and tomorrow all your customer data is exposed, or the company's product stops working because of a basic bug, the company might be gone a week later.
> “AI always thinks and learns faster than us, this is undeniable now”
No, it neither thinks nor learns. It can give an illusion of thinking, and an AI model itself learns nothing. Instead it can produce a result based on its training data and context.
I think it's important that we do not ascribe human characteristics where they are not warranted. I also believe that understanding this can help us better utilize AI.
Verification is key, and the issue is that almost all AI generated code looks plausible so just reading the code is usually not enough. You need to build extremely good testing systems and actually run through the scenarios that you want to ensure work to be confident in the results.
This can be preview deployments or other AI generated end to end tests that produce video output that you can watch or just a very good test suite with guard rails.
Without such automation and guard rails, AI generated code eventually becomes a burden on your team because you simply can't manually verify every scenario.
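As a concrete flavor of such a guard rail, here is a minimal smoke-test sketch that hits a local preview deployment and checks the scenarios you care about (the base URL and routes are hypothetical placeholders, not from the article):

using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class SmokeTests
{
    public static async Task Main()
    {
        using var client = new HttpClient { BaseAddress = new Uri("http://localhost:5000") };

        // Each scenario pairs a route with the status code we expect to see.
        var scenarios = new (string Route, int ExpectedStatus)[]
        {
            ("/health", 200),
            ("/api/invoices", 200),
            ("/api/invoices/does-not-exist", 404),
        };

        foreach (var (route, expected) in scenarios)
        {
            var response = await client.GetAsync(route);
            int actual = (int)response.StatusCode;
            Console.WriteLine($"{route}: expected {expected}, got {actual} " +
                              (actual == expected ? "OK" : "FAIL"));
        }
    }
}

Run something like this against every preview deployment; anything more scenario-rich quickly pays for itself.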
"AI always thinks and learns faster than us, this is undeniable now. "
Sort of a nitpick, because what's written is true in some contexts (I get it, web development is like the ideal context for AI for a variety of reasons), but this is currently totally false in lots of knowledge domains very much like programming. AI is currently terrible at the math niches I'm interested in. Since there's no economic incentive to improve things and no mountain of literature on those topics, unless AI really becomes self-learning / improves in some real way, I don't see the situation ever changing. AI has consistently gotten effectively a 0% score on my personal benchmarks for those topics.
It's just aggravating to see someone write "totally undeniable" when the thing is trivially denied.
It's like a buffered queue: if the producer (the AI) is too fast for the consumer (the dev's brain), then the producer needs to block/stop/slow down, otherwise data will be lost (in this analogy, the data loss is the consumer no longer having a clear understanding of what the code is doing).
One day, when AI becomes reliable (which is still a while off because AI doesn't yet understand what it's doing) then the AI will replace the consumer (IMO).
FTR - AI is still at the "text matches another pattern of text" stage, and not the "understand what concepts are being conveyed" stage, as demonstrated by AI's failure to do basic arithmetic
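That queue analogy maps directly onto code. A minimal sketch using .NET's bounded channels (the capacity and delay are arbitrary illustrative numbers): when the buffer is full, the writer waits until the slow consumer catches up, so nothing is dropped.

using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class Backpressure
{
    public static async Task Main()
    {
        // Bounded buffer: when full, WriteAsync waits instead of dropping data.
        var channel = Channel.CreateBounded<string>(new BoundedChannelOptions(4)
        {
            FullMode = BoundedChannelFullMode.Wait
        });

        var producer = Task.Run(async () =>
        {
            for (int i = 0; i < 20; i++)
            {
                // The fast producer (the AI) blocks here when the buffer is full.
                await channel.Writer.WriteAsync($"change {i}");
            }
            channel.Writer.Complete();
        });

        await foreach (var change in channel.Reader.ReadAllAsync())
        {
            await Task.Delay(100); // the slow consumer: a human actually reviewing
            Console.WriteLine($"reviewed {change}");
        }

        await producer;
    }
}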
This feeling of verification >> generation anxiety bears a resemblance to that moment when you're learning a foreign language, you speak a well-prepared sentence, and your correspondent says something back, of which you only understand about a third.
In like fashion, when I start thinking of a programming statement (as a bad/rookie programmer) and an assistant completes my train of thought (as is default behaviour in VS Code for example), I get that same feeling that I did not grasp half the stuff I should've, but nevertheless I hit Ctrl-Return because it looks about right to me.
This is something one can look into further. It is really probabilistically checkable proofs underneath: we naturally look for the places where the work needs to look right, and use that as a basis for assuming the work was done right.
> Maybe our future is like the one depicted in Severance - we look at computer screens with wiggly numbers and whatever “feels right” is the right thing to do. We can harvest these effortless low latency “feelings” that nature gives us to make AI do more powerful work.
Come to think of it... isn't this exactly what syntax coloring and proper indentation are all about? The ability to quickly pattern-spot errors, or at least smells, based on nothing but aesthetics?
I'm sure that there is more research to be done in this direction.
I think there's a lot of utility to current AI tools, but it's also clear we're in a very unsettled phase of this technology. We likely won't see for years where the technology lands in terms of capability or the changes that will be made to society and industry to accommodate.
Somewhat unfortunately, the sheer amount of money being poured into AI means that it's being forced upon many of us, even if we didn't want it. Which results in a stark, vast gap like the author is describing, where things are moving so fast that it can feel like we may never have time to catch up.
And what's even worse: because of this, industry and individuals are now trying to have the tool correct and moderate itself, which intuitively seems wrong from both a technical and a societal standpoint.
I've been thinking about something like this from a UI perspective. I'm a UX designer working on a product with a fairly legacy codebase. We're vibe coding prototypes and moving towards making it easier for devs to bring in new components. We have a hard enough time verifying the UI quality as it is. And having more devs vibing on frontend code is probably going to make it a lot worse. I'm thinking about something like having agents regularly traversing the code to identify non-approved components (and either fixing or flagging them). Maybe with this we won't fall further behind with verification debt than we already are.
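A deterministic first pass at that idea doesn't even need an agent. A minimal sketch, assuming a TypeScript/React frontend under src/ and a hypothetical allow-list of approved design-system components:

using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class ComponentAudit
{
    // Hypothetical allow-list; in practice this would come from the design system.
    private static readonly string[] Approved = { "AppButton", "AppModal", "AppTable" };

    public static void Main()
    {
        // JSX component tags start with a capital letter, e.g. <AppButton ...>.
        var tagPattern = new Regex(@"<([A-Z][A-Za-z0-9]*)\b");

        foreach (var file in Directory.EnumerateFiles("src", "*.tsx", SearchOption.AllDirectories))
        {
            foreach (Match match in tagPattern.Matches(File.ReadAllText(file)))
            {
                var tag = match.Groups[1].Value;
                if (!Approved.Contains(tag))
                    Console.WriteLine($"{file}: non-approved component <{tag}>");
            }
        }
    }
}

An agent could then triage the flagged spots, but the flagging itself stays cheap and repeatable.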
The verification asymmetry framing is good, but I think it undersells the organizational piece.
Daniel works because someone built the regime he operates in. Platform teams standardized the patterns and defined what "correct" looks like and built test infrastructure that makes spot-checking meaningful and and and .... that's not free.
Product teams are about to pour a lot more slop into your codebase. That's good! Shipping fast and messy is how products get built. But someone has to build the container that makes slop safe, and have levers to tighten things when context changes.
The hard part is you don't know ahead of time which slop will hurt you. Nobody cares if product teams use deprecated React patterns. Until you're doing a migration and those patterns are blocking 200 files. Then you care a lot.
You (or rather, platform teams) need a way to say "this matters now" and make it real. There's a lot of verification that's broadly true everywhere, but there's also a lot of company-scoped or even team-scoped definitions of "correct."
(Disclosure: we're working on this at tern.sh, with migrations as the forcing function. There's a lot of surprises in migrations, so we're starting there, but eventually, this notion of "organizational validation" is a big piece of what we're driving at.)
Make it do scrum, with sprint planning, retrospectives and sprint demos. And then another AI as product owner and scrum master. Ideally this AI has only a vague idea of what the product needs or of the technology, but still has decision power. That should really slow it down.
Prompt engineering: just basic articulation skills.
Context engineering: just basic organization skills.
Verification engineering: just basic quality assurance skills.
And so on...
---
"Eric" will never be able to fully use AI for development because he lacks knowledge about even the most basic aspects of the developer's job. He's a PM after all.
I understand that the idea of turning everyone into instant developers is super attractive. However, you can't cheat learning. If you give an edge to non-developers for development tasks, it means you will give an even sharper edge to actual developers.
This is true. I've been anti-AI, but I started using it recently as an alternative to Stack Overflow (because Google is shoving it down my throat via search results). It's pretty effective. It does get things wrong from time to time, but then I just fix it up manually. I can't claim it's making me 100x more productive or anything like that. It's just a nice alternative to scrolling through SO answers and looking for the one with the green checkmark.
I still find it sad when people use it for prose though.
I'm starting to come to the realization that unless there is a bottom to the amount of work people want done, AI doesn't really change much either way. There just seems to be a never-ending supply of work, so yeah, not sure how AI would resolve this.
AI can really only be as good as the data it's trained on. It's good at images because it's trained on billions of them. Lines of code? Probably hundreds of millions, but as you combine that code into concepts, split by language, framework, formatting, etc., you lose the numbers game. It can't tell you how to make a good enterprise app because almost nobody knows how to make a good enterprise app. Just ask Oracle... ba-da-bum!
It's called TDD: ya write a bunch of little tests to make sure your code is doing what it needs to do and not what it shouldn't. In short, little blocks of easily verifiable code to verify your code (see the sketch below).
But seriously, what is this article even? It feels like we are reinventing the wheel or maybe just humble AI hype?
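The sketch: a few xUnit tests against the hypothetical Scoring.ScoreItem method sketched earlier in the thread (a minimal illustration, not from the article):

using System;
using Xunit;

public class ScoreItemTests
{
    // Little blocks of easily verifiable code: pin down what the generated
    // method must do before trusting it anywhere else.
    [Fact]
    public void IdenticalSequencesScoreZero()
    {
        var bytes = new byte[] { 1, 2, 3 };
        Assert.Equal(0.0, Scoring.ScoreItem(bytes, bytes));
    }

    [Fact]
    public void CompletelyDifferentSequencesScoreOne()
    {
        Assert.Equal(1.0, Scoring.ScoreItem(new byte[] { 1, 1, 1 }, new byte[] { 2, 2, 2 }));
    }

    [Fact]
    public void EmptyVersusNonEmptyScoresOne()
    {
        Assert.Equal(1.0, Scoring.ScoreItem(Span<byte>.Empty, new byte[] { 9 }));
    }
}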
> He would just spot-check the correctness of AI’s work and quickly spin up local deployments to verify it’s indeed working.
I'm not really sure how exactly he got the project done, but "spot-check" and "quickly spin up local deployments to verify" somehow makes me somewhat uncomfortable.
For me, it's either unit tests that hit at least 100% coverage or, when unit tests are inapplicable, a line-by-line, letter-by-letter verification. Otherwise your "spot-check" means no shit to me.
> AI always thinks and learns faster than us, this is undeniable now.
Huh? The LLMs we're using today don't learn at all. I don't even mean that in a philosophical sense— I mean they come "pre-baked" with whatever "knowledge" they have, and that's it.
AI is now becoming hard to keep up with. We gotta make sure to integrate it into our daily lives so we don't fall behind.
I have literally made it a source of income. Make sure to do the same.
Appealing, but this is coming from someone smart/thoughtful. No offence to the 'rest of the world', but I think most people have felt this way for years. And realistically, in a year there won't be any people who can keep up.
I'm hoping this can introduce a framework to help people visualize the problem and figure out a way to close that gap. Image generation is something everyone can verify, but code generation is perhaps not. But if we can make verifying code as effortless as verifying images (not saying it's possible), then our productivity can enter the next level...
I directly asked Gemini how to get world peace. It said the world should prioritize addressing climate change, inequality, and discrimination. Yeah - we're not gonna do any of that shit. So I don't know what the point of "superintelligent" AI is if we aren't going to even listen to it for the basic big-picture stuff. Any sort of "utopia" that people imagine AI bringing is doomed to fail because we already can't cooperate without AI.
> I don't know what the point of "super intelligent" AI is if we aren't going to even listen to it
Because you asked the wrong question. The most likely question would be "How do I make a quadrillion dollars and humiliate my super rich peers?".
But realistically, it gave you an answer according to its capacity. A real superintelligent AI, and I mean oh-god-we-are-but-insects-in-its-shadow superintelligence, would give you a roadmap and blueprint, and it would take into account our deep-rooted human flaws, so no one reading it seriously could dismiss it as superficial. In fact, any world elite reading it would see it as a chance to humiliate their world-elite peers and get all the glory for themselves.
You know how adults can fool little children into doing what they don't want to do? We would be the toddlers in that scenario. I hope this hypothetical AI holds humans in high regard, because that would be the only thing saving us from ourselves.
Did you expect some answer that decried world peace as impossible? It's just repeating what people say [0] when asked the same question. That's all that a large language model can do (other than putting it to rhyme or 'in the style of Charles Dickens').
[0] https://newint.org/features/2018/09/18/10-steps-world-peace
If you are looking for a vision of general AI that confirms a Hobbesian worldview, you might enjoy Lars Doucet's short story, _Four Magic Words_ [1].
[1] https://www.fortressofdoors.com/four-magic-words/
I don't believe that this is going to happen, but the primary arguments revolving around a "super intelligent" ai involve removing the need for us to listen to it.
A super intelligent ai would have agency, and when incentives are not aligned would be adversarial.
In the caricature scenario, we'd ask, "super AI, how do we achieve world peace?" It would answer the same way, but then solve it in a non-human-centric way: by reducing humanity's autonomy over the world.
Fixed: anthropogenic climate change resolved, inequality and discrimination reduced (by reducing the population by 90% and putting the rest in virtual reality).
> So I don't know what the point of "superintelligent" AI is if we aren't going to even listen to it
I would kind of feel sorry for a super-intelligent AI having to deal with humans who have their fingers on the on/off switch. It would be a very frustrating existence.
> Any sort of "utopia" that people imagine AI bringing is doomed to fail because we already can't cooperate without AI
It's just fanfiction. They're just making up stories in their heads based on blending sci-fi they've read or watched in the past. There's no theory of power, there's no understanding of history or even the present, it's just a bad Star Trek episode.
"Intelligence" itself isn't even a precise concept. The idea that a "superintelligent" AI is intrinsically going to be obsessed with juvenile power fantasies is just silly. An AI doesn't want to enslave the world, run dictatorial experiments born of childhood frustrations and get all the girls. It doesn't want anything. It's purposeless. Its intelligence won't even be recognized as intelligence if its suggestions aren't pleasing to the powerful. They'll keep tweaking it to keep it precisely as dumb as they themselves are.
DANmode|2 months ago
“Whatever code you commit - you own it - no matter who (or what) wrote it.”
Make this your top-down directive, and fire people who insist on throwing trash over the fence into your yard.
BeFlatXIII|2 months ago
Does your company not have many retirements, firings, or employees who quit to work elsewhere?
felipeerias|2 months ago
One that works particularly well in my case is test-driven development followed by pair programming:
• “given this spec/context/goal/… make test XYZ pass”
• “now that we have a draft solution, is it in the right component? is it efficient? well documented? any corner cases?…”
moomoo11|2 months ago
Now I use agentic coding a lot with maybe 80-90% success rate.
I’m on greenfield projects (my startup) and maintaining strict Md files with architecture decisions and examples helps a lot.
I barely write code anymore, and mostly code review and maintain the documentation.
In existing codebases, pre-AI, I think it's near impossible, because I've never worked anywhere that maintained documentation. It was always a chore.
qudat|2 months ago
Another good use case is to use it for knowledge searching within a codebase. I find that to be incredibly useful without much context "engineering"
themafia|2 months ago
You can start there. Does it ever stay that way?
> I work on a large product with two decades of accumulated legacy
Survey says: No.
silisili|2 months ago
Definitely. I've found Claude at least isn't so good at working in large existing projects, but great at greenfielding.
Most of my use these days is having it write specific functions and tests for them, which in fairness, saves me a ton of time.
rprend|2 months ago
That’s the typical “claude code writes all my code” setup. That’s my setup.
This does require you to fit your problem to the solution. But when you do, the results are tremendous.
jstanley|2 months ago
For example, Inkscape has this and it is easy to use.
nradov|2 months ago
https://www.deere.com/en/autonomous/
jopsen|2 months ago
And I have on occasion found it useful.
catigula|2 months ago
If you can make as a rule "no AI for tests", then you can simply make the rule "no AI" or just learn to cope with it.
jakeydus|2 months ago
You've described AI hype bros in a nutshell, I think.
CGMthrowaway|2 months ago
Good principle. This is exactly why we research vaccines and bioweapons side by side in the labs, for example.
dontlikeyoueith|2 months ago
I've heard the same claim every year since GPT-3.
It's still just as irrational as it was then.
airstrike|2 months ago
Bold claim. They said the same thing at the start of this year.