> Companies with relatively young, high-quality codebases benefit the most from generative AI tools, while companies with gnarly, legacy codebases will struggle to adopt them. In other words, the penalty for having a ‘high-debt’ codebase is now larger than ever.
This mirrors my experience using LLMs on personal projects. They can provide good advice only to the extent that your project stays within the bounds of well-known patterns. As soon as your codebase gets a little bit "weird" (ie trying to do anything novel and interesting), the model chokes, starts hallucinating, and makes your job considerably harder.
Put another way, LLMs make the easy stuff easier, but royally screws up the hard stuff. The gap does appear to be widening, not shrinking. They work best where we need them the least.
The niche I've found for LLMs is for implementing individual functions and unit tests. I'll define an interface and a return (or a test name and expectation) and say "this is what I want this to do", and let the LLM take the first crack at it. Limiting the bounds of the problem to be solved does a pretty good job of at least scaffolding something out that I can then take to completion. I almost never end up taking the LLM's autocompletion at face value, but having it written out to review and tweak does save substantial amounts of time.
The other use case is targeted code review/improvement. "Suggest how I could improve this" fills a niche which is currently filled by linters, but can be more flexible and robust. It has its place.
The fundamental problem with LLMs is that they follow patterns, rather than doing any actual reasoning. This is essentially the observation made by the article; AI coding tools do a great job of following examples, but their usefulness is limited to the degree to which the problem to be solved maps to a followable example.
I was recently assigned to work on a huge legacy ColdFusion backend service. I was very surprised at how useful AI was with code. It was even better, in my experience, than I've seen with python, java, or typescript. The only explanation I can come up with is there is so much legacy ColdFusion code out there that was used to train Copilot and whatever AI jetbrains uses for code completion that this is one of the languages they are most suited to assist with.
For me same experience but opposite conclusion. LLM saves me time by being excellent at yak shaving, letting me focus on the things that truly need my attention.
It would be great if they were good at the hard stuff too, but if I had to pick, the basics is where i want them the most. My brain just really dislikes that stuff, and i find it challenging to stay focused and motivated on those things.
> Put another way, LLMs make the easy stuff easier, but royally screws up the hard stuff.
This is my experience with generation as well - but I still don't trust it for the easy stuff and thus the model ends up being a hindrance in all scenarios. It is much easier for me to comprehend something I'm actively writing so making sure a generative AI isn't hallucinating costs more than me just writing it myself in the first place.
I use ChatGPT the most when I need to make a small change in a language I'm not fluent in, but I have a clear understanding of the project and what I'm trying to do. Example: "write a function that does this and this in Javascript". It's essentially a replacement of stack overflow.
I never use it for something that really requires knowledge of the code base, so the quality of the code base doesn't really matter. Also, I don't think it has ever provided me something I wouldn't have been able to do myself pretty quickly.
This is true, but I look at it differently. It makes it easier to automate the boring or annoying. Gotta throw up an admin interface? Need to write more unit tests? Need a one-off but complicated SQL query? They tend to excel at these things, and it makes me more likely to do them, while keeping my best attention for the things that really need me.
Same experience, but I think it's going to change. As models get better, their context window keeps growing while mine stays the same.
To be clear, our context window can be really huge if you are living the project. But not if you are new to it or even getting back to it after a few years.
Ive noticed the same and wonder if this is the natural result of public codebases on average being simpler since small projects will always outnumber bigger ones (at least if you ignore forks with zero new commits)
If high quality closed off codebases were used in training, would we see an improvement in LLM quality for more complex use cases?
I disagree, but it’s largely a matter of expectations. I don’t expect them to solve hard problems for me. That’s currently still my job. But when I’m writing new code, even for a legacy system, they can save a lot of time in getting the initial coding done, helping write comments, unit tests, and so on.
It’s not doing difficult work, but it saves a lot of toil.
> This mirrors my experience using LLMs on personal projects. They can provide good advice only to the extent that your project stays within the bounds of well-known patterns.
I agree but I find its still a great productivity boost for certain tasks, cutting through the hype and figuring out tasks that are well suited to these tools and prompting optimially has taken me a long time.
LLM's get relatively better at read-heavy operations (ex: code review) than write-heavy operations (ex: code generation) as codebases become less idiomatic.
I'm a cofounder at www.ellipsis.dev - we tried to build code generation for a LONG time before we realized that AI Code Review is way more doable with SOTA
This is only partly true. AI works really well on very legacy codebases like cobol and mainframe, and it's very good at converting that to modern languages and architectures. It's all the stuff from like 2001-2015 that it gets weird on.
One description of the class of problems LLM's are a good fit for is anything at which you could throw an army of interns. And this seems consistent with that.
Eh, it’s been kinda nice to just hit tab-to-complete on things like formulaic (but comprehensive) test suites, etc.
I never wanted the LLM to take over the (fun) part - thinking through the hard/unusual parts of the problem - but you’re also not wrong that they’re needed the least for the boilerplate. It’s still nice :)
> However, in ‘high-debt’ environments with subtle control flow, long-range dependencies, and unexpected patterns, they struggle to generate a useful response
I'd argue that a lot of this is not "tech debt" but just signs of maturity in a codebase. Real world business requirements don't often map cleanly onto any given pattern. Over time codebases develop these "scars", little patches of weirdness. It's often tempting for the younger, less experienced engineer to declare this as tech debt or cruft or whatever, and that a full re-write is needed. Only to re-learn the lessons those scars taught in the first place.
I recently watched a team speedrun this phenomenon in rather dramatic fashion. They released a ground-up rewrite of an existing service to much fanfare, talking about how much simpler it was than the old version. Only to spend the next year systematically restoring most of those pieces of complexity as whoever was on pager duty that week got to experience a high-pressure object lesson in why some design quirk of the original existed in the first place.
Fast forward to now and we're basically back to where we started. Only now they're working on code that was written in a different language, which I suppose is (to misappropriate a Royce quote) "worth something, but not much."
That said, this is also a great example of why I get so irritated with colleagues who believe it's possible for code to be "self-documenting" on anything larger than a micro-scale. That's what the original code tried to do, and it meant that its current maintainers were left without any frickin' clue why all those epicycles were in there. Sure, documentation can go stale, but even a slightly inaccurate accounting for the reason would have, at the very least, served as a clear reminder that a reason did indeed exist. Without that, there wasn't much to prevent them from falling into the perennially popular assumption that one's esteemed predecessors were idiots who had no clue what they were doing.
There is a pretty well known essay by Joel Spolsky (which is now 24 years old!) titled "Things You Should Never Do" where he talks about the error of doing a rewrite: https://www.joelonsoftware.com/2000/04/06/things-you-should-... . While I don't necessarily agree with all of his positions here, and given the way most software is architected and deployed these days some of this advice is just obsolete (e.g. relatively little software is complete, client-side binaries where his advice is more relevant), I think he makes some fantastic points. This part is particularly aligned with what you are saying:
> Back to that two page function. Yes, I know, it’s just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I’ll tell you why: those are bug fixes. One of them fixes that bug that Nancy had when she tried to install the thing on a computer that didn’t have Internet Explorer. Another one fixes that bug that occurs in low memory conditions. Another one fixes that bug that occurred when the file is on a floppy disk and the user yanks out the disk in the middle. That LoadLibrary call is ugly but it makes the code work on old versions of Windows 95.
> Each of these bugs took weeks of real-world usage before they were found. The programmer might have spent a couple of days reproducing the bug in the lab and fixing it. If it’s like a lot of bugs, the fix might be one line of code, or it might even be a couple of characters, but a lot of work and time went into those two characters.
> When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work.
Louder for the people in the back. I've had this notion for quite a long time that "tech debt" is just another way to say "this code does things in ways I don't like". This is so well said, thank you!
> Over time codebases develop these "scars", little patches of weirdness. It's often tempting for the younger, less experienced engineer to declare this as tech debt or cruft or whatever, and that a full re-write is needed. Only to re-learn the lessons those scars taught in the first place.
Do you have an opinion when this maturity is too mature?
Let's say, you would need to add a major feature that would drastically change the existing code base. On top of that, by changing the language, this major feature would be effortless to add.
When it is worth to fight with scars or just rewrite?
Code that has thorough unit and integration tests, no matter how old and crusty, can be refactored with a good deal of confidence, and AI can help with that.
> There is an emerging belief that AI will make tech debt less relevant.
Wow. It's hard to believe that people are earnestly supposing this. From everything we have evidence of so far, AI generated code is destined to be a prolific font of tech debt. It's irregular, inconsistent, highly sensitive to specific prompting and context inputs, and generally produces "make do" code at best. It can be extremely "cheap" vs traditional contributions, but gets to where it's going by the shortest path rather than the most forward-looking or comprehensive.
And so it does indeed work best with young projects where the prevailing tech debt load remains low enough that the project can absorb large additions of new debt and incoherence, but that's not to the advantage of young projects. It's setting those projects up to be young and debt-swamped much sooner than they would otherwise be.
If mature projects can't use generative AI as extensively, that's going to be to their advantage, not their detriment -- at least in terms of tech debt. They'll be forced to continue plodding along at their lumbering pace while competitors bloom and burst in cycles of rapid initial development followed by premature seizure/collapse.
And to be clear: AI generated code can have real value, but the framing of this article is bonkers.
>Instead of trying to force genAI tools to tackle thorny issues in legacy codebases, human experts should do the work of refactoring legacy code until genAI can operate on it smoothly
Instead of genAI doing the rubbish, boring, low status part of the job, you should do the bits of the job no one will reward you for, and then watch as your boss waxes lyrical about how genAI is amazing once you've done all the hard work for it?
It just feels like if you're re-directing your efforts to help the AI, because the AI isn't very good at actual complex coding tasks then... what's the benefit of AI in the first place? It's nice that it helps you with the easy bit, but the easy bit shouldn't be that much of your actual work and at the end of the day... it's easy?
This gives very similar vibes to: "I wanted machines to do all the soul crushing monotonous jobs so we would be free to go and paint and write books and fulfill our creative passions but instead we've created a machine to trivially create any art work but can't work a till"
It is a self-reinforcing pattern: the easier it is to generate code, the more code is generated. The more code is generated, the bigger the cost of maintenance is (and the relationship is super-linear).
So every time we generate the same boilerplate we really do copy/paste adding to maintenance costs.
We are amazed looking at the code generation capabilities of LLMs forgetting the goal is to have less code - not more.
My experience is the opposite - I find large blobs of generated code to be daunting, so I tend to pretty quickly reject them and either write something smaller by hand, or reprompt (in one way for another) for less, easier to review code.
This is just taking the advice to make code sane so that humans could undertand and modify it, and then justifying it as "AI should be able to understand and modify it". I mean, the same developer efficiency improvements apply to both humans and AI. The only difference is that currently humans working in a space eventually learn the gotchas, while current AIs don't really have that ability to learn the nuances of a particular space over time.
I love the way our SWE jobs are evolving. AI eating the simple stuff, generating more code but with harder to detect bugs... I'm serious, it feels that we can move faster with these tools but perhaps have to operate differently.
We are a long ways from automating our jobs away, instead our expertise evolves.
I suspect doctors go through a similar evolution as surgical methods are updated.
I would love to read or participate in the discussion of how to be strategic in this new world. Specifically, how to best utilize code generating tools as a SWE. I suppose I can wait a couple of years for new school SWEs to teach me, unless anyone is aware of content on this?
On one hand I agree with this conceptually, but on the other hand I've also been able to use AI to rapidly clean up and better structure a bunch of my existing code.
The blind copy-paste has generally been a bad idea though. Still need to read the code spit out, ask for explanations, do some iterating.
Imagine a single file full of complicated logic, where messing with one if statement might cause serious bugs. Here an AI will likely struggle, whereas a human could spend a couple of hours trying to work out the connections.
But if you have a code base with predictable software architectural patterns, the AI will likely recognise and help with all the boilerplate.
Of course there is a lot of middle ground between bad and good.
Do you mind getting into specifics about how you've been using AI to restructure your code? What tools are you using, and how large is the code base you're working with?
Yeah LLMs are pretty good at doing things like moving a lambda function to the right spot or refactoring two overlapping classes to a base class. Often it only saves five minutes but that adds up over time.
> Not only does a complex codebase make it harder for the model to generate a coherent response, it also makes it harder for the developer to formulate a coherent request.
> This experience has lead most developers to “watch and wait” for the tools to improve until they can handle ‘production-level’ complexity in software.
You will be waiting until the heat death of the universe.
If you are unable to articulate the exact nature of your problem, it won't ever matter how powerful the model is. Even a nuclear weapon will fail to have effect on target if you can't approximate its location.
Ideas like dumpstering all of the codebase into a gigantic context window seem insufficient, since the reason you are involved in the first place is because that heap is not doing what the customer wants it to do. It is currently a representation of where you don't want to be.
This was what I thought the post would talk about before clicking through. AI adds tech debt because none of the people maintaining or operating the code wrote the code and are no longer familiar with their own implementation
"Companies with relatively young, high-quality codebases"
I thought that at the beginning the code might be a bit messy because there is the need to iterate fast and quality comes with time, what's the experience of the crowd on this?
In my experience you need a high quality codebase to be able to iterate at maximum speed. Any time someone, myself included, thought they could cut corners to speed up iteration, it ended up slowing things down dramatically in the end.
Coding haphazardly can be a lot more thrilling, though! I certainly don't enjoy the process of maintaining high quality code. It is lovely in hindsight, but an awful slog in the moment. I suspect that is why startups often need to sacrifice quality: The aforementioned thrill is the motivation to build something that has a high probability of being a complete waste of time. It doesn't matter how fast you can theoretically iterate if you can't compel yourself to work on it.
I don't think there's such a thing as a single metric for quality - the code should do what is required at the time and scale. At the early stages, you can get away with inefficient things that are faster to develop and iterate on, then when you get to the scale where you have thousands of customers and find that your problem is data throughput or whatever, and not speed of iteration, you can break that apart and make a more complex beast of it.
You gotta make the right trade-off at the right time.
I find messiness often comes from capturing every possible edge case that a young codebase probably doesn’t do tbh.
A user deleted their account and there’s now a request to register that account with that username? We didn’t think of that (concerns from ux on imposter and abuse to be handled). Better code in a catch and handle this. Do this 100x times and you code has 100x custom branching logic that potentially interacts n^2 ways since each exceptional event could probably occur in conjunction with other exceptional events.
It’s why I caution strongly against rewrites. It’s easy to look at code and say it’s too complex for what it does but is the complexity actually needless? Can you think of a way to refactor the complexity out? If so do that refactor if not a rewrite won't solve it.
A startup with talent theoretically follows that pattern. If you're not a startup, you don't need to go fast in the beginning. If you don't have talent in both your dev team and your management, the codebase will get worse over time. Every company can differ on those two variables, and their codebases will reflect that. Probably most companies are large and talent-starved, so they go slow, start out with good code, then get bad over time.
Purely depends on the ability for a culture that values leaving options open in the future develops or not.
Young companies tend to have systems that are small enough or with institutional knowledge to pivot when needed and tend to have small teams with good lines of communication that allow for as shared purpose and values.
Architectural erosion is a long tailed problem typically.
Large legacy companies that can avoid architectural erosion do better than some startups who don't actively target maintainability, but it tends to require stronger commitment from Leadership than most orgs can maintain.
In my experience most large companies confuse the need to maintain adaptability with a need to impose silly policies that are applied irrespective of the long term impacts.
Integration and disintegration drivers are too fluid, context sensitive, and long term for prescription at a central layer.
The possibility mythical Amazon API edict is an example where focusing on separation and product focus could work, with high costs if you never get to the scale where it pays off.
The runways and guardrails concept seems to be a good thing in the clients I have worked for.
Some frameworks like Laravel can bring you far in terms of features. You're mostly gluing stuff together on top of an high-quality codebase. It gets ugly when you need too add all the edge cases that every real-world use case entails. And suddenly you have hundreds of lines of if statements in one method.
IME, “young” correlates with health b/c less time has been spent making it a mess… but, what’s really going on is the company’s culture and how it relates to quality work, aka, whether engineers are given the time to perform deep maintenance as the iteration concludes.
Maybe… to put it another way, it’s that time spent on quality isn’t time spent on discovery, but it’s only time spent on quality that gets you quality. So while a company is heavily focused on discovery - iteration, p/m fit, engineers figuring it out, etc - it’s not making a good codebase, and if they never carve out time to focus on quality, that won’t change.
That’s not entirely true - IMO, there’s a synergistic, not exclusionary relationship between the two - but it gets the idea across, I think.
My experience is that once success comes, business decides to quickly scale up the company - tons of people are hired, with most of the not having any experience with the hoot (or indeed give a hoot). Rigid management structures are created, inhabited by social climbers. A lot of the original devs leave etc.
That's the point when a ton of disinterested, inexperienced, and less handpicked people start pushing code in - driven not by the need to build good software, but to close jira tickets.
This invariably results in stagnating productivity at best, and upper management wondering why they are often not delivering on the pre-expansion level, let alone one that would be expected of 3x the headcount.
I asked the AI to write me some code to get a list of all the objects in an S3 bucket. It returned some code that worked, it would no doubt be approved by most developers. But on further inspection I noticed that it would cause a bug if the bucket had more than 1000 objects because S3 only delivers 1000 max objects per request, and the API is paged, and the AI had no ability to understand this. So the AI's code would be buggy should the bucket contain more than 1000 objects, which is really, really easy to do with an S3 bucket.
Most AI code is kind of like that. It's sourced from demo quality examples and piecemeal paid work. The resulting code is focused on succinctly solving the problem in the prompt. Factoring and concerns external to making the demo work disappear first. Then any edge cases that might complicate the result get tossed.
yeah AI isn't good at uncovering all the foot guns and corner cases, but I think this reflects most of StackOverflow, which (not coincidentally) also misses all of these
It's funny that his recommendations - organize code in modules etc. - are nothing AI-specific, it's what you'd do if you had to handover your project to an external team, or simply make it maintainable in the long term. So the best strategy for collaborating with AI turns out to be the same as for collaborating with humans.
I completely agree. That's why my stance is to wait and see, and in the meanwhile get our shit together, as in make our code maintainable by any intelligent being, human or not.
Speaking personally, I've found this tech much more helpful in existing codebases than new ones.
Missing test? Great, I'll get help identifying what the code should be doing, then use AI to write a boatload of tests in service towards those goals. Then I'll use it to help refactor some of the code.
But unlike the article, this requires actively engaging with the tool rather than, as they say a "sit and wait" (i.e., lazy) approach to developing.
It is not just the code produced with code generation tools but also business logic using gen AI.
For example a RAG pipeline. People are rushing things to market that are not built to last. The likes of LangChain etc. offer little software engineering polishing. I wish there were a more mature enterprise framework. Spring AI is still in the making and Go is lagging behind.
I find AI most helpful with very specific, narrow commands (add a new variable to the logger, which means typescript and a bunch of other things need to be updated) and it can go off and do that. While it's doing that I'll be thinking about the next thing to be fixed already.
Asking it for higher level planning / architecture is just asking for pain
Current gen AI is bad at high level planning. But I've found it useful in iterating on my ideas, sort of a rubberduck++. It helps to have a system prompt that is not overly agreeable
The problem is less the language and more what is written with any given language
The world is complex and we have to write a lot of code to capture that complexity. LLMs are good at the first 20% but balk at the 80% effort to match reality
> This paper concentrates on the development of the basic ideas of LISP... when the programming language was implemented and applied to problems of artificial intelligence.
I agree with a lot of the assertions made in TFA but not so much the conclusion. AI increasing the velocity of simpler code doesn’t make tech debt more expensive, it just means it won’t benefit as much / be made cheaper.
OTOH if devs are getting the simpler stuff done faster maybe they have more time to work on debt.
While this primarily focuses on the software development side of things, I’d like to chime in that this applies to the IT side of the equation as well.
LLMs can’t understand why your firewall rules have strange forwards for ancient enterprise systems, nor can they “automate” Operations on legacy systems or custom implementations. The only way to fix those issues is to throw money and political will behind addressing technical debt in a permanent sense, which no organization seemingly wants to do.
These things aren’t silver bullets, and throwing more technology at an inherently political problem (tech debt) won’t ever solve it.
> In essence, the goal should be to unblock your AI tools as much as possible. One reliable way to do this is to spend time breaking your system down into cohesive and coherent modules, each interacting through an explicit interface.
I find this works because its much easier to debug a subtle GPT bug in a well validated interface than the same bug buried in a nested for loop somewhere.
> Companies with relatively young, high-quality codebases benefit the most from generative AI tools, while companies with gnarly, legacy codebases will struggle to adopt them.
So you say, but {citation needed}. Stuff like this is simply not known yet.
AI can easily be applied in legacy codebases, like to help with time-consuming refactoring.
> Instead of trying to force genAI tools to tackle thorny issues in legacy codebases, human experts should do the work of refactoring legacy code until genAI can operate on it smoothly. When direct refactoring is still too risky, teams can adjust their development strategy with approaches like strangler fig to build greenfield modules which can benefit immediately from genAI tooling.
Or, y'know, just not bother with any of this bullshit. "We must rewrite everything so that CoPilot will sometimes give correct answers!" I mean, is this worth the effort? Why? This seems bonkers, on the face of it.
Today's job is finishing up and testing some rather gnarly haproxy configuration.
There's already a fairly high chance I'm going to stuff something up with it. There is no chance that I'm giving some other entity that chance as well.
The title of this article made me think that paying down traditional tech debt due to bugs or whatever is straightforward. Software with tech debt and/or bugs that incorporates AI isn’t a straightforward rewrite, but takes ML skills to pay down.
Haven't read the article, don't need to read the article: this is so, SO, so painfully obvious! If someone needs this spelled out for them they shouldn't be making technical decisions of any kind. Sad that this needs to be said.
AI is a tool and nothing more. You give it too much and it will fumble, humans fumble but we can self correct where instead AI hallucinates. Crazy nightmare AI dreams.
This type of analysis is a mirror of the early days of chess "AI". All kinds of commentary explaining the weaknesses of the engines, and extolling the impossible-to-reproduce capabilities of human players. But while they may have been correct in the moment, they didn't really appreciate the march toward utter dominance and supremacy of the machines over human players.
While there is no guarantee that the same trajectory is true for programming, we need to heed how emotionally attached we can be to denying the possibility.
The author starts with a straw man argument, of someone who thinks that AI is great at dealing with technical debt. He makes little attempt to steel man their argument. Then the author argues the opposite without much supporting evidence. I think the author is right that some people were quick to assume that AI is much better for brownfield projects, but I think the author was also quick to assume the opposite.
... until it won't. A mature code-base also has (or should have) strong test coverage, both in unit-testing and comprehensive integration testing. With proper ci/cd pipelines, you can have a small team update and upgrade stuff at a fraction of the usual cost (see amazon going from old java to newer versions) and "pay off" some of that debt.
Yeah, this is a total click-bait article. The claim put forth by the title is not at all supported by the article contents, which basically states "old codebases riddled with tech-debt do not benefit very much from GenAI, while newer cleaner codebases will see more benefit." That is so completely far off from "AI will make your tech debt worse."
LLM code gen tools are really freaking good...at making the exact same react boilerplate app that everyone else has.
The moment you need to do something novel or complicated they choke up.
This is why I'm not very confident that tools like Vercel's v0 (https://v0.dev/) are useful for more than just playing around. It seems very impressive at first glance - but it's a mile wide and only an inch deep.
If can you can create boilerplate code, logging, documentation, common algorithms by AI it saves you a lot of time which you can use on your specialized stuff. I am convinced that you can make yourself x2 by using an AI. Just use it in the proper way.
perrygeo|1 year ago
This mirrors my experience using LLMs on personal projects. They can provide good advice only to the extent that your project stays within the bounds of well-known patterns. As soon as your codebase gets a little bit "weird" (ie trying to do anything novel and interesting), the model chokes, starts hallucinating, and makes your job considerably harder.
Put another way, LLMs make the easy stuff easier, but royally screws up the hard stuff. The gap does appear to be widening, not shrinking. They work best where we need them the least.
cheald|1 year ago
The other use case is targeted code review/improvement. "Suggest how I could improve this" fills a niche which is currently filled by linters, but can be more flexible and robust. It has its place.
The fundamental problem with LLMs is that they follow patterns, rather than doing any actual reasoning. This is essentially the observation made by the article; AI coding tools do a great job of following examples, but their usefulness is limited to the degree to which the problem to be solved maps to a followable example.
dcchambers|1 year ago
irrational|1 year ago
cloverich|1 year ago
It would be great if they were good at the hard stuff too, but if I had to pick, the basics is where i want them the most. My brain just really dislikes that stuff, and i find it challenging to stay focused and motivated on those things.
munk-a|1 year ago
This is my experience with generation as well - but I still don't trust it for the easy stuff and thus the model ends up being a hindrance in all scenarios. It is much easier for me to comprehend something I'm actively writing so making sure a generative AI isn't hallucinating costs more than me just writing it myself in the first place.
yodsanklai|1 year ago
I never use it for something that really requires knowledge of the code base, so the quality of the code base doesn't really matter. Also, I don't think it has ever provided me something I wouldn't have been able to do myself pretty quickly.
kemiller|1 year ago
comboy|1 year ago
To be clear, our context window can be really huge if you are living the project. But not if you are new to it or even getting back to it after a few years.
fny|1 year ago
Au contraire. I hate writing boilerplate. I hate digging through APIs. I hate typing the same damn thing over and over again.
The easy stuff is mind numbing. The hard stuff is fun.
zer8k|1 year ago
Coincidentally this also happens with developers in unfamiliar territory.
archy_|1 year ago
If high quality closed off codebases were used in training, would we see an improvement in LLM quality for more complex use cases?
glouwbug|1 year ago
antonvs|1 year ago
I disagree, but it’s largely a matter of expectations. I don’t expect them to solve hard problems for me. That’s currently still my job. But when I’m writing new code, even for a legacy system, they can save a lot of time in getting the initial coding done, helping write comments, unit tests, and so on.
It’s not doing difficult work, but it saves a lot of toil.
jamil7|1 year ago
I agree but I find its still a great productivity boost for certain tasks, cutting through the hype and figuring out tasks that are well suited to these tools and prompting optimially has taken me a long time.
hunterbrooks|1 year ago
I'm a cofounder at www.ellipsis.dev - we tried to build code generation for a LONG time before we realized that AI Code Review is way more doable with SOTA
unknown|1 year ago
[deleted]
anthonyskipper|1 year ago
slt2021|1 year ago
its like you are building website thats not using MVC and complain that LLM advice is garbage...
LargeWu|1 year ago
TOGoS|1 year ago
Just like most of the web frameworks and ORMs I've been forced to use over the years.
graycat|1 year ago
Or, for a joke, LLMs plagiarize!
yieldcrv|1 year ago
But they still fail at devops because so many config scripts are at never versions than the training set
RangerScience|1 year ago
I never wanted the LLM to take over the (fun) part - thinking through the hard/unusual parts of the problem - but you’re also not wrong that they’re needed the least for the boilerplate. It’s still nice :)
unknown|1 year ago
[deleted]
dkdbejwi383|1 year ago
I'd argue that a lot of this is not "tech debt" but just signs of maturity in a codebase. Real world business requirements don't often map cleanly onto any given pattern. Over time codebases develop these "scars", little patches of weirdness. It's often tempting for the younger, less experienced engineer to declare this as tech debt or cruft or whatever, and that a full re-write is needed. Only to re-learn the lessons those scars taught in the first place.
bunderbunder|1 year ago
Fast forward to now and we're basically back to where we started. Only now they're working on code that was written in a different language, which I suppose is (to misappropriate a Royce quote) "worth something, but not much."
That said, this is also a great example of why I get so irritated with colleagues who believe it's possible for code to be "self-documenting" on anything larger than a micro-scale. That's what the original code tried to do, and it meant that its current maintainers were left without any frickin' clue why all those epicycles were in there. Sure, documentation can go stale, but even a slightly inaccurate accounting for the reason would have, at the very least, served as a clear reminder that a reason did indeed exist. Without that, there wasn't much to prevent them from falling into the perennially popular assumption that one's esteemed predecessors were idiots who had no clue what they were doing.
hn_throwaway_99|1 year ago
> Back to that two page function. Yes, I know, it’s just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I’ll tell you why: those are bug fixes. One of them fixes that bug that Nancy had when she tried to install the thing on a computer that didn’t have Internet Explorer. Another one fixes that bug that occurs in low memory conditions. Another one fixes that bug that occurred when the file is on a floppy disk and the user yanks out the disk in the middle. That LoadLibrary call is ugly but it makes the code work on old versions of Windows 95.
> Each of these bugs took weeks of real-world usage before they were found. The programmer might have spent a couple of days reproducing the bug in the lab and fixing it. If it’s like a lot of bugs, the fix might be one line of code, or it might even be a couple of characters, but a lot of work and time went into those two characters.
> When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work.
latortuga|1 year ago
nicce|1 year ago
Do you have an opinion when this maturity is too mature?
Let's say, you would need to add a major feature that would drastically change the existing code base. On top of that, by changing the language, this major feature would be effortless to add.
When it is worth to fight with scars or just rewrite?
kazinator|1 year ago
dartos|1 year ago
Rewrites tend to focus all in on implementation.
Clubber|1 year ago
kerkeslager|1 year ago
swatcoder|1 year ago
Wow. It's hard to believe that people are earnestly supposing this. From everything we have evidence of so far, AI generated code is destined to be a prolific font of tech debt. It's irregular, inconsistent, highly sensitive to specific prompting and context inputs, and generally produces "make do" code at best. It can be extremely "cheap" vs traditional contributions, but gets to where it's going by the shortest path rather than the most forward-looking or comprehensive.
And so it does indeed work best with young projects where the prevailing tech debt load remains low enough that the project can absorb large additions of new debt and incoherence, but that's not to the advantage of young projects. It's setting those projects up to be young and debt-swamped much sooner than they would otherwise be.
If mature projects can't use generative AI as extensively, that's going to be to their advantage, not their detriment -- at least in terms of tech debt. They'll be forced to continue plodding along at their lumbering pace while competitors bloom and burst in cycles of rapid initial development followed by premature seizure/collapse.
And to be clear: AI generated code can have real value, but the framing of this article is bonkers.
pphysch|1 year ago
Ntrails|1 year ago
> I let AI write the parsing and hoooo boy do I regret it.
He's kindly fixed the server 500's now though xD
LittleTimothy|1 year ago
Instead of genAI doing the rubbish, boring, low status part of the job, you should do the bits of the job no one will reward you for, and then watch as your boss waxes lyrical about how genAI is amazing once you've done all the hard work for it?
It just feels like if you're re-directing your efforts to help the AI, because the AI isn't very good at actual complex coding tasks then... what's the benefit of AI in the first place? It's nice that it helps you with the easy bit, but the easy bit shouldn't be that much of your actual work and at the end of the day... it's easy?
This gives very similar vibes to: "I wanted machines to do all the soul crushing monotonous jobs so we would be free to go and paint and write books and fulfill our creative passions but instead we've created a machine to trivially create any art work but can't work a till"
nimish|1 year ago
Machine learning is the high interest credit card of technical debt.
c_moscardi|1 year ago
morkalork|1 year ago
mkleczek|1 year ago
So every time we generate the same boilerplate we really do copy/paste adding to maintenance costs.
We are amazed looking at the code generation capabilities of LLMs forgetting the goal is to have less code - not more.
madeofpalk|1 year ago
yuliyp|1 year ago
tired_and_awake|1 year ago
We are a long ways from automating our jobs away, instead our expertise evolves.
I suspect doctors go through a similar evolution as surgical methods are updated.
I would love to read or participate in the discussion of how to be strategic in this new world. Specifically, how to best utilize code generating tools as a SWE. I suppose I can wait a couple of years for new school SWEs to teach me, unless anyone is aware of content on this?
inSenCite|1 year ago
The blind copy-paste has generally been a bad idea though. Still need to read the code spit out, ask for explanations, do some iterating.
whazor|1 year ago
But if you have a code base with predictable software architectural patterns, the AI will likely recognise and help with all the boilerplate.
Of course there is a lot of middle ground between bad and good.
physicles|1 year ago
ImaCake|1 year ago
bob1029|1 year ago
> This experience has lead most developers to “watch and wait” for the tools to improve until they can handle ‘production-level’ complexity in software.
You will be waiting until the heat death of the universe.
If you are unable to articulate the exact nature of your problem, it won't ever matter how powerful the model is. Even a nuclear weapon will fail to have effect on target if you can't approximate its location.
Ideas like dumpstering all of the codebase into a gigantic context window seem insufficient, since the reason you are involved in the first place is because that heap is not doing what the customer wants it to do. It is currently a representation of where you don't want to be.
mkleczek|1 year ago
amelius|1 year ago
Because with AI you can turn any problem into a black box. You build a model, and call it "solved". But then reality hits ...
verdverm|1 year ago
vander_elst|1 year ago
I thought that at the beginning the code might be a bit messy because there is the need to iterate fast and quality comes with time, what's the experience of the crowd on this?
randomdata|1 year ago
Coding haphazardly can be a lot more thrilling, though! I certainly don't enjoy the process of maintaining high quality code. It is lovely in hindsight, but an awful slog in the moment. I suspect that is why startups often need to sacrifice quality: The aforementioned thrill is the motivation to build something that has a high probability of being a complete waste of time. It doesn't matter how fast you can theoretically iterate if you can't compel yourself to work on it.
dkdbejwi383|1 year ago
You gotta make the right trade-off at the right time.
AnotherGoodName|1 year ago
A user deleted their account and there’s now a request to register that account with that username? We didn’t think of that (concerns from ux on imposter and abuse to be handled). Better code in a catch and handle this. Do this 100x times and you code has 100x custom branching logic that potentially interacts n^2 ways since each exceptional event could probably occur in conjunction with other exceptional events.
It’s why I caution strongly against rewrites. It’s easy to look at code and say it’s too complex for what it does but is the complexity actually needless? Can you think of a way to refactor the complexity out? If so do that refactor if not a rewrite won't solve it.
happytoexplain|1 year ago
nyrikki|1 year ago
Young companies tend to have systems that are small enough or with institutional knowledge to pivot when needed and tend to have small teams with good lines of communication that allow for as shared purpose and values.
Architectural erosion is a long tailed problem typically.
Large legacy companies that can avoid architectural erosion do better than some startups who don't actively target maintainability, but it tends to require stronger commitment from Leadership than most orgs can maintain.
In my experience most large companies confuse the need to maintain adaptability with a need to impose silly policies that are applied irrespective of the long term impacts.
Integration and disintegration drivers are too fluid, context sensitive, and long term for prescription at a central layer.
The possibility mythical Amazon API edict is an example where focusing on separation and product focus could work, with high costs if you never get to the scale where it pays off.
The runways and guardrails concept seems to be a good thing in the clients I have worked for.
skydhash|1 year ago
RangerScience|1 year ago
Maybe… to put it another way, it’s that time spent on quality isn’t time spent on discovery, but it’s only time spent on quality that gets you quality. So while a company is heavily focused on discovery - iteration, p/m fit, engineers figuring it out, etc - it’s not making a good codebase, and if they never carve out time to focus on quality, that won’t change.
That’s not entirely true - IMO, there’s a synergistic, not exclusionary relationship between the two - but it gets the idea across, I think.
torginus|1 year ago
That's the point when a ton of disinterested, inexperienced, and less handpicked people start pushing code in - driven not by the need to build good software, but to close jira tickets.
This invariably results in stagnating productivity at best, and upper management wondering why they are often not delivering on the pre-expansion level, let alone one that would be expected of 3x the headcount.
JohnFen|1 year ago
It's very hard to retrofit quality into existing code. It really should be there from the very start.
byyoung3|1 year ago
The codebases that use the MOST COMMONLY USED LIBRARIES benefit the most from generative AI tools
0xpgm|1 year ago
That means one might find themselves using deprecated but still supported features.
If LLMs came out during the Python 2/3 schism for example, they'd be generating an ever increasing pile of Python 2 code.
leptons|1 year ago
awkward|1 year ago
justincormack|1 year ago
asabla|1 year ago
But unless you include pagination needs to be handled as well, the LLM will naively just implement the bare minimum.
Context matters. And supplying enough context is what makes all the difference when interacting with these kind of solutions.
yawnxyz|1 year ago
squillion|1 year ago
I completely agree. That's why my stance is to wait and see, and in the meanwhile get our shit together, as in make our code maintainable by any intelligent being, human or not.
phillipcarter|1 year ago
Missing test? Great, I'll get help identifying what the code should be doing, then use AI to write a boatload of tests in service towards those goals. Then I'll use it to help refactor some of the code.
But unlike the article, this requires actively engaging with the tool rather than, as they say a "sit and wait" (i.e., lazy) approach to developing.
Halan|1 year ago
For example a RAG pipeline. People are rushing things to market that are not built to last. The likes of LangChain etc. offer little software engineering polishing. I wish there were a more mature enterprise framework. Spring AI is still in the making and Go is lagging behind.
yawnxyz|1 year ago
Asking it for higher level planning / architecture is just asking for pain
davidsainez|1 year ago
browningstreet|1 year ago
verdverm|1 year ago
The world is complex and we have to write a lot of code to capture that complexity. LLMs are good at the first 20% but balk at the 80% effort to match reality
vitiral|1 year ago
http://jmc.stanford.edu/articles/lisp.html
> This paper concentrates on the development of the basic ideas of LISP... when the programming language was implemented and applied to problems of artificial intelligence.
grahamj|1 year ago
OTOH if devs are getting the simpler stuff done faster maybe they have more time to work on debt.
stego-tech|1 year ago
LLMs can’t understand why your firewall rules have strange forwards for ancient enterprise systems, nor can they “automate” Operations on legacy systems or custom implementations. The only way to fix those issues is to throw money and political will behind addressing technical debt in a permanent sense, which no organization seemingly wants to do.
These things aren’t silver bullets, and throwing more technology at an inherently political problem (tech debt) won’t ever solve it.
unknown|1 year ago
[deleted]
unknown|1 year ago
[deleted]
ImaCake|1 year ago
I find this works because its much easier to debug a subtle GPT bug in a well validated interface than the same bug buried in a nested for loop somewhere.
btbuildem|1 year ago
This is for tiny code snippets, hello-world size, stringing together some primitives to render relatively simple objects.
Turns out, if the codebase / framework is a bit obscure and poorly documented, even the genie can't help.
kazinator|1 year ago
So you say, but {citation needed}. Stuff like this is simply not known yet.
AI can easily be applied in legacy codebases, like to help with time-consuming refactoring.
heisenbit|1 year ago
rsynnott|1 year ago
Or, y'know, just not bother with any of this bullshit. "We must rewrite everything so that CoPilot will sometimes give correct answers!" I mean, is this worth the effort? Why? This seems bonkers, on the face of it.
Clubber|1 year ago
It doesn't matter, it's the new hotness. Look at scrum, how shit it is for software and for devs, yet it's absolutely everywhere.
Remember "move fast and break things?" Everyone started taking that as gospel and writing garbage code. It seems the industry is run by toddlers.
/rant
singingfish|1 year ago
r_hanz|1 year ago
sfpotter|1 year ago
alberth|1 year ago
This isn't AI doing.
It's the doing of adding any new feature to a product with existing tech debt.
And since AI for most companies is a feature, like any feature, it only makes the tech debt worse.
teapot7|1 year ago
Sheesh! The Lizard People walk among us.
Sparkyte|1 year ago
j45|1 year ago
senectus1|1 year ago
tux1968|1 year ago
While there is no guarantee that the same trajectory is true for programming, we need to heed how emotionally attached we can be to denying the possibility.
baydonFlyer|1 year ago
wordofx|1 year ago
paulsutter|1 year ago
Creating react pages is the new COBOL
anon-3988|1 year ago
mrbombastic|1 year ago
honestAbe22|1 year ago
eesmith|1 year ago
How does one determine if that's even possible, much less estimate the work involved to get there?
After all, 'subtle control flow, long-range dependencies, and unexpected patterns' do not always indicate tech-debt.
svaha1728|1 year ago
jvanderbot|1 year ago
mouse_|1 year ago
"GARBAGE IN -- GARBAGE OUT!!"
sheerun|1 year ago
p0nce|1 year ago
benatkin|1 year ago
NitpickLawyer|1 year ago
The tooling for this will only improve.
elforce002|1 year ago
They're trying other techniques to improve what we already have atm, but we're almost at the limit of its capabilities.
ssalka|1 year ago
BoredPositron|1 year ago
sitzkrieg|1 year ago
luckydata|1 year ago
I'm sure nothing will change in the future either.
elforce002|1 year ago
https://www.reuters.com/technology/artificial-intelligence/o...
They're trying other techniques to improve what we already have atm.
TechDebtDevin|1 year ago
dcchambers|1 year ago
The moment you need to do something novel or complicated they choke up.
This is why I'm not very confident that tools like Vercel's v0 (https://v0.dev/) are useful for more than just playing around. It seems very impressive at first glance - but it's a mile wide and only an inch deep.
shmoogy|1 year ago
holoduke|1 year ago
planeserber2010|1 year ago
[deleted]
cpufry|1 year ago
[deleted]
unknown|1 year ago
[deleted]