The study shows a doubling in the rate at which tech debt code is produced and checked into the repo.
hedora|2 years ago
Anecdotally, as a principal engineer, I've definitely noticed that new senior engineers on the team who say they're using ChatGPT/Copilot produce unprecedentedly bad code at unprecedented rates.
It takes me 2-3x longer to unwind such crap than it would for me to write it from scratch.
As we grow the team, this will definitely put us out of business unless we find a way to fix it.
Currently, we're hoping the AI-assisted engineers will get better at unborking code before merging it, but that's a harder task than RTFM or copy-pasting from Stack Overflow.
jerf|2 years ago
I don't have great English words for this, but my biggest concern with LLMs is that, of all the text-generation algorithms I've ever seen, they are fantastic at producing output whose plausibility to the human mind greatly exceeds its actual quality, the difficulty of concretely measuring either of those values notwithstanding.
Note I'm not even strictly criticizing the quality of the output per se. It is a big jump over any previous technology and very impressive in its own way.
It is, nevertheless, quite dangerous, because the jump in human-perceived plausibility is much larger than the jump in quality.
Whereas earlier techniques were obviously wrong to a human reader (in the case of code generation, so obviously wrong that we never even considered using them), LLMs are extremely good at hiding errors in the parts of the code we are cognitively most inclined to overlook. This also has the effect of making the code bizarrely difficult to fix.
How they do this I don't know; it's a fascinating research question for some ambitious cognitive scientist. But the signal is very strong, and I don't need to wait for a paper to come out to see it.
I don't think this is fundamental to AI. As I like to remind people, LLMs are not the whole of AI. They're just one technique, and partly for the very reason I discuss in this post, one I expect to eventually become part of a larger system that can fix this problem at some higher level. I expect people to someday look back and laugh at us for thinking LLMs could be used for all the things we think they can be used for. But they will be laughing precisely because of the experience we're gathering now, and there's no skipping that phase.
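A toy illustration (hypothetical, not from the thread) of the failure mode jerf describes: code that encodes the rules a reviewer remembers and reads as entirely plausible, while silently dropping the one they're inclined to overlook.

```python
# Plausible-looking leap-year check of the sort that sails through review:
# it handles the two rules most people remember ("divisible by 4, except
# centuries") but silently drops the third (centuries divisible by 400
# ARE leap years), so 2000 comes out wrong.
def is_leap_plausible(year: int) -> bool:
    return year % 4 == 0 and year % 100 != 0

# The full Gregorian rule, for comparison.
def is_leap(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

print(is_leap_plausible(2000))  # False, which is wrong: 2000 was a leap year
print(is_leap(2000))            # True
```

Nothing about the buggy version looks off at a glance, which is exactly the point: the error lives in a case the reader's mind fills in as handled.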
rabuse|2 years ago
I was going to ask: why aren't they running the code through unit tests if they're committing such shit code? Does it pass the tests, or is it just inefficient?
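Worth noting that a green suite only answers rabuse's question if the tests cover the subtle cases. A sketch (hypothetical function and checks) of a happy-path test passing right over a real bug:

```python
def dedupe(items):
    # Plausible one-liner: it does remove duplicates, but it
    # silently discards the input's original ordering.
    return list(set(items))

# Happy-path check: passes, so the bug ships.
assert sorted(dedupe([1, 2, 2, 3])) == [1, 2, 3]

# A check like this would expose the lost ordering, since
# dedupe(["b", "a", "b", "c"]) should be ["b", "a", "c"]:
# assert dedupe(["b", "a", "b", "c"]) == ["b", "a", "c"]
```

So "it passes the tests" and "it's correct" can diverge exactly where the generated code is subtly wrong.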
lkbm|2 years ago
I worked as a high school bureaucrat for several years, and with some simple scripts I was able to make some tedious data-entry tasks vastly faster. It was ugly, hacked-together code, with lots of hardcoded values I had to update each semester, but it worked and was way better than doing it all by hand. Low-quality code is worse than high-quality code, but often better than manual labor.
I want my web browser and my bank to use high-quality code, but I also want semi-technical people who would otherwise do everything manually to be able to automate tedious portions of their jobs.
(The caveat, of course, being that a buggy script can screw up the thing it's automating much faster than a manual process can.)
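For flavor, the kind of hacked-together script lkbm describes might look something like this (all names, fields, and values hypothetical):

```python
# Hardcoded per-semester value, edited by hand each term.
# Ugly, but still far faster than re-keying every record.
SEMESTER = "2023-fall"

def reformat(row):
    # Collapse the export's separate name fields into the
    # "Last, First" shape the (hypothetical) district system expects.
    return {
        "student": f"{row['last_name']}, {row['first_name']}",
        "term": SEMESTER,
        "id": row["student_id"],
    }

# In practice this looped over a CSV export; two inline rows stand in here.
export = [
    {"last_name": "Doe", "first_name": "Jane", "student_id": "42"},
    {"last_name": "Roe", "first_name": "Rick", "student_id": "57"},
]
for row in export:
    print(reformat(row))
```

Hardcoding `SEMESTER` is exactly the sort of shortcut that makes the code "low quality" yet still a clear win over manual entry.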
Depends a lot on the type of code. Front-end code? Almost every website has some JS/CSS/HTML errors, and most of the time it's not that big a deal. Back-end bank routing code? Pretty important to get right.
AlwaysRock|2 years ago
From what I've seen, early-career engineers are the ones most often using code-assist tools, and they are also often placed on lower-stakes, front-end-focused teams. This is mostly anecdotal data from engineers I know with non-traditional CS backgrounds.
jcutrell|2 years ago
You kind of have to qualify "quality" against varied standards. Is it good enough for what the author needs? If so, does it matter that some other measure of quality isn't hit?