New GitHub Copilot research finds 'downward pressure on code quality'

[+] web3-is-a-scam|2 years ago|reply

I cancelled my subscription after 2 months because I was spending way too much mental effort going over all of the code vomit fixing all of the mistakes. And it was basically useless when trying to deal with anything non-trivial or anything to do with SQL (even when I frontloaded it with my entire schema).

It was much less effort to just write everything myself because I actually know what I want to write and fixing my own mistakes was easier than fixing the bot’s.

I weep for the juniors that will be absolutely crushed by this garbage.

[+] ben_w|2 years ago|reply

> I cancelled my subscription after 2 months because I was spending way too much mental effort going over all of the code vomit fixing all of the mistakes. And it was basically useless when trying to deal with anything non-trivial or anything to do with SQL (even when I frontloaded it with my entire schema).

Good to know, that means I'm still economically useful.

I'm using ChatGPT rather than Copilot, and I'm surprised by how much it can do, but even so I wouldn't call it "good code" — I use it for JavaScript, because while I can (mostly) read JS code, I've spent the last 14 years doing iOS professionally and therefore don't know what's considered best practice in browser-land. Nevertheless, even though (usually) I get working code, I can also spot it producing bad choices and (what seems like) oddities.

> I weep for the juniors that will be absolutely crushed by this garbage.

Indeed.

You avoid the two usual mistakes I see with current AI, either thinking it's already game over for us or that it's a nothing-burger.

For the latter, I normally have to roll out a quote I can't remember well enough to google, that's something along the lines of "your dog is juggling, filing taxes, and baking a cake, and rather than be impressed it can do any of those things, you're complaining it drops some balls, misses some figures, and the cake recipe leaves a lot to be desired".

[+] Syntaf|2 years ago|reply

As with most things in life, moderation is key.

I find co-pilot primarily useful as an auto-complete tool to save keystrokes when writing predictable context driven code.

Writing an enum class in one window? Co-pilot can use that context to auto complete usage in other windows. Writing a unit test suite? Co-pilot can scaffold your next test case for you with a simple tab keystroke.

Especially in the case of dynamic languages, co-pilot nicely compliments your intellisense

[+] chrismorgan|2 years ago|reply

> I weep for the juniors that will be absolutely crushed by this garbage.

This is the real danger of this sort of thing. When your Copilot or whatever are good enough that they replace what is vastly superior for purely economic reasons.

I wrote about this trend applied to the unfortunately inevitable doom of the voice acting industry in favour of text-to-speech models a couple of months ago, using my favourite examples of typesetting, book binding and music engraving: https://news.ycombinator.com/item?id=38491203.

But when it’s development itself that gets hollowed out like this, I’m not sure what the end state is, because it’s the developers who led past instances of supplanting. Some form of societal decline and fall doesn’t feel implausible. (That sentence really warrants expansion into multiple paragraphs, but I’m not going to. It’s a big topic.)

[+] thepasswordis|2 years ago|reply

Oh man this is the opposite of my experience!

Copilot has replaced almost all of the annoying tedious stuff, especially stuff like writing (simple) SQL queries.

“Parse this json and put the fields into the database where they belong” is a fantastic use case for copilot writing SQL.

(Yes I’m sure there’s an ORM plugin or some middleware I could write, but in an MVP, or a mock-up, that’s too much pre optimization)

[+] Yodel0914|2 years ago|reply

When I've tried codepilot and similar tools, I've found them rather unimpressive. I assumed it was because I hadn't put the time in to learn how to make the best use of it, but maybe it's just that it's not very good.

On the other hand, I use ChatGPT (via the API) quite often, and it's very handy. For example, I wrote a SQL update that needed to touch millions of rows. I asked ChatGPT to alter the statement batch the updates, and then asked it to log status updates after each batch.

As another example, I was getting a 401 accessing a nuget feed from Azure DevOps - I asked ChatGPT what it could be and it not only told me, but gave me the yaml to fix it.

In both cases, this is stuff I could have done myself after a bit of research, but it's really nice to not have to.

[+] l5870uoo9y|2 years ago|reply

This is the problem with using AI for generating SQL statements; it doesn't know the semantics of the your database schema. If you are still open for a solution, I recently deployed a solution[1] that combines AI, db schema and simple way train AI to know your database schema.

Essentially, you "like" correct (or manually corrected) generations and a vectorized version is stored and used in future similar generations. An example could be tell which table or foreign key is preferred for a specific query or that is should wrap columns in quotes.

From my preliminary tests it works well. I was able to consistently make it use correct tables, foreign keys and quotes on table/column name for case-sensitivity using only a couple of trainings. Will open a public API for that soon too.

[1]: https://www.sqlai.ai/

[+] WithinReason|2 years ago|reply

I'm not using Copilot to write code, I use it for autocomplete. For that it's great.

[+] atoav|2 years ago|reply

I never even started. On a vacation I tried to get a LLM to write me an interpolation function for two hours. I had a test data set and checks laid out. Not a single of the resulting algorithms passed all the checks, most didn't even do what I asked for and a good chunk didn't even run.

LLMs give you plausible text. That does not mean it is logically coherent or says what it should.

[+] keeganpoppen|2 years ago|reply

it's crazy how amazing it is sometimes for certain things, including comments (and including my somewhat pithy style of comment prose), and how incredible and thorough the wastes of time when it autosuggests some function that is plausible, defined, but yet has some weird semantics such that it subtly ruins my life until i audit every line of code i've written (well, "written", i suppose). i finally disabled it, but doing so _did_ make me kind of sad-- it is nice for the easy things to be made 25x easier, but, for programming, not at the expense of making the hard stuff 5x harder (note i didn't say 10x, nor 100x. 5x). it's not that it's that far away from being truly transformative, it's just that the edge cases are really, really rough because for it to be truly useful you have to trust it pretty much completely & implicitly, and i've just gotten snakebitten in the most devious ways a handful of times. an absolute monster of / victim of the pareto principle, except it makes the "90%" stuff 1.5x easier and the "10%" stuff 5x harder (yes, i know i haven't been using my "k%"s rigorously) (and for those keeping score at home, that adds up to making work 10% harder net, which i'd say is about right in my personal experience).

highlights: "the ai" and i collaboratively came up with a new programming language involving defining a new tag type in YAML that lets one copy/paste from other (named) fragments of the same document (as in: `!ref /path/to/thing to copy`) (the turing completeness comes from self-referential / self-semi-overlapping references (e.g. "!ref /name/array[0:10]`) where one of the elements thus referred-to is, itself, a "!ref" to said array).

lowlights: as already alluded to, using very plausible, semi-deprecated API functions that either don't do what you think they do, or simply don't work the way one would think they do. this problem is magnified by googling for said API functions only to find cached / old versions of API docs from a century ago that further convince you that things are ok. nowadays, every time i get any google result for a doc page i do a little ritual to ensure they are for the most recent version of the library, because it is absolutely insane how many times i've been bitten by this, and how hard.

[+] dehrmann|2 years ago|reply

> It was much less effort to just write everything myself because I actually know what I want to write and fixing my own mistakes was easier than fixing the bot’s.

Echoing this, it takes longer to read code than to write it, so generally, if you know what you want to write and it's non-trivial, you'll spend more time groking AI-written code for correctness than writing it from scratch.

[+] orbisvicis|2 years ago|reply

I've been using Bing's GPT-4 to learn Fortran for about a week. Well, I'm mainly reading a book which is project-oriented so I'm often researching topics which aren't covered in enough detail or covered later. I think this mixed-mode approach is great for learning. Since Fortran's documentation is sparse and Google's results are garbage the output of GPT4 helps cut through a lot of the cruft. Half the time it teaches me, the rest I'm teaching it and correcting its mistakes. I certainly won't trust it for anything half complicated but I think it does a good job linking to trustworthy supporting sources, which is how I learned how to cast assumed-rank arrays using c_loc/c_f_pointer and sequence-association using (*) or (1). It's great for learning new concepts and I imagine it would be great for pair-coding in which it suggests oversights to your code. However I can't imagine depending on it to generate anything from scratch. What's surprising is how little help the compiler is - about as bad a resource as SEO junk. I'm used to "-Wall -Werror" from C, but so many gfortran warnings are incorrect.

[+] ec109685|2 years ago|reply

If you used copilot in the beginning (and I think still with some plans), it was only GPT 3.5.

Likely you’d get much better results with GPT-4.

[+] _teyd|2 years ago|reply

I think there is a sweet spot where if you're junior on the cusp of intermediate it can help you because you know enough to reject the nonsense, but it can point you in the right direction. Similar to if you need to implement a small feature in a language you don't know, but basically know what needs to get done.

I've definitely seen juniors just keep refining the garbage until it manages to pass a build and then try to merge it, though, and using it that way just sort of makes you a worse programmer because you don't learn anything and it just makes you more dependent on the bot. Companies without good code reviews are just going to pile this garbage on top of garbage.

[+] havaloc|2 years ago|reply

Using GPT-4 has significantly enhanced the efficiency of my work. My focus is primarily on developing straightforward PHP CRUD applications for addressing problems in my day-to-day job. These applications are simple, and don't use frameworks and MVC structures, which makes the code generated by GPT-4, based on my precise instructions, easy to comprehend and usually functional right out of the prompt. I find if I listen to the users needs I can make something that addresses a pain point easily.

I often request modifications to code segments, typically around 25 lines, to alter the reporting features to meet specific needs, such as group X and total Y on this page. GPT-4 responds accurately to these requests. After conducting a quick QA and test, the task is complete. This approach has been transformative, particularly effective for low-complexity tasks and clear-cut directives.

This process reminds me of how a senior programmer might delegate: breaking down tasks into fundamental components for a junior coder to execute. In my case, GPT-4 acts as the junior programmer, providing valuable assistance at a modest cost of $20 per month. I happily pay that out of pocket to save myself time.

However, much like how a younger version of myself asked why we had to learn math if the calculator does it for us, I know understand why we do that. I think the same thing applies here. If you don't know the fundamentals, you won't be effective. If GPT-4 had been around when I learned to write PHP (don't @ me!), I probably wouldn't understand the fundamentals as well as I do. I have the benefit of learning how to do it before it was a thing, and then benefitting from the new tool being available.

I also don't find the code quality to be any less, if anything what it spits out is a bit more polished (sometimes!).

[+] kromem|2 years ago|reply

Yeah, a lot of times it has better code quality, but more subtle bugs than what I'd be prone to produce.

I think a lot of the criticisms are premature, and it's more a stumbling step forward with need for support from additional infrastructure.

Where's the linter integration so it doesn't spit out a result that won't compile? Where's the automatic bug check and fix for low hanging fruit errors?

What should testing look like or change around in a gen AI development environment?

In general, is there something like TDD or BDD that is going to be a better procedural approach to maximizing the gains to be had while minimizing the costs?

A lot of the past year or two has been dropping a significant jump and change in tech into existing workflows.

Like any tool, there's the capabilities of the tool itself and the experience of the one wielding it that come together to make the outcome.

The industry needs a lot more experience and wisdom around incorporation of gen AI in development before we'll realistically have a sense of its net worth. I'd say another 2-3 years at least - not because the tech will take that long to adapt, but because that's how long the humans will take to have sufficiently adapted.

[+] hackernewds|2 years ago|reply

precisely. we are very lucky to be during the timeline where ChatGPT was released during our later years, that we didn't have to compete with auto created code during our learning formative years.

[+] elendee|2 years ago|reply

this is you though, as opposed to the new paradigm of coding that is threatening to be ushered in. "Generate code, test, fail, regenerate, test...". Without ever breaking down constituent parts.

I already worked with a team of 20 something's who were generating mountains of full stack spaghetti on top of the basic CRUD framework I built them.

There's lessening incentive to build your TODO app from scratch when you can generate an "MMO framework" in 60 seconds.

The same way I first used firebase 12 years ago before trying to learn the basics of relational, and it was years before I finally arrived at the basics.

[+] therealdrag0|2 years ago|reply

How do you interface with it? Are you pasting chunks of code into chat? Or just describing new code to write and then giving it feedback to rewrite it? Or something else?

[+] danielovichdk|2 years ago|reply

When I look into the future, and I know that I really can't, one thing I really believe in is that there will be a shift in how quality will be perceived.

With all things around me there is a sense that technology is to be a saviour for many very important things - ev's, medicine, it, finance etc.

At the same time it is more and more clear to me that technology is used primarily to grow a market, government, country etc. But it does that by layering on top of already leaking abstractions. It's like solving a problem by only trying to solvent be its symptoms.

Quality has a sense of slowness to it which I believe will be a necessary feat, both due to the fact that curing symptoms will fall short and because I believe that the human species simply cannot cope with the challenges by constantly applying more abstractions.

The notion about going faster is wrong to me, mostly because I as a human being do not believe that quality is done by not understanding the fundamentals of a challenge, and by trying to solve it for superficial gains is simply unintelligent.

LLMs is a disaster to our field because it caters to the average human fallacy of wanting to reach a goal but without putting in the real work to do so.

The real work is of course to understand what it is that you are really trying to solve with applying assumptions about correctness.

Luckily not all of us is trying to move faster but instead we are sharpening our minds and tools while we keep re-learing the fundamentals and applying thoughtful decisions in hope to make quality that will stand the test of time.

[+] dweinus|2 years ago|reply

The methodology seems to be: compare commit activity from 2023 to prior years, without any idea of how many involve Copilot. Then interpret those changes with assumptions. That seems a bit shakey.

Also: "The projections for 2024 utilize OpenAI's gpt-4-1106-preview Assistant to run a quadratic regression on existing data." ...am I to understand they asked gpt to do a regression on the data (4 numbers) rather than running a simple regression tool (sklearn, r, even excel can do this)? Even if done correctly, it is not very compelling when based off of 4 data points and accounting for my first concern.

[+] zemo|2 years ago|reply

check out the paper, not just the summary. They explain their methodology. The output has four data points because it’s a summary. The input is … more data than that.

[+] panaetius|2 years ago|reply

Not even that, the prompt used is "Looking only at the years 2022 and 2023, what would a quadratic regression predict for 2024" as mentioned in the appendix.

So quadratic regression makes it sound all fancy, but with two data points, it's literally just "extend the line straight". So the 2024 prediction is essentially meaningless.

[+] zeroonetwothree|2 years ago|reply

I’m sympathetic to the study results since I have seen similar things anecdotally but I agree their data is not really warranting the conclusions they reach. For all we know it could because of the covid hiring spree and subsequent layoffs.

[+] wbharding|2 years ago|reply

Original research author here. It's exciting to find so many thinking about long-term code quality! The 2023 increase in churned & duplicated (aka copy/pasted) code, alongside the reduction in moved code, was certainly beyond what we expected to find.

We hope it leads dev teams, and AI Assistant builders, to adopt measurement & incentives that promote reused code over newly added code. Especially for those poor teams whose managers think LoC should be a component of performance evaluations (around 1 in 3, according to GH research), the current generation of code assistants make it dangerously easy to hit tab, commit, and seed future tech debt. As Adam Tornhill eloquently put it on Twitter, "the main challenge with AI assisted programming is that it becomes so easy to generate a lot of code that shouldn't have been written in the first place."

That said, our research significance is currently limited in that it does not directly measure what code was AI-authored -- it only charts the correlation between code quality over the last 4 years and the proliferation of AI Assistants. We hope GitHub (or other AI Assistant companies) will consider partnering with us on follow-up research to directly measure code quality differences in code that is "completely AI suggested," "AI suggested with human change," and "written from scratch." We would also like the next iteration of our research to directly measure how bug frequency is changing with AI usage. If anyone has other ideas for what they'd like to see measured, we welcome suggestions! We endeavor to publish a new research paper every ~2 months.

[+] oooyay|2 years ago|reply

> We hope it leads dev teams, and AI Assistant builders, to adopt measurement & incentives that promote reused code over newly added code.

imo, this is just replacing one silly measure with another. Code reuse can be powerful within a code base but I've witnessed it cause chaos when it spans code bases. That's to say, it can be both useful and inappropriate/chaotic and the result largely depends on judgement.

I'd rather us start grading developers based on the outcomes of software. For instance, their organizational impact compared to their resource footprint or errors generated by a service that are not derivative of a dependent service/infra. A programmer is responsible for much more than just they code they right; the modern programmer is a purposefully bastardized amalgamation of:

- Quality Engineer / Tester

- Technical Product Manager

- Project Manager

- Programmer

- Performance Engineer

- Infrastructure Engineer

Edit: Not to say anything of your research; I'm glad there are people who care so deeply about code quality. I just think we should be thinking about how to grade a bit differently.

[+] lolinder|2 years ago|reply

> That said, our research significance is currently limited in that it does not directly measure what code was AI-authored -- it only charts the correlation between code quality over the last 4 years and the proliferation of AI Assistants

So, would a more accurate title for this be "New research shows code quality has declined over the last four years"? Did you do anything to control for other possible explanations, like the changing tech economy?

[+] nephrenka|2 years ago|reply

> our research significance is currently limited in that it does not directly measure what code was AI-authored

There is actual AI benchmarking data in the Refactoring vs Refuctoring paper: https://codescene.com/hubfs/whitepapers/Refactoring-vs-Refuc...

That paper benchmarked the performance of the most popular LLMs on refactoring tasks on real-world code. The study found that the AI only delivered functionally correct refactorings in 37% of the cases.

AI-assisted coding is genuinely useful, but we (of course) need to keep skilled humans in the loop and set realistic expectations beyond any marketing hype.

[+] mrweasel|2 years ago|reply

People have different workflows, but mine is frequently, skim the documentation, make a prototype, refine code a bit, add tests, move stuff around, break stuff, rework code, study documentation, refactor a bit more, and then at that point I have enough understanding of the problem to go in at yank out 80% of my code and do it right.

If Copilot gives me working code in the prototype stage, good enough that I can just move on to the next thing, my understanding is never going to be good enough that I can go in and structure everything correctly. It will effectively allow me to skip 90% of my workflow, but pay the price. That's not to say that Copilot can't be extremely helpful during the final steps of development.

If those findings are correct, I can't say that I'm surprised. Bad code is written by poor understanding and Copilot can't have any understanding beyond what you provide it. It may write better code than the average programmer, but the result is no better than the input given. People are extremely focused on "prompt engineering", so why act surprised when a poor "prompt" in VScode yields a poor result?

[+] andybak|2 years ago|reply

I'm not sure why you decided that "use copilot" also implies missing out most of your later steps. Who decides to skip all those steps? Presumably you?

My experience is that Copilot is great at getting me started. Sometimes the code is good, sometimes it's mediocre or completely broken.

But it's invaluable at getting me thinking. I wasted a lot more time before I started using it. That might just be my weird brain wiring...

(Edited to sound less narky. I shouldn't post from a mobile device)

[+] spaniard89277|2 years ago|reply

I'm a junior, and I have Codeium installed in VSCode. I've found it very distracting most of the times, I don't really understand why so many people uses this kind of assistants.

I find stuff like Phind useful, in the sense that sometimes something happens that I don't understand, and 60% of the times Phind actually helps me to understand the problem. Like finding trivial bugs that I didn't spot because I'm tired, dumb, etc.

On the other hand, with Codeium, I guess it may be useful when you're just churning boilerplate code for some framework, but in my little expericence (writing scrapers and stupid data pipelines & vanilla JS + HTML/CSS) cycling through suggestions is very irritating, specially because many times it doesn't work. Most of the times for stupid reasons, like lacking an argument or something like that, but then it's time you have to spend debugging it.

Another problem I have is that I find there's a common style of JS which consist in daisy-chaining a myriad of methods and anonymous functions, and I really struggle with this. I like to break stuff into lines, name my functions and variables, etc. And so many times code suggestions follow this style. I guess it's what they've been trained on.

Codeium is supposed to learn from this, and sometimes it does, to be fair.

But what I worry the most is that, If I'm a junior and I let this assistants do the code for me ¿How the hell I'm supposed to learn? Because giving Phind context + questions helps me learn or gives me direction to go on find it by myself in the internet, but if the only thing I do is press tab, I don't know how the hell I'm supposed to learn.

I found a couple days ago that many people (including devs) are not using LLMs to get better but it's just a substitute of their effort. Isn't people afraid of this? Not because companies are going to replace you, but it's also a self-reflection issue.

Coding is not the passion of my life, addmitedly, but I like it. I like it because it helps me to make stuff happen and to handle complexity. If you can't understand what's happening you won't be able to make stuff happen and much less to spot when is complexity going to eat you.

[+] jacquesm|2 years ago|reply

> Coding is not the passion of my life, addmitedly, but I like it.

It may not be the passion of your life but I haven't seen anybody articulate better (in recent memory) what they want to get out of coding and how they evaluate their tools. Keep at it, don't change and you'll go places, you are definitely on the right path.

[+] withinboredom|2 years ago|reply

I think probably the best use of AI, so far, was when I went into a controller and told it to generate an openAPI spec ... and it got it nearly right. I only had to modify some of the models to reflect reality.

BUT (and this is key), I've hand-written so many API specs in my career that 1) I was able to spot the issues immediately, and 2) I could correct them without any further assistance (refining my prompt would have taken longer than simply fixing the models by hand).

For stuff where you know the domain quite well, it's amazing to watch something get done in 30s that you know would have taken you the entire morning. I get what you're saying though, I wouldn't consider asking the AI to do something I don't know how to do, though I do have many conversations with the AI about what I'm working on. Various things about trade-offs, potential security issues, etc. It's like having a junior engineer who has a PHD in how my language works. It doesn't understand much, but what it does understand, it appears to understand it deeply.

[+] mvdtnz|2 years ago|reply

> Another problem I have is that I find there's a common style of JS which consist in daisy-chaining a myriad of methods and anonymous functions, and I really struggle with this. I like to break stuff into lines, name my functions and variables, etc.

I think your whole comment is excellent but I just wanted to tell you, you're on the right track here. Certain developers, and in particular JS developers, love to chain things together for no benefit other than keeping it on one line. Which is no benefit at all. Keep doing what you're doing and don't let this moronic idiom infect your mind.

[+] kromem|2 years ago|reply

While I can't speak to Codeium, you might want to try Copilot in a more mature codebase that reflects your style of composition.

The amazing part for me with the tech is when it matches my style and preferences - naming things the way I want them, correctly using the method I just wrote in place of repeating itself, etc.

I haven't used it much in blank or small projects, but I'd imagine I'd find it much less ideal if it wasn't so strongly biased towards how I already write code given the surrounding context on which it draws.

[+] tpmoney|2 years ago|reply

The tool and design of the tool matters a lot. I've used Codeium in VSC and GH Copilot in Intellij, and the experience (and quality) of the GH + Intellij paring is much better than Codeium + VSC.

My biggest use for AI assistants has been speeding up test writing and any "this but slightly different" repetitive changes to a code base (which admittedly is also a lot of test writing). At least in intellij + GH, things like, a new parameter that now needs to be accounted for across multiple methods and files is usually a matter of "enter + tab" after I've manually typed out the first two or three variants of what I'm trying to do. Context gives it the rest.

In VSC with Codeium, the AI doesn't seem quite as up to snuff, and the plugin is written in such a way that its suggestions and the keys for accepting them seem to get in the way a lot. It's still helpful for repetitive stuff, but less so for providing a way of accomplishing a given goal.

[+] godzillabrennus|2 years ago|reply

I decided to use ChatGPT to build a clone of Yourls using Django/Python. I gave it specific instructions to not only allow for a custom shortened URL but to track the traffic. It didn’t properly contemplate how to do that in the logic or data model. I had to feed it specific instructions afterwards to get it fixed.

AI tools are akin to having a junior developer working for you. Except they are much much faster.

If you don’t know what you’re doing they just accelerate the pace that you make mistakes.

[+] simonw|2 years ago|reply

The full paper is here: https://gitclear-public.s3.us-west-2.amazonaws.com/Coding-on...

[+] mvdtnz|2 years ago|reply

There was already a backlash against DRY code occurring before "AI" assistants hit the market, sadly. It was a growing movement when I was using Twitter in 2019-2022.

Some younger developers have a very different attitude to code than what I was brought up with. They have immense disdain for the Gang of Four and their design patterns (probably without realising that their favourite frameworks are packed to the gills with these very patterns). They speak snidely about principles like DRY and especially SOLID. And on places like Twitter the more snide and contrarian you can be, the more engagement you'll get. Very disturbing stuff.

[+] alkonaut|2 years ago|reply

CoPilot is good a as a one line autocomplete. It's short enough that you can review the suggestion and decide whether or not to accept the autocompletion or type out your own completion.

For reasoning about larger chunks of code I find ChatGPT better than CoPilot as an LLM assistant. Trying to use CoPilot for making large sections of boilerplate like the kind you might see in a db->api->web project is just full of frustration. It doesn't realize it makes makes tiny inconsistencies everywhere so you are permanently babysitting. I think the key takeaway is that if you have repeated code (An entity, a DTO, a controller, a frontend component all sharing some set of names/properties) then its better to change jobs than change tools.

[+] freedomben|2 years ago|reply

Something I didn't see mentioned is also that over time this is going to feedback loop into the training set and compound (possibly exponentially). I.e. as more lower quality code hits github and is used for training, the output of the code will decline, which causes lower quality code to hit github, which causes output to further decline, etc.

I am an experienced dev but new to ML so take with a grain of salt, but I really wonder if the future is going to be quality in the training sets rather than quantity. I have heard that the "emergent properties" don't seem really affected by bad data as long as the set is large enough, but at what point does this cease to be true? How do we guard against this?

[+] darepublic|2 years ago|reply

Sometimes less dry code can actually be easier to read and understand at the point of usage than dry code that has been more highly abstracted and requires grokking a sort of quasi DSL that defines the abstraction. Assuming that AI contributions will only increase, if a codebase were almost completely written by AI perhaps the benefits of DRY would diminish vs on the spot readability by humans only trying to understand the code and not personally add to it

[+] chongli|2 years ago|reply

Well as with anything, DSLs are subject to the rules of good design. A really well-designed DSL (such as SQL) takes on a life of its own and becomes incredibly ubiquitous and useful to know in its own right. Many other DSLs are totally unknown, not worth learning, and serve as barriers to code understanding.

I don’t know of too many people who would advocate replacing SQL with hand-written C manipulating B-trees and hash tables. Similarly, it’s pretty rare that you want to hand-roll a parser in C over using a parser generator DSL or even something like regex.

[+] padjo|2 years ago|reply

The real news here is that the authors have apparently found an objective and universal measure of code quality.

[+] danpalmer|2 years ago|reply

This certainly fits with my experience and biases. When using Copilot, I felt that it tried to be too clever, and often got things wrong. It would try to write a function based on the name, and would go one of three ways: either the function was a trivial one-liner and it saved me ~2 seconds of typing after I figured out if it was correct, or it was a complex function where it cost me ~2 seconds to figure out it was waaaaay off the mark, or it saved me 30 seconds up-front by producing something that appeared correct, but that I found out 10 minutes of debugging later was actually subtly incorrect in a way that I couldn't spot when reading quickly, but likely wouldn't have written myself.

What I really want is a smarter intellisense, whereas Copilot is a dumber pair programmer. I want smart, style-aware, context-aware tab completion on boilerplate, not on business logic.

Unfortunately I think many people are using it for business logic and that seems to be the direction the product is going.

[+] adonese|2 years ago|reply

Have been using chatgpt/ copilot for a while now and I feel I know better the limitations and what it can achieve most: - unit testing and docs, readme - when it can potentially hallucinate

It helps me do things that I'd usually procrastinate from doing, yet I know how I can get them done. It is really a booster to ones performance if you know exactly what you want

[+] shadowgovt|2 years ago|reply

"'AI-generated code resembles an itinerant contributor, prone to violate the DRY-ness [don't repeat yourself] of the repos visited.' ... That serves as a counterpoint to findings of some other studies, including one from GitHub in 2022 that found, for one thing: "developers who used GitHub Copilot completed the task significantly faster -- 55 percent faster than the developers who didn't use GitHub Copilot."

These two observations aren't mutually exclusive. DRY-ness comes from a holistic understsanding of a system; you get similar results when you task a junior developer with little deep knowledge of a novel codebase with solving a problem and don't give them enough oversight; they'll tend to bull-rush through coding a solution with little-to-no knowledge of how the solution can be built out of the parts already existing in the codebase.

[+] rsynnott|2 years ago|reply

Shock horror, the thing everyone said would happen has happened. I mean, I’m not sure what people expected.

[+] userbinator|2 years ago|reply

Those who can't code better than an LLM will welcome it, those who can will abhor it. Unfortunately it seems there are far more of the former than the latter, and they aren't going to get better either.

Progress comes from human intellect, mediocrity comes from regurgitation.

[+] kvonhorn|2 years ago|reply

Quality is best thought of as a process, and that process got pushed out of the SDLC by Agile process and its metric of velocity. The use of LLM-generated code to further increase velocity in the absence of quality process is an obvious result.

322 comments