Tell HN: GPT copilots aren’t that great for programming
I think for complete beginners or casual programmers, GPT might be mind-blowing and cool because it can create a for loop or recommend some solution to a common problem.
However, for most of my tasks, it has usually ended up being a total waste of time and a source of frustration. Don't get me wrong: it's useful for those basic tasks where, in the past, I'd do the Google -> Stack Overflow route. For anything more complex, though, it falls flat.
Just a recent example from last week: I was working on some dynamic SQL generation. To be fair, it was a really complex task and it was 5pm, so I didn't feel like whiteboarding (when in doubt, always whiteboard and skip GPT lol). I thought I'd turn to GPT and ended up wasting 30 minutes while it kept hallucinating and giving me code that wasn't even valid. It skipped some of the requirements and missed things like a GROUP BY, which made the generated query not work at all. When I told it what it had missed, it regenerated totally different code that had other issues... and I stopped.
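For context, here's a toy sketch of the kind of dynamic SQL generation in question; the table and column names are invented for illustration, not the actual task:

```python
# A hypothetical sketch of dynamic SQL generation -- names are made up.
def build_report_query(dimensions, measures, table):
    """Build an aggregate query. Every non-aggregated SELECT column must
    also appear in GROUP BY -- exactly the detail a model can drop."""
    select_cols = dimensions + [f"SUM({m}) AS total_{m}" for m in measures]
    return (
        f"SELECT {', '.join(select_cols)} FROM {table} "
        f"GROUP BY {', '.join(dimensions)}"  # omit this and the query breaks
    )

print(build_report_query(["region", "year"], ["revenue"], "sales"))
# SELECT region, year, SUM(revenue) AS total_revenue FROM sales GROUP BY region, year
```

Even in this trivial sketch, dropping the GROUP BY clause produces SQL that most databases will reject outright, which is the failure mode described above.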
When ChatGPT first came out, I was using it all the time. Within a couple of weeks, though, it became obvious that it's really limited.
I thought I'd wait a few months and maybe it would get better, but it hasn't. Who exactly are copilots for, if not beginners? I really don't find them that useful for programming, because 80% of the time the solutions are a miss or, worse, don't even compile or run.
I enjoy using it to write sci-fi stories or learn about some history stuff, where it just repeats something it parsed off Wikipedia or whatever. For anything serious, I don't get that much use out of it. I'm considering canceling my subscription, because I think I'll be okay with 3.5 or whatever basic model I'd get.
Am I alone here? Sorry for the ramble. I just feel like I had to put it out there.
[+] [-] whalesalad|2 years ago|reply
On the GPT-4 side I’ve had great luck with dealing with complex SQL/BigQuery queries. I will explain a problem, offer my schema or a psql trigger and my goals on how to augment it and it’s basically spot on every time. Helps me when I know what I want to do but don’t know precisely how to achieve it.
[+] [-] nextos|2 years ago|reply
I was discussing this with GitHub when they were hiring for Copilot, but understandably they wanted to get the basic functionality right first. I think it is the next step, and a very interesting topic for a startup or OpenAI et al. to tackle. It has the potential to make programming both more robust and faster, possibly bringing us closer to the correctness levels of classical engineering disciplines.
[+] [-] mtriassi|2 years ago|reply
We found the same as the OP: it's good for simple problems or boilerplate, not great for more complex ones.
[+] [-] lolinder|2 years ago|reply
As a concrete example: GitHub Copilot has been absolutely life-changing for working on hobby programming language projects. Building a parser by hand consists of writing many small, repetitive functions that use a tiny library of helper functions to recursively process tokens. A lot of people end up leaning on parser generators, but I've never found one that isn't both bloated and bad at error handling.
This is where GitHub Copilot comes in—I write the grammar out in a markdown file that I keep open, build the AST data structure, then write the first few rules to give Copilot a bit of context on how I want to use the helper functions I built. From there I can just name functions and run Copilot and it fills in the rest of the parser.
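To give a sense of the shape of those repetitive rule functions, here's a minimal hand-rolled sketch; the toy grammar and helper names (`peek`, `match`) are invented for illustration:

```python
# A hypothetical recursive-descent parser for a toy expression grammar.
class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, kind):
        if self.peek() == kind:
            self.pos += 1
            return True
        return False

    # Rules like these are near-identical in shape; given the grammar and
    # the first few, a completion model can fill in the rest.
    def parse_expr(self):
        left = self.parse_term()
        while self.match("+"):
            left = ("add", left, self.parse_term())
        return left

    def parse_term(self):
        left = self.parse_factor()
        while self.match("*"):
            left = ("mul", left, self.parse_factor())
        return left

    def parse_factor(self):
        tok = self.peek()
        self.pos += 1
        return ("num", tok)

print(Parser(["1", "+", "2", "*", "3"]).parse_expr())
# ('add', ('num', '1'), ('mul', ('num', '2'), ('num', '3')))
```

A real grammar has dozens of rules in exactly this mold, which is why the task rewards completion tools so heavily.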
This is just one example of the kind of task that I find GPTs to be very good at—tasks that necessarily have a lot of repetition but don't have a lot of opportunities for abstraction. Another one that is perhaps more common is unit testing—after giving Copilot one example to go off of, it can generate subsequent unit tests just from the name of the test.
Is it essential? No. But it sure saves a lot of typing, and is actually less likely than I am to make a silly mistake in these repetitive cases.
[+] [-] kromem|2 years ago|reply
I really have zero desire to ever be programming without Copilot again, and have been writing software for over a decade.
It just saves so much time on the boring stuff that used to make me tear my hair out and wonder why I even bother with this line of work at all.
Yeah, you're right, it's not as good as I am at the complex more abstracted things I actually enjoy solving.
So it does the grunt work like writing the I/O calls, logging statements, and the plethora of "nearly copy/paste but just different enough you need to write things out" parts of the code. And then I do the review of what it wrote and the key parts that I'm not going to entrust to a confabulation engine.
My favorite use is writing unit tests, where if my code is in another tab (and ideally a reference test file from the same package) it gets about 50% of the way there with a full unit test, and suddenly my work is no longer writing the boilerplate but just narrowing the tailored scaffolding down to what I actually want it to test.
It's not there to do your job, it's there to make it easier in very specific ways. Asking your screwdriver to be the entire toolbox is always going to be a bad time.
[+] [-] _fs|2 years ago|reply
I prefer to use it as more of an autocomplete on a line per line basis when writing new code.
Typically, I use it for small and concise chunks of code that I already fully understand, but save me time. Things like "Here's 30 lines of text, give me a regex that will match them all" or "Unroll/rewrite this loop utilizing bit shifting".
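Hypothetical versions of those two prompt styles might look like this (the log format and the loop are invented examples):

```python
import re

# "Here's a bunch of lines, give me a regex that matches them all":
# say the lines are timestamped log entries.
log_line = re.compile(r"^\d{4}-\d{2}-\d{2} (INFO|WARN|ERROR) .+$")
assert log_line.match("2024-01-15 ERROR disk full")

# "Rewrite this loop utilizing bit shifting": multiplying an integer
# by 8 becomes a left shift by 3.
def scale_by_eight(values):
    return [v << 3 for v in values]  # v << 3 == v * 8

print(scale_by_eight([1, 2, 3]))  # [8, 16, 24]
```

Both are small, self-contained, and easy to verify by eye, which is what makes them good fits for this workflow.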
I also use copilot as a teacher. Like to quickly grok assembly code or code in languages that I do not use everyday. Or having a back and forth conversation with copilot chat on a specific technology I want to use and don't fully understand. Copilot chat makes an excellent rubber duck when working through issues.
[+] [-] kromem|2 years ago|reply
Maybe they need to do a better job at teaching users how to be productive with the tool.
[+] [-] ratg13|2 years ago|reply
Copilot gives me what I need to scaffold everything I am building.
Asking ChatGPT questions is good for kicking around ideas, but little more.
[+] [-] CPLX|2 years ago|reply
A very long series of questions can totally brief you on tech you don’t understand or have a base in.
[+] [-] daymanstep|2 years ago|reply
I asked ChatGPT to find the bug and it didn't find it. I also asked GPT-4 Turbo to find the bug, and it couldn't find it either. In the end I found the bug manually using tracing prints.
After I found the bug, I wondered if GPT-4 could have found it, so I gave the buggy code to GPT-4 and it found the line with the bug instantly.
To me this shows that GPT-4 is much better than GPT-4 Turbo and GPT-3.5.
[+] [-] xpl|2 years ago|reply
The thing is: in software engineering, you're very often "a beginner" when using new technology or operating outside your familiar domain. In fact, you need to learn constantly just to stay in the business.
[+] [-] d12345m|2 years ago|reply
I’m not a beginner per se - I started writing Objective-C and Python more than a decade ago and I’ve written a depressingly large amount of SQL in that same period. But when my current employer decided I was going to be a web developer, I needed to start from the ground up with Django.
Copilot has been a godsend for me. I still need books and Stack Overflow, but the conversations I've had with Copilot about architectural decisions, project structure, external library choices, syntax, etc., have saved me a ton of time that I would have otherwise spent reading ad-riddled Medium articles.
As a not-beginner beginner, it’s been a huge productivity boost for me.
Agree with op though that it’s pretty bad with SQL. Other than reminders about basic syntax, conversions from T-SQL to Oracle SQL syntax, or mindless column aliasing, I don’t bother much with it.
[+] [-] diegoop|2 years ago|reply
- For basic autocompletion it's OK; on lazy days I even find myself thinking, "why is the AI not proposing a solution to this stupid method yet?"
- For complicated coding it's worthless; I've lost a lot of time trying to fix AI-generated code only to end up rewriting it myself, so I rely on Google/Stack Overflow for that kind of research.
- For architectural solutions, or research like looking for the right tool for a job, I've found it quite useful, as it often presents options I didn't consider or didn't know about in the first place.
[+] [-] thot_experiment|2 years ago|reply
It's also so helpful to be able to just ask questions of the documentation on popular projects, whether it be some nuance of the node APIs or a C websockets library, it saves me countless hours of searching and reading through documentation. Just being able to describe what I want and have it suggest some functions to paste into the actual documentation search bar is invaluable.
Similarly, I find it's really helpful when prototyping things. The other day I needed to drop an image into a canvas. I didn't remember off the top of my head exactly how to get a blob out of an .ondrop (or whatever the actual handler is), and I could have found it with a couple of minutes of Google and MDN/SO, but if I ask ChatGPT "write me a minimal example for loading a dropped image into a canvas", I get exactly what I want in 10 seconds, and I can copy-paste the relevant bits into MDN if I need to understand how the actual API works.
I think you're just using it wrong, and moreover I think it's MUCH MUCH more useful as an experienced engineer than as a beginner. I think I get way more mileage out of it than some of my more junior friends/colleagues because I have a better grasp on what questions to ask, and I can spot it being incorrect more readily. It feels BAD to be honest, like it's further stratifying the space by giving me a tool that puts a huge multiplier on my experience allowing me to work much faster than before and leaving those who are less experienced even further behind. I fear that those entering the space now, working with ChatGPT will learn less of the fundamentals that allow me to leverage it so effectively, and their growth will be slowed.
That's not to say it can't be an incredibly powerful learning tool for someone dedicated to that goal, but I have some fear that it will result in less learning "through osmosis" because junior devs won't be forced into as much of the same problem solving I had to do to be good enough, and perhaps this will allow them to coast longer in mediocrity?
[+] [-] ado__dev|2 years ago|reply
I've been writing code for close to 20 years now, across the full stack. I've written a lot of bad code in my life and seen frameworks come and go, so spotting bad code or bad practices is almost second nature to me. That said, using Cody, I'm able to ship much faster. It will sometimes return bad answers, I may need to tweak my question, and sometimes it just doesn't capture the right context for what I'm trying to do, but overall it's been a great help; I'd say it has made me 35-40% more efficient.
(Disclaimer: I work for Sourcegraph)
[+] [-] MattGaiser|2 years ago|reply
I don’t find them that great at large scale programming and they couldn’t do the hard parts of my work, but a lot of what I do doesn’t need to be “great.”
There’s the core system design and delivery of features; it struggles with that. Anything large seems to be a struggle.
But generating SQL for a report I do sporadically on demand from another team?
Telling me what to debug to get Docker working (which I rarely do as a dev)? Anything shell- or Nginx-related (again infrequent, so I am a beginner in those areas)?
Generating infrequently run but tedious formatting helper functions?
Generating tests?
Basically, what would you give a dev with a year of experience? I would take ChatGPT/Copilot over me with 1 year of experience.
The biggest benefit to me is all the offloaded non-core work. My job at least involves a lot more than writing big features (maybe yours does not).
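One hypothetical instance of the "tedious formatting helper" category mentioned above (the function and its behavior are invented for illustration):

```python
# Boring to write by hand, trivial to review once generated.
def format_bytes(n):
    """Render a byte count in human-readable units."""
    value = float(n)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} PB"

print(format_bytes(1536))  # 1.5 KB
print(format_bytes(500))   # 500.0 B
```

Reviewing ten lines like these takes seconds; writing and hand-testing them is exactly the kind of non-core work worth offloading.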
[+] [-] CPLX|2 years ago|reply
I have been involved in software and implementing technical things since the late 90s and from time to time have been pretty good at a few things here and there but I am profoundly rusty in all languages I sort of know and useless in ones I don’t.
But I’m technical. I understand at sort of a core level how things work, jargon, and like the key elements of data structures and object oriented code and a MVC model and whatever else. Like I’ve read the right books.
Without ChatGPT I am close to useless. I’m better off writing a user story and hiring someone, anyone. Yes I can code in rails and know SQL and am actually pretty handy on the command line but like it would take me an entire day and tons of googling to get basic things working.
Then they launched GPT and I can now launch useful working projects that solve business problems quickly. I can patch together an API integration on a Sunday afternoon to populate a table I already have in a few minutes. I can take a website I’m overseeing and add a quick feature.
It’s literally life changing. I already have all the business logic in my head, and I know enough to see what GPT is spitting out and if it’s wrong and know how to ask the right questions.
Unlike the OP I have no plans to do anything complex. But for my use cases it’s turned me from a project manager into a quick and competent developer and that’s literally miraculous from where I’m standing.
[+] [-] ryzvonusef|2 years ago|reply
I'm not a programmer, I'm a student in acc/fin, to use a weird analogy, if you are a chef, I'm a stereotypical housewife, and we think differently about knives (or GPTs).
I differentiate between tuples, lists and dictionaries not by the definition, but by the type of brackets they use in Python. I use Python because it's the easiest and most popular tool, and I use Phind and other GPT tools because programming is just a magic spell for me to get to what I want, and the less effort I have to spend the better.
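For the curious, the bracket mnemonic looks like this in Python (a minimal sketch):

```python
# Each container type has its own surrounding brackets, which is
# enough to tell them apart in code without the formal definitions.
t = (1, 2, 3)           # parentheses -> tuple (immutable)
l = [1, 2, 3]           # square brackets -> list (mutable)
d = {"a": 1, "b": 2}    # curly braces with colons -> dict
print(type(t).__name__, type(l).__name__, type(d).__name__)
# tuple list dict
```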
But it doesn't mean that GPTs don't bring their own headaches too. As I get more proficient, I now realise that GPTs are now giving me bad or inefficient advice.
I can ask a database-related question and then realise: hang on, despite me specifying this is for Google BigQuery, it's giving me an answer that involves some function I know isn't available on it. Or I read the code it recommends for pandas and realise: hang on, I could combine these two lines into one.
I still use GPT heavily because I don't have time to think about code structure, I just need the magic words to put into the Jupyter cell, so I can get on with my day.
But you don't, and you actually think about these things, and you're realising the gaping flaws in the knife's structure. That's life: you have a skill, and pros and cons come with it.
Like a movie reviewer who can no longer just go to the cinema and enjoy something for its own sake, you also can't just accept some code from a GPT and use it; you can't help but analyse it.
[+] [-] electric_mayhem|2 years ago|reply
I find that kind of heartening, honestly.
But it’s by no means a death sentence for AI. Plenty of dimensions for massive improvement.
[+] [-] MattGaiser|2 years ago|reply
It is just that the bot in this case wrote it down, which made AC liable.
I’m an Air Canada elite and am part of several Facebook groups of similar people. It is notoriously difficult to get clear information on Air Canada policies for anything. Even concierge (for Air Canada’s top tier loyalty members) staff are often giving contradictory information.
Their rules for everything are extremely complicated and they have a fairly large back office constantly fixing even addition errors in terms of points allocation and status progression. They literally aren’t adding up spend totals correctly.
It is quite possible that Air Canada just didn’t tell the bot anything about bereavement fares.
[+] [-] schmookeeg|2 years ago|reply
I code all over the stack, usually some bizarre mix of Python, PySpark, SQL, and TypeScript.
TS support seems pretty nice, and it can optimize and suggest pretty advanced things accurately.
Python was hopeless a few months ago, but my last few attempts have been decent. I've been sent down some rabbit holes, though, and been burned -- usually from not paying attention and being a lazy coder.
PySpark support covers just the basics, which is fine if I'm distracted and just want to do some basic EMR work. More likely, though, I'll rummage through my own code snippets instead.
The speed of improvement has been impressive. I'm getting enthused about this stuff more and more. :)
Plus, who doesn't enjoy making random goofy stuff in Dall-E while waiting for some progressbar to advance? That alone is worth the time investment for me.
[+] [-] nicklecompte|2 years ago|reply
I was testing ChatGPT-3.5 with F# in 2023 and saw some really strange errors. Turns out it was shamelessly copying from GitHub repos that had vaguely related code to what I was asking - this was easy to discover because there's not much F# out there. In fact the relative sparsity of F# is precisely why GPT-3.5 had to plagiarize! It did not take long to find a prompt that spat out ~300 lines verbatim from my own F# numerics library. (I believe this problem is even worse for C numeric programmers, whose code and expertise is much more valuable than anything in .NET.) OpenAI's products are simply unethical, and I am tired of this motivated reasoning which pretends automated plagiarism is a-okay as long as you personally find it convenient.
But even outside of plagiarism I am really nervous about the future of software development with LLMs. So often I see people throwing around stats like "we saw a 10% increase in productivity" without even mentioning code quality. There are some early indications that productivity gains in LLM code assistance are paid for by more bugs and security holes - nothing that seems catastrophic, but hardly worth dismissing entirely. What is frustrating is that this was easily predictable, yet GitHub/OpenAI rushed to market with a code generation product whose reliability (and legality) remains completely unresolved.
The ultimate issue is not about AI or programming so much as software-as-industrial-product. You can quickly estimate increases in productivity over the course of a sprint or two: it's easy to count features cleared and LoC written. But if there are dumb GPT "brain fart" errors in that boilerplate and the boilerplate isn't adequately reviewed by humans, then you might not have particularly good visibility of the consequences until a few months pass and there seem to be more 5-10% bug reports than usual. Again, I don't think the use of Copilot is actually a terrible security disaster. But it's clearly a risk. It's a risk that needs to be addressed BEFORE the tool becomes a de facto standard.
I certainly get that there's a lot of truly tedious boilerplate in most enterprise codebases - even so I suspect a lot of that is better done with a fairly simple deterministic script versus Copilot. In fact my third biggest irritation with this stuff is that deterministic code generation tools have gotten really good at producing verifiably correct code, even if the interface doesn't involve literally talking to a computer.
[+] [-] timrobinson333|2 years ago|reply
I find I spend most of my time thinking about the problem domain and how to model it in logic, and very little time just banging out boilerplate code. When I want to do the kind of task a lot of people will ask gpt for, I find it's often built into the language or available as an existing library - with experience you realise that the problem you're trying to solve is an instance of a general problem that has already been solved.
[+] [-] tdudhhu|2 years ago|reply
At its core, AI/ML gives you answers that have a high probability of being good answers. But in the end that probability is based on averages, and the moment you're coding something that isn't average, AI stops working, because it can't reason about the question and 'answer'.
You can also see this in AI-generated images: they look great, but the averaging makes them all look the same and kind of blurry.
For me the biggest danger of AI is that people put too much trust in it.
It can be a great tool, but you should not trust it to be the truth.