Tell HN: GPT copilots aren’t that great for programming
I think for complete beginners or casual programmers, GPT might be mind-blowing and cool because it can create a for loop or recommend some solution to a common problem.
However, for most of my tasks, it has usually ended up being a total waste of time and a source of frustration. Don't get me wrong: it's useful for those basic tasks where, in the past, I'd do the Google -> Stack Overflow route. For anything more complex, though, it falls flat.
Just a recent example from last week: I was working on some dynamic SQL generation. To be fair, it was a really complex task and it was 5pm, so I didn't feel like whiteboarding (when in doubt, always whiteboard and skip GPT lol). I thought I'd turn to GPT and ended up wasting 30 minutes while it kept hallucinating and giving me code that wasn't even valid. It skipped some of the requirements and missed things like a GROUP BY, which made the generated query not work at all. When I told it what it had missed, it regenerated totally different code that had other issues... and I stopped.
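For context, here's a toy sketch of the kind of dynamic SQL generation in question; the table and column names are invented for illustration, not the actual task:

```python
# A hypothetical sketch of dynamic SQL generation -- names are made up.
def build_report_query(dimensions, measures, table):
    """Build an aggregate query. Every non-aggregated SELECT column must
    also appear in GROUP BY -- exactly the detail a model can drop."""
    select_cols = dimensions + [f"SUM({m}) AS total_{m}" for m in measures]
    return (
        f"SELECT {', '.join(select_cols)} FROM {table} "
        f"GROUP BY {', '.join(dimensions)}"  # omit this and the query breaks
    )

print(build_report_query(["region", "year"], ["revenue"], "sales"))
# SELECT region, year, SUM(revenue) AS total_revenue FROM sales GROUP BY region, year
```

Even in this trivial sketch, dropping the GROUP BY clause produces SQL that most databases will reject outright, which is the failure mode described above.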
When ChatGPT first came out, I was using it all the time. Within a couple of weeks, though, it became obvious that it's really limited.
I thought I'd wait a few months and maybe it would get better, but it hasn't. Who exactly are copilots for, if not beginners? I really don't find them that useful for programming, because 80% of the time the solutions are a miss or, worse, don't even compile or run.
I enjoy using it to write sci-fi stories or learn about some history stuff, where it just repeats something it parsed off Wikipedia or whatever. For anything serious, I don't get that much use out of it. I'm considering canceling my subscription, because I think I'll be okay with 3.5 or whatever basic model I'd get.
Am I alone here? Sorry for the ramble. I just feel like I had to put it out there.
[+] [-] whalesalad|2 years ago|reply
On the GPT-4 side I’ve had great luck with dealing with complex SQL/BigQuery queries. I will explain a problem, offer my schema or a psql trigger and my goals on how to augment it and it’s basically spot on every time. Helps me when I know what I want to do but don’t know precisely how to achieve it.
[+] [-] nextos|2 years ago|reply
I was discussing this with GitHub when they were hiring for Copilot, but understandably they wanted to get the basic functionality right first. I think it is the next step, and a very interesting topic for a startup or OpenAI et al. to tackle. It has the potential to make programming both more robust and faster, possibly bringing us closer to the correctness levels of classical engineering disciplines.
[+] [-] mtriassi|2 years ago|reply
We found the same as the OP: it's good for simple problems or boilerplate, not great for more complex ones.
[+] [-] lolinder|2 years ago|reply
As a concrete example: GitHub Copilot has been absolutely life-changing for working on hobby programming language projects. Building a parser by hand consists of writing many small, repetitive functions that use a tiny library of helper functions to recursively process tokens. A lot of people end up leaning on parser generators, but I've never found one that isn't both bloated and bad at error handling.
This is where GitHub Copilot comes in—I write the grammar out in a markdown file that I keep open, build the AST data structure, then write the first few rules to give Copilot a bit of context on how I want to use the helper functions I built. From there I can just name functions and run Copilot and it fills in the rest of the parser.
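To give a sense of the shape of those repetitive rule functions, here's a minimal hand-rolled sketch; the toy grammar and helper names (`peek`, `match`) are invented for illustration:

```python
# A hypothetical recursive-descent parser for a toy expression grammar.
class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, kind):
        if self.peek() == kind:
            self.pos += 1
            return True
        return False

    # Rules like these are near-identical in shape; given the grammar and
    # the first few, a completion model can fill in the rest.
    def parse_expr(self):
        left = self.parse_term()
        while self.match("+"):
            left = ("add", left, self.parse_term())
        return left

    def parse_term(self):
        left = self.parse_factor()
        while self.match("*"):
            left = ("mul", left, self.parse_factor())
        return left

    def parse_factor(self):
        tok = self.peek()
        self.pos += 1
        return ("num", tok)

print(Parser(["1", "+", "2", "*", "3"]).parse_expr())
# ('add', ('num', '1'), ('mul', ('num', '2'), ('num', '3')))
```

A real grammar has dozens of rules in exactly this mold, which is why the task rewards completion tools so heavily.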
This is just one example of the kind of task that I find GPTs to be very good at—tasks that necessarily have a lot of repetition but don't have a lot of opportunities for abstraction. Another one that is perhaps more common is unit testing—after giving Copilot one example to go off of, it can generate subsequent unit tests just from the name of the test.
Is it essential? No. But it sure saves a lot of typing, and is actually less likely than I am to make a silly mistake in these repetitive cases.
[+] [-] kromem|2 years ago|reply
I really have zero desire to ever be programming without Copilot again, and have been writing software for over a decade.
It just saves so much time on the boring stuff that used to make me tear my hair out and wonder why I even bother with this line of work at all.
Yeah, you're right, it's not as good as I am at the complex more abstracted things I actually enjoy solving.
So it does the grunt work like writing the I/O calls, logging statements, and the plethora of "nearly copy/paste but just different enough you need to write things out" parts of the code. And then I do the review of what it wrote and the key parts that I'm not going to entrust to a confabulation engine.
My favorite use is writing unit tests, where if my code is in another tab (and ideally a reference test file from the same package) it gets about 50% of the way there with a full unit test, and suddenly my work is no longer writing the boilerplate but just narrowing the tailored scaffolding down to what I actually want it to test.
It's not there to do your job, it's there to make it easier in very specific ways. Asking your screwdriver to be the entire toolbox is always going to be a bad time.
[+] [-] _fs|2 years ago|reply
I prefer to use it as more of an autocomplete on a line per line basis when writing new code.
Typically, I use it for small and concise chunks of code that I already fully understand, but save me time. Things like "Here's 30 lines of text, give me a regex that will match them all" or "Unroll/rewrite this loop utilizing bit shifting".
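Hypothetical versions of those two prompt styles might look like this (the log format and the loop are invented examples):

```python
import re

# "Here's a bunch of lines, give me a regex that matches them all":
# say the lines are timestamped log entries.
log_line = re.compile(r"^\d{4}-\d{2}-\d{2} (INFO|WARN|ERROR) .+$")
assert log_line.match("2024-01-15 ERROR disk full")

# "Rewrite this loop utilizing bit shifting": multiplying an integer
# by 8 becomes a left shift by 3.
def scale_by_eight(values):
    return [v << 3 for v in values]  # v << 3 == v * 8

print(scale_by_eight([1, 2, 3]))  # [8, 16, 24]
```

Both are small, self-contained, and easy to verify by eye, which is what makes them good fits for this workflow.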
I also use copilot as a teacher. Like to quickly grok assembly code or code in languages that I do not use everyday. Or having a back and forth conversation with copilot chat on a specific technology I want to use and don't fully understand. Copilot chat makes an excellent rubber duck when working through issues.
[+] [-] kromem|2 years ago|reply
Maybe they need to do a better job at teaching users how to be productive with the tool.
[+] [-] ratg13|2 years ago|reply
Copilot gives me what I need to scaffold everything I am building.
Asking ChatGPT questions is good for kicking around ideas, but little more.
[+] [-] CPLX|2 years ago|reply
A very long series of questions can totally brief you on tech you don’t understand or have a base in.
[+] [-] daymanstep|2 years ago|reply
I asked ChatGPT to find the bug and it didn't find it. I also asked GPT-4 Turbo to find the bug, and it couldn't find it either. In the end I found the bug manually using tracing prints.
After I found the bug, I wondered if GPT-4 could have found it, so I gave the buggy code to GPT-4 and it found the line with the bug instantly.
To me this shows that GPT-4 is much better than GPT-4 Turbo and GPT-3.5.
[+] [-] xpl|2 years ago|reply
The thing is: in software engineering, you're very often "a beginner" when using new technology or operating outside your familiar domain. In fact, you need to learn constantly just to stay in the business.
[+] [-] d12345m|2 years ago|reply
I’m not a beginner per se - I started writing Objective-C and Python more than a decade ago and I’ve written a depressingly large amount of SQL in that same period. But when my current employer decided I was going to be a web developer, I needed to start from the ground up with Django.
Copilot has been a godsend for me. I still need books and Stack Overflow, but the conversations I've had with Copilot about architectural decisions, project structure, external library choices, syntax, etc., have saved me a ton of time that I would have otherwise spent reading ad-riddled Medium articles.
As a not-beginner beginner, it’s been a huge productivity boost for me.
Agree with op though that it’s pretty bad with SQL. Other than reminders about basic syntax, conversions from T-SQL to Oracle SQL syntax, or mindless column aliasing, I don’t bother much with it.
[+] [-] diegoop|2 years ago|reply
- For basic autocompletion it's OK; on lazy days I even find myself thinking, "why is the AI not proposing a solution to this stupid method yet?"
- For complicated coding it's worthless; I've lost a lot of time trying to fix AI-generated code only to end up rewriting it myself, so I rely on Google/Stack Overflow for that kind of research.
- For architectural solutions, or research like looking for the right tool for a job, I've found it quite useful, as it often presents options I didn't consider or didn't know about in the first place.
[+] [-] thot_experiment|2 years ago|reply
It's also so helpful to be able to just ask questions of the documentation on popular projects, whether it be some nuance of the node APIs or a C websockets library, it saves me countless hours of searching and reading through documentation. Just being able to describe what I want and have it suggest some functions to paste into the actual documentation search bar is invaluable.
Similarly, I find it's really helpful when prototyping things. The other day I needed to drop an image into a canvas. I didn't remember off the top of my head exactly how to get a blob out of an .ondrop (or whatever the actual handler is), and I could have found it with a couple of minutes of Google and MDN/SO, but if I ask ChatGPT "write me a minimal example for loading a dropped image into a canvas", I get exactly what I want in 10 seconds, and I can copy-paste the relevant bits into MDN if I need to understand how the actual API works.
I think you're just using it wrong, and moreover I think it's MUCH MUCH more useful as an experienced engineer than as a beginner. I think I get way more mileage out of it than some of my more junior friends/colleagues because I have a better grasp on what questions to ask, and I can spot it being incorrect more readily. It feels BAD to be honest, like it's further stratifying the space by giving me a tool that puts a huge multiplier on my experience allowing me to work much faster than before and leaving those who are less experienced even further behind. I fear that those entering the space now, working with ChatGPT will learn less of the fundamentals that allow me to leverage it so effectively, and their growth will be slowed.
That's not to say it can't be an incredibly powerful learning tool for someone dedicated to that goal, but I have some fear that it will result in less learning "through osmosis" because junior devs won't be forced into as much of the same problem solving I had to do to be good enough, and perhaps this will allow them to coast longer in mediocrity?
[+] [-] ado__dev|2 years ago|reply
I've been writing code for close to 20 years now, across the full stack. I've written a lot of bad code in my life and seen frameworks come and go, so spotting bad code or bad practices is almost second nature to me. That said, using Cody, I'm able to ship much faster. It will sometimes return bad answers, I may need to tweak my question, and sometimes it just doesn't capture the right context for what I'm trying to do, but overall it's been a great help; I'd say it has made me 35-40% more efficient.
(Disclaimer: I work for Sourcegraph)
[+] [-] MattGaiser|2 years ago|reply
I don’t find them that great at large scale programming and they couldn’t do the hard parts of my work, but a lot of what I do doesn’t need to be “great.”
There’s the core system design and delivery of features; it struggles with that. Anything large seems to be a struggle.
But generating SQL for a report I do sporadically on demand from another team?
Telling me what to debug to get Docker working (which I rarely do as a dev)? Anything shell- or Nginx-related (again infrequent, so I am a beginner in those areas)?
Generating infrequently run but tedious formatting helper functions?
Generating tests?
Basically, what would you give a dev with a year of experience? I would take ChatGPT/Copilot over me with 1 year of experience.
The biggest benefit to me is all the offloaded non-core work. My job at least involves a lot more than writing big features (maybe yours does not).
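One hypothetical instance of the "tedious formatting helper" category mentioned above (the function and its behavior are invented for illustration):

```python
# Boring to write by hand, trivial to review once generated.
def format_bytes(n):
    """Render a byte count in human-readable units."""
    value = float(n)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} PB"

print(format_bytes(1536))  # 1.5 KB
print(format_bytes(500))   # 500.0 B
```

Reviewing ten lines like these takes seconds; writing and hand-testing them is exactly the kind of non-core work worth offloading.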
[+] [-] CPLX|2 years ago|reply
I have been involved in software and implementing technical things since the late 90s and from time to time have been pretty good at a few things here and there but I am profoundly rusty in all languages I sort of know and useless in ones I don’t.
But I’m technical. I understand at sort of a core level how things work, jargon, and like the key elements of data structures and object oriented code and a MVC model and whatever else. Like I’ve read the right books.
Without ChatGPT I am close to useless. I’m better off writing a user story and hiring someone, anyone. Yes I can code in rails and know SQL and am actually pretty handy on the command line but like it would take me an entire day and tons of googling to get basic things working.
Then they launched GPT and I can now launch useful working projects that solve business problems quickly. I can patch together an API integration on a Sunday afternoon to populate a table I already have in a few minutes. I can take a website I’m overseeing and add a quick feature.
It’s literally life changing. I already have all the business logic in my head, and I know enough to see what GPT is spitting out and if it’s wrong and know how to ask the right questions.
Unlike the OP I have no plans to do anything complex. But for my use cases it’s turned me from a project manager into a quick and competent developer and that’s literally miraculous from where I’m standing.
[+] [-] ryzvonusef|2 years ago|reply
I'm not a programmer, I'm a student in acc/fin, to use a weird analogy, if you are a chef, I'm a stereotypical housewife, and we think differently about knives (or GPTs).
I differentiate between tuples, lists and dictionaries not by the definition, but by the type of brackets they use in Python. I use Python because it's the easiest and most popular tool, and I use Phind and other GPT tools because programming is just a magic spell for me to get to what I want, and the less effort I have to spend the better.
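For the curious, the bracket mnemonic looks like this in Python (a minimal sketch):

```python
# Each container type has its own surrounding brackets, which is
# enough to tell them apart in code without the formal definitions.
t = (1, 2, 3)           # parentheses -> tuple (immutable)
l = [1, 2, 3]           # square brackets -> list (mutable)
d = {"a": 1, "b": 2}    # curly braces with colons -> dict
print(type(t).__name__, type(l).__name__, type(d).__name__)
# tuple list dict
```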
But it doesn't mean that GPTs don't bring their own headaches too. As I get more proficient, I now realise that GPTs are now giving me bad or inefficient advice.
I can ask a database-related question and then realise: hang on, despite me specifying this is for Google BigQuery, it's giving me an answer that involves some function I know isn't available on it. Or I read the code it recommends for pandas and realise: hang on, I could combine these two lines into one.
I still use GPT heavily because I don't have time to think about code structure, I just need the magic words to put into the Jupyter cell, so I can get on with my day.
But you don't, and you actually think about these things, and you're realising the gaping flaws in the knife's structure. That's life: you have a skill, and pros and cons come with it.
Like a movie reviewer who can no longer just go to the cinema and enjoy something for its own sake, you also can't just accept some code from a GPT and use it; you can't help but analyse it.
[+] [-] electric_mayhem|2 years ago|reply
I find that kind of heartening, honestly.
But it’s by no means a death sentence for AI. Plenty of dimensions for massive improvement.
[+] [-] MattGaiser|2 years ago|reply
It is just that the bot in this case wrote it down, which made AC liable.
I’m an Air Canada elite and am part of several Facebook groups of similar people. It is notoriously difficult to get clear information on Air Canada policies for anything. Even concierge (for Air Canada’s top tier loyalty members) staff are often giving contradictory information.
Their rules for everything are extremely complicated and they have a fairly large back office constantly fixing even addition errors in terms of points allocation and status progression. They literally aren’t adding up spend totals correctly.
It is quite possible that Air Canada just didn’t tell the bot anything about bereavement fares.
[+] [-] schmookeeg|2 years ago|reply
I code all over the stack, usually some bizarre mix of Python, PySpark, SQL, and TypeScript.
TS support seems pretty nice, and it can optimize and suggest pretty advanced things accurately.
Python was hopeless a few months ago, but my last few attempts have been decent. I've been sent down some rabbit holes, though, and been burned -- usually from not paying attention and being a lazy coder.
PySpark support covers just the basics, which is fine if I'm distracted and just want to do some basic EMR work. More likely, though, I'll rummage through my own code snippets instead.
The speed of improvement has been impressive. I'm getting enthused about this stuff more and more. :)
Plus, who doesn't enjoy making random goofy stuff in Dall-E while waiting for some progressbar to advance? That alone is worth the time investment for me.
[+] [-] nicklecompte|2 years ago|reply
I was testing ChatGPT-3.5 with F# in 2023 and saw some really strange errors. Turns out it was shamelessly copying from GitHub repos that had vaguely related code to what I was asking - this was easy to discover because there's not much F# out there. In fact the relative sparsity of F# is precisely why GPT-3.5 had to plagiarize! It did not take long to find a prompt that spat out ~300 lines verbatim from my own F# numerics library. (I believe this problem is even worse for C numeric programmers, whose code and expertise is much more valuable than anything in .NET.) OpenAI's products are simply unethical, and I am tired of this motivated reasoning which pretends automated plagiarism is a-okay as long as you personally find it convenient.
But even outside of plagiarism I am really nervous about the future of software development with LLMs. So often I see people throwing around stats like "we saw a 10% increase in productivity" without even mentioning code quality. There are some early indications that productivity gains in LLM code assistance are paid for by more bugs and security holes - nothing that seems catastrophic, but hardly worth dismissing entirely. What is frustrating is that this was easily predictable, yet GitHub/OpenAI rushed to market with a code generation product whose reliability (and legality) remains completely unresolved.
The ultimate issue is not about AI or programming so much as software-as-industrial-product. You can quickly estimate increases in productivity over the course of a sprint or two: it's easy to count features cleared and LoC written. But if there are dumb GPT "brain fart" errors in that boilerplate and the boilerplate isn't adequately reviewed by humans, then you might not have particularly good visibility of the consequences until a few months pass and there seem to be more 5-10% bug reports than usual. Again, I don't think the use of Copilot is actually a terrible security disaster. But it's clearly a risk. It's a risk that needs to be addressed BEFORE the tool becomes a de facto standard.
I certainly get that there's a lot of truly tedious boilerplate in most enterprise codebases - even so I suspect a lot of that is better done with a fairly simple deterministic script versus Copilot. In fact my third biggest irritation with this stuff is that deterministic code generation tools have gotten really good at producing verifiably correct code, even if the interface doesn't involve literally talking to a computer.
[+] [-] timrobinson333|2 years ago|reply
I find I spend most of my time thinking about the problem domain and how to model it in logic, and very little time just banging out boilerplate code. When I want to do the kind of task a lot of people will ask gpt for, I find it's often built into the language or available as an existing library - with experience you realise that the problem you're trying to solve is an instance of a general problem that has already been solved.
[+] [-] tdudhhu|2 years ago|reply
At its core, AI/ML gives you answers that have a high probability of being good answers. But in the end that probability is based on averages, and the moment you're coding something that isn't average, AI stops working, because it can't reason about the question and 'answer'.
You can also see this in AI-generated images: they look great, but the averaging makes them all look the same and kind of blurry.
For me the biggest danger of AI is that people put too much trust in it.
It can be a great tool, but you should not trust it to be the truth.