top | item 40030786

Trying to Understand Copilot's Type Spaghetti

77 points | mooreds | 1 year ago | rtpg.co | reply

83 comments

[+] mumblemumble|1 year ago|reply
So, that code gets me thinking about premature optimization. In the "verschlimmbessern born of optimizing the wrong thing because you didn't use measurement to guide your efforts" sense. (Verschlimmbessern: German for making something worse by trying to improve it.)

The rule of thumb I hear is that, in a mature product, reading and maintaining code takes about 10 times as much effort as writing it in the first place. I've never tried to measure this myself, but it doesn't seem to be wildly off from what I see at work, so let's go with it. Let's also assume, for the sake of argument, that these AI tools double the productivity of people writing new code. Or, equivalently, they halve the time it takes to write it. (This is a lot more than what I see people typically claim, but I'm trying to steelman and also it makes the math easier.)

Anyway, this would imply that, speaking purely in terms of raw productivity (not, say, security or correctness) the AI coding assistant is a net win when the code that's written using it is less than 5% more difficult to maintain.
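The arithmetic behind that 5% figure can be sketched directly. This is just the commenter's stated assumptions plugged in, not measured data:

```typescript
// Assumptions from the comment above (not measurements):
// maintenance costs ~10x the original writing effort,
// and an AI assistant halves writing time.
const writeCost = 1;        // baseline effort to write the code
const maintainRatio = 10;   // maintenance effort per unit of writing effort
const aiSpeedup = 2;        // AI halves writing time

// Effort saved on writing:
const writeSavings = writeCost - writeCost / aiSpeedup; // 0.5

// Baseline maintenance effort:
const maintainCost = writeCost * maintainRatio; // 10

// The assistant is a net win only while the extra maintenance burden
// stays below the writing-time savings:
const breakEvenOverhead = writeSavings / maintainCost; // 0.05, i.e. 5%

console.log(`Break-even maintenance overhead: ${breakEvenOverhead * 100}%`);
```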

And I'm inclined to think that, if Copilot & friends are enabling more people to write more code that looks like what we see in this article (also not wildly off from what I see at work), then it's hard to see how they could possibly be making codebases where they are used less than 5% more expensive to maintain.

[+] david-gpu|1 year ago|reply
Can you elaborate on this part? I am trying to understand.

> in a mature product, reading and maintaining code takes about 10 times as much effort as writing it in the first place

Wouldn't that imply that rewriting a mature codebase from scratch would take ten times less effort than reading and maintaining it?

I only saw two instances where (different) management approved a complete rewrite of a working product and in both cases it turned out to be a disaster, because it is easy to severely underestimate how much effort it will take to match the features, quality and performance of the old codebase. I suspect it is almost universally better to refactor the old codebase incrementally.

Based on that, I take it that you mean something else, and I didn't get your point.

[+] blueappconfig|1 year ago|reply
from the original tweet linked in the post "ceiling is being raised. cursor's copilot helped us write "superhuman code" for a critical feature. We can read this code, but VERY few engineers out there could write it from scratch."

I don't really agree that code is superhuman if VERY few are able to understand it haha..! Code should be complex but easy to follow to make it brilliant, in my opinion

[+] Octoth0rpe|1 year ago|reply
I think Kernighan said something along the lines of "Because debugging code is twice as hard as writing it, only write code half as smart as you are or you'll never be able to fix it later". AI-assisted code generators seem to make this problem much worse, as I can now write code 2x or 3x as smart as I am. What hope will there ever be of debugging this?

A more optimistic take is that maybe such tools will let us write competent code in languages we do NOT specialize in, and in the future either a more competent version of ourselves or some actual expert can fix it if it breaks? That doesn't sound a whole lot better :/

[+] jameshart|1 year ago|reply
That’s not the claim. It’s well commented and formatted so actually quite readable. The claim is that very few could write it.

Though I would say that ‘very few’ is a larger group than they think - there are plenty of people doing metatype programming in TS; I’ve dabbled enough that given the problem I could probably tackle it and I know I learned from seeing others do it (because I am far from a typescript professional). So it’s not ‘superhuman’ if many of the humans who have found themselves wanting to work with the typescript type derivation model could have written it.

These capabilities - type ternaries and inferred type parameters - were put into TypeScript with a view that humans would use them.
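For anyone unfamiliar, the two features mentioned look like this. A small illustrative sketch, not the code from the tweet:

```typescript
// A conditional type reads like a ternary at the type level:
type IsString<T> = T extends string ? true : false;

// `infer` pulls out a type variable from a structural position,
// here the element type of an array:
type ElementOf<T> = T extends readonly (infer E)[] ? E : never;

// Both can recurse, which is how the "type spaghetti" patterns are built:
type Flatten<T> = T extends readonly (infer E)[] ? Flatten<E> : T;

// Runtime witnesses: these lines only type-check if the types above
// resolve the way the comments claim.
const a: IsString<"hi"> = true;
const b: ElementOf<number[]> = 42;
const c: Flatten<string[][][]> = "deeply nested";
```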

The danger here is kidding yourself that this sort of code is beyond human.

[+] zarzavat|1 year ago|reply
Even an average programmer can understand it if they bothered to read the TypeScript documentation. “Superhuman” in this case means “has read the manual”.
[+] nertirs|1 year ago|reply
Seems like the tweet is another AI hype PR piece. Since Devin was making outlandish statements, Copilot can't fall far behind.
[+] janikvonrotz|1 year ago|reply
I agree! This is not superhuman code, this is machine code.
[+] shiandow|1 year ago|reply
Technically they're saying they can read it but wouldn't be able to come up with it on their own.

Which is impressive; generally reading code is considered harder than writing it, so in that sense it is inhuman.

[+] websitescenes|1 year ago|reply
ChatGPT gives me garbage code unless I ask it politely not to. No joke. Usually the first attempt is pure garbage and I have to call it out as such and then it’s like, “you’re right! Here’s the updated code”. No idea why it can basically never get it right the first time. I also find that it can be quite redundant and offer two distinct solutions morphed into one mutant answer which will turn the undiscerning 1x developer into a -10x developer. But hey, it still saves me time. Sometimes..
[+] ijustlovemath|1 year ago|reply
Are you using 4? I've had great luck with 4-Turbo and Opus
[+] bryanrasmussen|1 year ago|reply
maybe ChatGPT crawls Stack Overflow; the Stack Overflow #1 answer is old or just garbage, the second answer is better but not the upvoted one,

ChatGPT gives you #1 basically (with some small tweaks, but still it is garbage)

you tell it that's garbage

it sees in comments "why is this upvoted answer, answer two is clearly better"

it returns answer two with a few tweaks.

[+] swiftlyTyped|1 year ago|reply
Hi, CopilotKit CEO here (I wrote the original viral tweet). This article is great! Thanks for posting.

I'd also written an analysis of the code - including announcing a $1000 prize for the best alternative code: https://ai88.substack.com/p/ceiling-has-been-raised-analyzin...

We were going to announce the winner this week but if we get a few more submissions we will definitely consider them.

Just submit a PR to https://github.com/CopilotKit/CopilotKit

[+] tgv|1 year ago|reply
That's free training/fine-tuning material, right?
[+] swiftlyTyped|1 year ago|reply
By the way, we practice what we preach: Copilots raise the bar (or the ceiling...) on human productivity, in every domain. Which is why we're building infrastructure to make building copilots easier...

What ideas do you think are still missing from today's Copilots?

I.e., suppose we were looking back at today's Copilots 5 years from now: besides better models, what else has changed?

[+] codelikeawolf|1 year ago|reply
I've been noticing a steady uptick in increasingly complex types like this making it into libraries/@types packages in the definitely-typed repo, and I'm concerned. There are potentially severe performance implications for being too clever, especially since the TS compiler is written in TS. For example, recursive types can seriously bog down the compiler/checker. It doesn't take long to start hitting diminishing returns, especially in larger codebases. You either get perfect type checking while the TS language server uses 800% of your CPU, or you bite the bullet and supplement the lack of typing with unit tests. I think rewriting TS in a more performant language like Zig or Rust would alleviate this to some extent, but TS will still give you more than enough rope to hang yourself with.
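The recursive-type cost the commenter describes is easy to reproduce. Here is a hypothetical sketch of the kind of type that is cheap at small sizes but makes the checker do work proportional to its arguments:

```typescript
// A recursive conditional type that builds a tuple of length N.
// Every step costs the checker an instantiation; large N can hit
// TypeScript's recursion/instantiation-depth limits or just churn
// the language server's CPU.
type BuildTuple<N extends number, Acc extends unknown[] = []> =
  Acc["length"] extends N ? Acc : BuildTuple<N, [...Acc, unknown]>;

// Type-level subtraction built on top of it: clever, but every use
// forces the checker to re-expand the recursion above.
type Subtract<A extends number, B extends number> =
  BuildTuple<A> extends [...BuildTuple<B>, ...infer Rest]
    ? Rest["length"]
    : never;

// Fine at small sizes...
const small: Subtract<10, 4> = 6;
// ...but something like BuildTuple<5000> can exceed the checker's
// depth limit and error out, long before any runtime code runs.
```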
[+] Emanation|1 year ago|reply
Why are these seen as difficult to write? It's a giant switch statement that recurses. This is less indicative of AI having come a long way and more of programmers never having worked on a program that stores types as data, this being the most common and rote pattern that exists.
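The "types as data, recursive switch" pattern the commenter is referring to looks roughly like this. An illustrative sketch with made-up descriptor names, not code from the article:

```typescript
// "Types as data": a runtime description of a type, plus a recursive
// checker that is essentially one big switch statement.
type TypeDesc =
  | { kind: "string" }
  | { kind: "number" }
  | { kind: "array"; element: TypeDesc }
  | { kind: "object"; fields: Record<string, TypeDesc> };

function matches(value: unknown, desc: TypeDesc): boolean {
  switch (desc.kind) {
    case "string":
      return typeof value === "string";
    case "number":
      return typeof value === "number";
    case "array":
      // Recurse into every element.
      return Array.isArray(value) && value.every(v => matches(v, desc.element));
    case "object":
      // Recurse into every declared field.
      return (
        typeof value === "object" &&
        value !== null &&
        Object.entries(desc.fields).every(([k, d]) =>
          matches((value as Record<string, unknown>)[k], d)
        )
      );
  }
}

// Example descriptor for { name: string; scores: number[] }:
const userDesc: TypeDesc = {
  kind: "object",
  fields: {
    name: { kind: "string" },
    scores: { kind: "array", element: { kind: "number" } },
  },
};
```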
[+] pquki4|1 year ago|reply
When I read the first sentence

> The other day this snippet of Typescript generated by Github copilot was floating around

I was wondering, did that happen on Twitter/X?

And I was not disappointed. That is the place where these people discuss things and take it for granted. Apparently, if you are not using Twitter, you are not part of the conversation.

[+] keybored|1 year ago|reply
We’re really doing LLM hermeneutics now.
[+] tossandthrow|1 year ago|reply
The future is going to be a horrible place for people taking over legacy codebases. It was difficult enough with other people's spaghetti code, but being able to generate vast amounts of hard-to-decipher spaghetti code is only going to make this worse.

Sometimes you digress to simpler types not because you couldn't write more complex types, but because you realise that the next person who needs to make changes to this piece of code is going to have a bad time.

[+] Sebb767|1 year ago|reply
> It was difficult enough with other people's spaghetti code, but being able to generate vast amounts of hard-to-decipher spaghetti code is only going to make this worse

I've seen some legacy code and the kind of spaghetti humans are able to generate - especially with multiple layers of "I'll just throw in something to make feature X work/fix that bug" - is pretty bad already. I honestly doubt code generation will make things much worse.

[+] gregmac|1 year ago|reply
The first thing I see when I look at this is "Where are the unit tests?"

This is somewhere in the realm of "clever" or "efficient" code: it looks like it could be written in an easier-to-grok way as dozens of lines of if statements, but I assume there is a reason it wasn't (that is better than "just because"). AI-generated or not, someone or team is responsible for making sure the code/app/service is working for customers, and they have to be able to fix bugs, maintain and modify this.

Does it work today? As a reviewer (or coming across this while fixing a bug), I only know by either trying it, or spending quite a long time to understand and analyze it in my head. Unit tests are the easiest way of "trying it", with all the edge cases and then some.

What if I'm faced with modifying this in 6 or 12 months? Even if I wrote it, chances are my internal mental model is gone. I'd like some reasonable assurance I am not breaking anything.

Also, I'd like to not be the only one responsible for this forever, so I want to let other people modify it. Unit tests are the guard rails that let me say "Go ahead and do whatever you'd like, but don't break the tests".

[+] rtpg|1 year ago|reply
You can scaffold up "unit tests", but honestly for type-level stuff you are working in a different space entirely. Your types are correctness proofs, so your underlying code is either typed correctly or not. There's not really a middle ground that unit tests catch.

Having said that, typescript's soundness issues make it easy to drive a truck through a certain kind of issue, but generally speaking if your type-level programming is no good you're not really going to be able to run unit tests, let alone validate them. Your code just won't go anywhere.

[+] hesviiggvv|1 year ago|reply
Type level unit tests are indeed super helpful, and in my experience they are easier to write than “real” unit tests, because mocking is trivial.
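Type-level unit tests typically use an `Equal`/`Expect` idiom along these lines (a widely used pattern, e.g. in the type-challenges repo; shown here as a self-contained sketch):

```typescript
// Equal<A, B> resolves to `true` only when A and B are identical types;
// the function-type trick makes the comparison strict rather than
// merely mutually assignable.
type Equal<A, B> =
  (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2)
    ? true
    : false;
// Expect<T> only accepts `true`, so a failing "test" is a compile error.
type Expect<T extends true> = T;

// The type under test: recursively unwrap nested Promises.
type Unwrap<T> = T extends Promise<infer U> ? Unwrap<U> : T;

// "Tests": these aliases compile only if Unwrap behaves as intended.
type _t1 = Expect<Equal<Unwrap<Promise<number>>, number>>;
type _t2 = Expect<Equal<Unwrap<Promise<Promise<string>>>, string>>;
type _t3 = Expect<Equal<Unwrap<boolean>, boolean>>;

// Runtime witness (types are erased; this just confirms assignability):
const unwrapWitness: Unwrap<Promise<number>> = 123;
```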
[+] tylersmith|1 year ago|reply
These tools can also generate tests, the OP just doesn't discuss it.
[+] janikvonrotz|1 year ago|reply
So unraveling AI generated code is the new programming?
[+] PaulHoule|1 year ago|reply
If you are the #2 or later programmer on a project, unraveling human generated code is the old programming.
[+] spacecadet|1 year ago|reply
What is typing? Let's say you have no hands, no eyes, and cannot speak; or, to speed this up, all you can do is communicate with foot taps (in great detail) to a human translator. Would we tell this person they are not programming?