FWIW, when I'm doing a code review, these are the exact kind of comments that I would tell a committer to remove.
That is, it's like it generates these kinds of comments:
    // initializes the variable x and sets it to 5
    let x = 5;
    // adds 2 to the variable x and sets that to a new variable y
    let y = x + 2;
That is, IMO the whole purpose of comments should be to tell you things that aren't readily apparent just by looking at the code, e.g. "this looks wonky but we had to do it specifically to work around a bug in library X".
Perhaps it could be useful for people learning to program, but otherwise people should learn to read code as code, not "translate" it into a verbose English sentence in their head.
I don’t think the intent of this tool is to generate comments which you’d then embed into the code it describes. I think it’s meant to explain, in plain language, what the actual behavior is (for whatever confidence level you might assign to “actual” and “is”).
To your point about the utility of code comments describing the behavior this way, I agree it’s probably much more valuable for beginners. In fact when I’ve mentored early programmers, I sometimes ask them to write out essentially prose like this in comments before writing a single line of executable code.
Now, I’m far from a beginner. I’ve been considered a senior engineer long enough that friends discourage me from disclosing the amount of time, for fear of age discrimination. I can absolutely see the potential of this tool as part of my IDE. I’m on vacation now, but when I return to work I plan to take it for a spin as an aid for refactoring areas of code which clearly work as intended (well, for the most part) but the actual behavior and intent is much less clear.
Here’s why I think it’ll be valuable for refactoring: it can help limit the amount of mental context switching necessary to build a mental model of what the code does. I often find myself trying to produce prose much like this for my own reference, but I end up losing context as fast as I acquire it as I follow references into their respective rabbit holes. Having the tool do that for me can help me stay in a single area of focus. It could also be a useful reference for adding and improving type definitions, maybe even regression tests.
The best part is that it doesn’t, from what I’ve seen, do anything besides populate ephemeral annotations. It doesn’t try to write code or automate anything other than producing a narrative. Like at least one other commenter, I’m skeptical about the reliability of that. But unlike that commenter, I’m willing to take the risk… probably because I’ve learned to be skeptical of my own reliability performing the same task. If my instinct is right that I can use this tool the way I hope, I’ll still scrutinize it for accuracy. But that’s potentially much better than having only one imperfect, meat-based computer doing the work.
Another way to look at it is that it doesn't (AFAIK) have external context, so there are severe limits on what it could possibly infer from some code. Like you say, if something unusual is being done because of a library, or business rules, or something similar, "AI" cannot take this into account. There may be some cases where something non-obvious can be distilled out of the code, though I agree that the stuff you can infer without context is mostly self-evident from the code anyway.
Edit: just thinking, the canonical example would be something like the fast inverse square root from Quake. Is it going to summarize, or is it going to tell you
    i = 0x5f3759df - (i >> 1);
    // Shift i right by one and subtract it from 0x5f...
(It would be cool if it did work here. But even if it does, when someone makes up a new thing like this, it couldn't possibly comprehend why it's being done.)
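For reference, the full Quake III routine looks roughly like the sketch below (using memcpy for the bit reinterpretation, since the original's pointer cast is technically undefined behavior). A literal line-by-line translation would only restate each statement; the useful explanation is the intent noted in the comments:

```c
#include <stdint.h>
#include <string.h>

/* The fast inverse square root from Quake III. The bit trick produces a
 * first guess at 1/sqrt(x) by manipulating the float's exponent bits,
 * and one Newton-Raphson step refines that guess. */
static float q_rsqrt(float number)
{
    uint32_t i;
    float x2 = number * 0.5f;
    float y = number;

    memcpy(&i, &y, sizeof i);      /* reinterpret float bits as an integer */
    i = 0x5f3759df - (i >> 1);     /* magic constant: exponent math approximates 1/sqrt */
    memcpy(&y, &i, sizeof i);
    y = y * (1.5f - (x2 * y * y)); /* one Newton-Raphson iteration refines the guess */
    return y;
}
```

The point stands: a model translating this line by line would tell you "shift i right by one and subtract it from 0x5f3759df," which is true and useless; the "why" lives entirely outside the syntax.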
I'm always amazed by what AI can do, but never amazed by the actual output (amazed, that is, within the context that it is computer-generated). Not just Copilot but any tool with too much intelligence of its own (DALL-E, Midjourney, etc.) feels like this, because it reminds me of a person with a great talent for compositing things who doesn't know what they are doing.
It's the "AI-generated papers that got published in prestigious journals" situation, all over again and again. At a glance it looks amazing, but the machine definitely doesn't have any kind of intelligence, and the output is actually worthless.
This AI stuff works really well when it does something very specific that is quickly inspectable by a human, for example generating interpolated frames in video, extending a pattern, or detecting anomalies.
The moment it strays away from human control, it fails amazingly well.
You are grossly over-simplifying what this is doing. Nowhere does it say anything as simple as setting x to y. In nearly every case it takes the context of the variables into account and states the meaning of the function calls, not the meaning of the syntax.
The tweet and video don’t seem to imply this _should_ be a comment.
I have been “learning to program” for 20+ years and would absolutely find this useful as a quick way to get basic information about a chunk of code I’m unfamiliar with.
Not that learning to read code isn’t important, just not always necessarily worth the time (:
Agree. I find that what you usually want to understand is not the "what" or "how" but the "why", and that is quite a bit harder to automate than translating syntax into natural-language statements.
The only situation where I see an AI generating this type of comment being useful would be deciphering an obfuscated C challenge program. Or maybe Perl.
Yep. Nothing particularly mind-blowing about this. It's just a word-by-word translation of code into English. Heck, you don't even need GPT-3 to do this, except for some variety and grammatical correctness.
I just don't trust it, I've worked with GPT-3 before and it sure does a real good job of sounding convincing, but if you don't understand the code there's no way to know if what it's saying is accurate, or whether it's just regurgitating random nonsense that sounds plausible.
It knows how to create sentences that sound like something a human would write, and it's even good at understanding context. But that's it: it has no actual intelligence, it doesn't actually understand the code, and most importantly, it's not able to say "Sorry chief, I don't actually know what this is doing, look it up yourself."
The Underhanded C Contest is a great practical demonstration that even "biological intelligences" have a difficult time reading and summarizing code. I wouldn't trust this thing further than I do comments, but I could see it being equally as useful.
There's an example of that in this very demo. The line it translates as "we're getting the raw request body" doesn't work on multipart/form-data.
I can easily imagine the reason you're looking at an unfamiliar function in an unfamiliar language (hence needing such a line-by-line translation) is that there's some sort of bug and that edge case is exactly why. The tool would mislead you into thinking it's one of the other lines, because of how simple its translation is.
This is a prime example of the moving goalpost of what intelligence "actually" is. In previous eras, we would undoubtedly have considered understanding context, putting together syntactically correct sentences, and extracting the essence from texts to be "intelligent".
How come general developer audiences aren't more acquainted with the capabilities of GPT-3 (and Codex in particular)? People in the Twitter thread all seem completely mind-blown over an app that basically just passes your code to an existing API and prints the result.
I don't want to sound negative of course, and I expect many of these apps to keep appearing until Codex stops being free (if they put it on the same pricing as the text DaVinci model, of which Codex is a fine-tuned version, it will cost roughly a cent per query). I'm just wondering how the information about this type of app reaches most people way before the information about "the existence of Codex" reaches them.
For all the publicity around Codex recently (and especially on HN), it still seems like the general IT audience is completely unaware of the (IMHO) most important thing going on in the field.
And to anyone saying "all these examples are cherry-picked, Codex is stupid", I urge you to try Copilot and to look at its output with a ~2019 perspective. I find it hard to believe that anything but amazement is a proper reaction. And still, more people are aware of the recent BTC price than of this.
Source: I have been playing with the Codex API for the better part of every day for the last few weeks. I built an app that generates SQL for a custom schema, and have been using it in my daily work to greatly boost my productivity as a data scientist/engineer/analyst.
MS has been trying to get AI into intellisense for years now and I always turn it off.
The lack of control over it just makes it annoying. In many ways it's faster to just type out the algorithm myself than to have it lay the algorithm out and then spend time trying to understand what's there so I can successfully convert the code to what I need.
Then there's the lack of stability. Yesterday it did something different from what it's doing today, so I can't even use muscle memory to interact with it anymore.
Intellisense has _always_ had that annoyance factor of getting in your way sometimes, forcing you to write code in a certain way to minimize that. All this just makes it more annoying and I don't believe anyone who claims it truly makes them more productive.
I have no use for it, and don't expect to ever have a use for it. 95% accurate, 99% accurate, and 99.9% accurate are all awful in this context.
It's something you run repeatedly, so even small failure chances will eventually occur. Among its failure states is being very, very wrong in ways that are hard for a skilled human to detect without more work than writing from scratch.
And no one in the space is discussing ways to eliminate categories of bugs, only ways to reduce the frequency. Most of those solutions have the side effect of making the less frequent bugs harder to detect. On balance, that's worse.
And, less importantly, it's only useful for writing boring code that should probably be generalized to an API. Sure, I write plenty of that, but it's not an exciting area to follow in my spare time.
I'd love to try Codex if I could run it on a local GPU and fine-tune it on my own code. I'd even push to use it at work. But as we're writing in a niche language and our code is heavily problem-domain dependent, I don't feel like making my workflow vulnerable to an external supplier, even setting aside the IP concerns.
Call me when I can download and finetune the weights, like I can with Stable Diffusion.
The problem is that they don't understand the practical ways it can be used. Even tech-savvy people don't yet get it. Even my CEO, a somewhat technical person, did not understand the full potential until I explained some use cases.
In one scenario, I took a slow-running, long MySQL query and rewrote it with Codex in 2 minutes.
But I think people have started to realize the potential now.
Pitch: My app https://Elephas.app brings GPT-3 and Codex to all applications on macOS. Many business professionals are using it.
The question is: can this actually explain the code which really needs explanation, or can it only explain code that should be easy and straightforward to read anyway?
And does having this reduce the amount of discomfort badly readable code creates, and thus make you less inclined to take care the code is and stays easily readable?
Not only do I find the tool not useful, it just states the obvious. My personal opinion is that the code should already be very near what the tool gives.
The code should be clear enough not to need such a tool. If you need it, you have a very different problem, my friend.
Yep. Not only does it generate useless comments; in this case it is actually easier for me to read the code itself than the generated comments. I know neither the language nor the framework they use, and still it is completely readable.
TL;DR: you actually end up reading even more than the original. I could be wrong, and this might actually work to condense a big function, but if that is true, why showcase such an example?
It’s helpful if you can read English but the code is difficult to understand. Most explanations of code are more verbose than the code they’re explaining because code is usually pretty terse compared to natural language.
You can think of "too long" as referring to the time it might take someone to reason out a particularly terse, dense line of code, versus the actual length of the code.
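As an illustration of that distinction, here is a classic terse C idiom (the helper name is made up for this example): one short line whose "reading time" far exceeds its length.

```c
/* Copy the string s into d, including the '\0' terminator.
 * One short line, but unpacking the precedence of *d++ = *s++ and the
 * implicit comparison of the assigned character against zero takes
 * real reading time; the prose explanation is longer than the code. */
static void copy_str(char *d, const char *s)
{
    while ((*d++ = *s++))
        ;
}
```

A prose "translation" of that loop is several sentences long, which is exactly why explanations tend to be more verbose than the code they describe.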
I know there's such a thing as idiomatic code, but I can't help thinking the code in the tweet would be much more readable - no AI needed - if the variables/methods/args were better named.
I think a tool like this can make sense if you cannot read the code.
Probably this still gets confused like humans do when variables and functions are named less clearly or even plainly wrong.
I wonder if reading explanation like this makes you more likely to believe code is correct, even if some details are wrong.
In this signature example, you could read the wrong header, calculate the hash the wrong way, compare the hashes the wrong way, etc. There are so many possible tiny mistakes.
I don't know that I have much of a need for this, and although I'm hesitant to provide crutches to people, especially when they're in the early stages of their learning, this might be helpful for more junior people who are ramping up, especially in a large project. Is there a way to use this or something similar today in PyCharm, etc.?
I wrote something similar before: a friend had a nice technique for doing code analysis and removing everything but the critical path to the point in the code your cursor was over. Then I fed that code path into GPT-3 to generate an explanation of that critical path.
Wound up being useful for explanations of long code paths across file boundaries in large code bases.
I know it's completely missing the point here, but: it's a good habit to verify signatures using a constant-time comparison rather than == to avoid timing side-channel attacks :)
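For illustration, a minimal sketch of such a comparison in C (in real code, prefer a vetted primitive such as OpenSSL's CRYPTO_memcmp or libsodium's sodium_memcmp):

```c
#include <stddef.h>
#include <stdint.h>

/* Compare two equal-length byte buffers in time independent of where
 * they first differ: accumulate XOR differences over the whole buffer
 * instead of returning early the way memcmp or == comparisons may.
 * Returns 1 if the buffers are identical, 0 otherwise. */
static int ct_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= a[i] ^ b[i];
    return diff == 0;
}
```

Because the loop always runs to the end, an attacker can't learn the position of the first mismatching byte from the response time, which is exactly the leak an early-exit comparison creates.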
rk06 (3 years ago):
1. Write the comment first, code later.
2. Tell the intent of the code, not the instructions to achieve it.
alexvoda (3 years ago):
But otherwise, I also want the Why, not the What!
naniwaduni (3 years ago):
This is also the story of the past decade of "progress" in machine translation between natural languages.
vxNsr (3 years ago):
I read HN pretty regularly, but unless you're really excited about the AI space, a lot of this news washes over you and you mostly ignore it.
angst_ridden (3 years ago):
    /* we don't use the actual price but the discounted price, as per email from Manager Bob on 2022-09-16 */
    subtotal += price * customer_discount_factor;
or
    /* note there's a 2ms delay while relays settle; this is subtracted from sample time, so timeout is not what you might expect */
    select(0, &readfd, NULL, NULL, &timeout);
naruhodo (3 years ago):
Actual human documentation would read something like:
reidjs (3 years ago):
If it could give you some context about the implications, I could see it being handy for static analysis one day.