
Using GPT-3 to explain how code works

385 points | simonw | 3 years ago | simonwillison.net | reply

147 comments

[+] goodside|3 years ago|reply
For the past few weeks, I've been collecting and refining prompting tricks for getting GPT-3 to emit and understand various types of non-prose output. Many of these are discussed in the Twitter thread the OP links to, but I'd like to highlight a few of my favorites here too.

- GPT-3 can produce simple SVG diagrams (https://twitter.com/goodside/status/1545883023719239689) from an English description, or incrementally modify a given SVG diagram (https://twitter.com/goodside/status/1545888869207400454).

- GPT-3 can produce diagrams using Graphviz dot notation (https://twitter.com/goodside/status/1545109709035900928), including diagrams that incorporate real-world context (https://twitter.com/goodside/status/1545112039147847682) not given in the prompt.

- It can emit and understand HTML and Markdown, including table markup — e.g., "Produce an HTML table giving the members of the Beatles and their instruments"

- It can accept/emit multiple named files, with arbitrary content, simply by embedding those files as values in a larger JSON/YAML document. Here, (https://twitter.com/goodside/status/1545848754883919872) I demonstrate completing the file tree of a trivial Python package given a README.md.

- It can understand the raw JSON file contents of .ipynb files, successfully following directions given to it in markdown cells, emitting multiple code cells, etc.
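
The files-as-JSON trick can be sketched in a few lines of Python (the filenames and contents below are invented for illustration; a real prompt would go to the completion API):

```python
import json

# Embed multiple named files as string values in one JSON document, so a
# single completion can read -- or emit -- a whole file tree at once.
files = {
    "README.md": "# greeter\nA trivial package.\n",
    "greeter/__init__.py": "from .hello import hello\n",
    "greeter/hello.py": 'def hello():\n    return "Hello, world!"\n',
}

# The model is asked to return the same structure with missing files added.
prompt = (
    "Complete the missing files in this Python package, "
    "returning the same JSON structure:\n" + json.dumps(files, indent=2)
)
```

Because the output is itself JSON, it can be parsed and written back out to disk mechanically.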

While all this is fun, as many people have pointed out the model has a bad habit of "hallucinating" — producing highly specific, wrong output. Avoiding this (https://twitter.com/goodside/status/1545793388871651330) is more of an art than a science. Be highly skeptical of any output you get from GPT-3, and search for verification methods where you can.

[+] 6keZbCECT2uB|3 years ago|reply
This is very cool.

Two uses come to mind: 1. Alternative variable name suggestions. When I'm writing code and trying to figure out what to name a variable — say, a data frame of performance data that takes the maximum over random samples and batch size, leaving data size and thread count — all I can come up with is unbatched_speed or something. I can think of a lot of scenarios where a context-sensitive thesaurus / autocomplete would help. If a name is obvious to an AI, maybe it'll be obvious to a junior dev?

2. I expect there are a lot of cases where non-technical people could understand code, but get intimidated because it looks like code. An AI that explains the business logic they care about, but in prose, would help.

[+] pmoriarty|3 years ago|reply
"For the past few weeks, I've been collecting and refining prompting tricks for getting GPT-3 to emit and understand various types of non-prose output"

I'd like to see it get good at continuing poetry. It's pretty awful at that, judging from my own experiments.

[+] 323|3 years ago|reply
Has anybody tried code reviews? As in "GPT-3, please do a code review of this code".
[+] akamoonknight|3 years ago|reply
This is a fascinating use case that wouldn't have occurred to me at all, quite cool, even if it "doesn't know anything about anything at all".

It even does pretty well with Verilog, which I assumed it would have issues with. Admittedly some of this is just "google-able", but even so. It can even make a reasonable attempt at a testbench (even though it doesn't quite get it right), which is crazy to me.

*Explain this code:

module my_module(a,c,y); input a,c; output y; assign y= c ? a : 1'bz; endmodule*

This code is a Verilog module that defines three inputs (a, c, y) and one output (y). The output y is assigned to the value of a if c is true, otherwise it is assigned to 1'bz.

*What is 1'bz ?*

1'bz is a Verilog constant that represents a bit value of 'z', which is used to indicate a high impedance state.

*Why is high impedance used ?*

High impedance is used in digital circuits to allow multiple drivers to share a single bus without contention. When two or more drivers attempt to drive the same bus at the same time, a contention can occur and the bus voltage can become undefined. By using high impedance, drivers can avoid contention and allow the bus to remain stable.

*Give me a testbench for this module*

module testbench; reg a; reg c; wire y;

  my_module dut(a, c, y);
  
  initial begin
    a = 1'b0;
    c = 1'b0;
    #1;
    a = 1'b1;
    c = 1'b1;
    #1;
    a = 1'bz;
    c = 1'b1;
    #1;
    a = 1'bx;
    c = 1'b1;
    #1;
    $stop;
  end
endmodule

*Create the same testbench but add validation of the output.*

module testbench; reg a; reg c; wire y;

  my_module dut(a, c, y);
  
  initial begin
    a = 1'b0;
    c = 1'b0;
    #1;
    a = 1'b1;
    c = 1'b1;
    #1;
    a = 1'bz;
    c = 1'b1;
    #1;
    a = 1'bx;
    c = 1'b1;
    #1;
    $stop;
  end
  
  always @(*) begin
    if (c == 1'b0)
      assert(y == 1'b0);
    else if (c == 1'b1)
      assert(y == a);
    else if (c == 1'bz)
      assert(y == 1'bz);
    else if (c == 1'bx)
      assert(y == 1'bx);
  end 
endmodule
[+] tails4e|3 years ago|reply
Wow. I cannot fathom how it can work this well, though I think the validation is incorrect: the first assert should check for z, not 0. Still, it's scary to be honest.
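
A minimal Python model of the module makes this concrete (a sketch, not a real simulator): when c is 0 the tri-state buffer releases the bus, so y is high-impedance ('z'), not 0.

```python
def my_module(a, c):
    """Model of `assign y = c ? a : 1'bz`: drive a onto y when c is 1,
    otherwise float the output (high impedance, shown here as 'z')."""
    return a if c == 1 else "z"

# The checks a correct testbench should make:
expected = [
    ((0, 0), "z"),  # c=0: y is 'z' regardless of a, so assert(y == 1'b0) is wrong
    ((1, 0), "z"),
    ((0, 1), 0),    # c=1: y follows a
    ((1, 1), 1),
]
results = [my_module(a, c) for (a, c), _ in expected]
```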
[+] hackernewds|3 years ago|reply
This is frighteningly impressive
[+] jwilk|3 years ago|reply
> The (?xm) notation is a modifier that allows the regular expression to span multiple lines and to use comments.

This is incorrect.

(?xm) enables the VERBOSE (X) and MULTILINE (M) flags.

From the docstring:

  M  MULTILINE   "^" matches the beginning of lines (after a newline)
                  as well as the string.
                  "$" matches the end of lines (before a newline) as well
                  as the end of the string.

  X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
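
A quick demonstration of both flags in action (pattern and text made up): X lets the pattern itself span lines and carry comments, while M makes ^ and $ match at line boundaries in the subject text.

```python
import re

pattern = r"""(?xm)
^\d+    # digits at the start of each line (M: ^ matches after every newline)
[ ]     # a literal space (X ignores unescaped whitespace, so we bracket it)
\w+$    # a word ending the line (M: $ matches before every newline)
"""
matches = re.findall(pattern, "12 apples\n7 pears")
# matches == ["12 apples", "7 pears"]
```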
[+] bo1024|3 years ago|reply
The scary part about good AI is not knowing how good it is, i.e. not being able to trust it because you don't know when it'll be wrong (nor does the AI itself know).
[+] xdfgh1112|3 years ago|reply
The X flag also allows the written regex to span multiple lines, i.e. it counts newlines as whitespace. Interesting confusion there.
[+] simonw|3 years ago|reply
Hah yeah, good catch - I'd missed that incorrect "allows it to span multiple lines" statement.
[+] bribri|3 years ago|reply
I've been doing a lot of experiments with gpt-3 on twitter.

Works with mermaid js

https://twitter.com/Bsunter/status/1535062559069376512

Great for making study guides

https://twitter.com/Bsunter/status/1535730704138444801

I also wrote a plugin to interact with gpt3 in the logseq text editor and some examples here.

https://github.com/briansunter/logseq-plugin-gpt3-openai

I have a few posts on my blog

https://briansunter.com/blog/gtp3-openai-logseq-notetaking/

https://briansunter.com/graph/#/page/how%20to%20use%20gpt3%2...

[+] _bramses|3 years ago|reply
Self-plug: I built Stenography (https://stenography.dev/), which does this for any codebase using smart parsing. The VS Code extension automatically writes code documentation.
[+] goodside|3 years ago|reply
This looks cool, but I’d be more tempted to sign up if I could see examples of the documentation it can produce unassisted from an example codebase. My biggest concern is that the documentation would lack global context in a large file tree, and only be suitable for explaining small, self-contained files. The time investment to see if this is true or not is a bit too high as it is, so anything you can do to showcase the quality up-front might win you more signups.
[+] sureglymop|3 years ago|reply
Man, your cover of Redbone made me want to try it out haha
[+] mjburgess|3 years ago|reply
A NN is just a fancy search engine: the weights are a compression of its training data; its prediction algorithm is decompression plus search. These "large models" are effective because the weights run to hundreds of GBs, compressing TBs of data for searching.

In any case, my question here is: how do you not go mad in this techno-religious environment where people (seemingly desperately and earnestly) equivocate correlations in written text with thinking?

I feel at once overwhelmed, despairing, and a sense of need to point out that it isn't true. It's the most obscene form of wishful thinking and superstition.

HNers seem capable of identifying pseudoscience when it's an article on nutrition (namely, e.g., that correlations in the effects of human behaviour are not explanations of it)... but lose all sense when "AI" becomes the buzzword.

Correlations in *TEXT* are not models of language, intelligence, thinking. This is pseudoscience. And the most extreme and superstitious form: thinking that correlations are meaningful because we find them meaningful.

A person using language uses it in response to an environment, so "Pass me the salt" is said on the occasion that one needs salt. The "language model", `language = person(...)` includes the environment as input (and many other things).

Correlations in pixel patterns of the night sky are not a model of gravity. Correlations in text are not a model of language. Correlations in the detritus of human behaviour are not models of intelligence.

But to my question: how does one cope in an environment of this kind of wishful thinking?

[+] DoingIsLearning|3 years ago|reply
> In any case, my question here is: how do you not go mad in this techno-religious environment where people (seemingly desperately and earnestly) equivocate correlations in written text with thinking?

We cannot unequivocally explain what thinking is (in a human sense), or what makes it vastly different from cephalopod thinking or a ridiculously large NN 'thinking'.

Until Neuroscience reaches a body of knowledge that can explain what it is that makes human thinking somehow special (other than scale), it is equally religious to assume that human consciousness or human creativity is somehow unique (in nature) and irreproducible (with technology).

[+] olalonde|3 years ago|reply
An unimpressive mechanism can produce impressive results. The results listed in the article are simply jaw-dropping to me, even though I understand the relatively simple mechanism that produced them.

How can you not be impressed by something that would have been considered inconceivable just a few years ago?

[+] tigerlily|3 years ago|reply
> HNers seem capable of identifying pseudoscience when it's an article on nutrition (namely, e.g., that correlations in the effects of human behaviour are not explanations of it)...

For me the real question is how well can GPT-3 identify pseudoscience.

[+] anon2020dot00|3 years ago|reply
We know that correlation is not thinking, but it is good enough for it to help or to be used as a starting point which is why people appreciate it as code documentation in this case or code completion in Github co-pilot.
[+] golergka|3 years ago|reply
> NN is just a fancy search engine

What makes you sure that your thinking process is different?

What's the difference between a model and complicated correlations?

[+] muzani|3 years ago|reply
What's the context of this? The article appears to be research on inputs to get GPT-3 to explain code.
[+] johnthescott|3 years ago|reply
ignore them. artificial intelligence is for those who need it.
[+] pornel|3 years ago|reply
This as an assistant/teacher could be incredibly empowering for people learning to program.

I'm sceptical of AI writing programs for people, but I can easily imagine such AI guiding/unblocking non-programmers enough for them to actually program.

[+] f38zf5vdt|3 years ago|reply
> As with everything GPT-3, this answer sounds very convincing! But is this actually true? I would expect the SQL query optimizer to be smart enough to optimize the LEFT JOIN to the point where it would run efficiently. I think GPT-3’s answer to that question is actually very misleading.

I really enjoy how well it gives wrong answers. The internet being polluted with GPT-3-like text is going to send us back to a time when verifying anything online as truthful is difficult, if it hasn't already.
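
One way to check claims like the LEFT JOIN one instead of trusting the model: ask the database for its query plan. A sketch with SQLite's EXPLAIN QUERY PLAN, using a made-up schema loosely resembling the article's:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blog_entry (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE blog_tag (id INTEGER PRIMARY KEY,
                       entry_id INTEGER REFERENCES blog_entry(id),
                       tag TEXT);
""")

# Ask the planner directly how it will execute the LEFT JOIN,
# rather than relying on GPT-3's account of the optimizer.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT blog_entry.id, blog_entry.title
    FROM blog_entry
    LEFT JOIN blog_tag ON blog_tag.entry_id = blog_entry.id
""").fetchall()
```

Each row of `plan` carries a human-readable step description, which settles the question for the engine actually in use.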

[+] mwint|3 years ago|reply
Maybe my next move should be to start a consulting shop: “We Fix GPT-3 Code”.
[+] booleandilemma|3 years ago|reply
Could they do this for legal documents too?

I'd hate to see just the programmers put out of work by GPT-3.

[+] lee101|3 years ago|reply
Hey also try out https://text-generator.io

the same querying works, but it will save you a lot on GPT-3 costs https://text-generator.io/playground?text=SQLite+create+a+pa...

Codex is technically free if you have access, but soon it won't be, so keep text-generator.io in mind there too, as it can generate code from descriptions etc. at a wildly more competitive price.

Had to use a bit of the "CREATE TABLE" syntax, but it did a CREATE TABLE IF NOT EXISTS, which is nice... it missed the content field though.

[+] make3|3 years ago|reply
can't find the model size or a description of the dataset on the text-generator page
[+] bilsbie|3 years ago|reply
I feel like we’re not appreciating how amazing this is.
[+] NO-AI|3 years ago|reply
Many people here on Hacker News are realizing that a significant part of their hard-earned skills are now available to anyone for a few cents of compute time.

Once these systems scale up to attention over long pieces of program text (instead of just a few thousand tokens), most of software development will be near-minimum-wage, barely skilled labor.

You didn't mind when taxi drivers became obsolete. "I'm not a taxi driver, so I don't care!" We'll see how you feel when your economic value approaches zero.

[+] nprateem|3 years ago|reply
It may not be there today, but I think it's obvious that AI is going to basically make all creatives obsolete. At some point I expect it'll be able to:

* Create movies, music and books on-demand

* Design interiors, buildings, electric circuits, web sites, build pipelines, software architectures, etc.

Software devs may be one of the last industries to be obsoleted, but it'll happen. About the only thing left will be the executive function of decision making, but even that's likely to devolve into:

"AI, where's a gap in the market?"

"Jonny, there are gaps in blah, blah, blah. Products that do x, y, z are in demand"

"OK AI, design and print me a prototype"

"Printing in progress..."

[+] Yhippa|3 years ago|reply
I remember people online saying that automation and AI will free up our time to be creators instead of consumers. Looks like that's in jeopardy now. I guess back to consuming content for me.

I feel like this could be one of the biggest technological changes I'll see in my lifetime but there isn't a lot of hype around it like say VR/AR and Web3. If this does end up exponentially improving, life is going to feel like magic.

[+] layer8|3 years ago|reply
Tangential question: Is GPT-3 (and similar tools) being kept up to date with new information? Or would it basically have to be retrained from scratch to properly incorporate new data?
[+] simonw|3 years ago|reply
There's a mechanism you can use to add your own training data on top of GPT-3. This tutorial talks about using that to generate song lyrics (about half way through): https://wingedsheep.com/songwriting-with-gpt-3/

I don't know how often OpenAI build new versions of the core GPT-3 language model.
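
For context, that mechanism took (at the time) an upload of JSONL prompt/completion pairs; the field names below follow the then-current fine-tuning docs (treat them as an assumption against today's API), and the examples are invented:

```python
import json

# Hypothetical training examples for a code-explaining fine-tune.
examples = [
    {"prompt": "Explain this code:\nx = [i * i for i in range(5)]\n\n###\n\n",
     "completion": " Builds the list of squares [0, 1, 4, 9, 16].\n"},
    {"prompt": "Explain this code:\ns = ''.join(reversed('abc'))\n\n###\n\n",
     "completion": " Reverses 'abc' to give 'cba'.\n"},
]

# One JSON object per line, as the upload format expected.
jsonl = "\n".join(json.dumps(ex) for ex in examples)
```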

[+] loufe|3 years ago|reply
This is fantastic. I've been enjoying myself thoroughly with DALL-E 2 recently, and similarly love trying to imagine how the training method / training data must have led to the output. These are truly amazing times we live in, but it will take a lot more work than people 70 years ago thought to get to a real level of polish.
[+] senthilnayagam|3 years ago|reply
Those who don't have GPT-3 access and use GitHub Copilot: if you install the Copilot Labs beta in VS Code, you can get code explanations inside the editor itself. As a bonus, it also offers translation into other programming languages.
[+] SLWW|3 years ago|reply
I asked GPT-3 about several things, starting with creating a simple JS canvas (embedded in its own self-contained HTML doc) with a red bouncing ball. It interpreted that instruction as having the ball go up and down, starting at a diagonal and going from left to right, bouncing off all sides of the canvas.

I then decided to ask it about Creator controversies which it gave surprisingly accurate and knowledgeable answers that pretty much mirror my own thoughts about a complex controversy that spanned over two years.

It's kind of crazy, and a little worrying, just how much the AI could gather from a few nasty Reddit comment threads.

[+] anewpersonality|3 years ago|reply
How the flying f*ck did they collect all the data for GPT-3?
[+] jka|3 years ago|reply
Slightly-curious, slightly-unnerved, slightly-paranoid question here:

Is there any chance that this code snippet was sampled from repositories that contained associated commentary/discussion, guiding GPT-3 to produce similar explanations?

Or is this genuinely an explanation that it can produce without context local to the code in question?

And what number of people are able to determine the answer to my first question?

[+] hansword|3 years ago|reply
>Is there any chance

There is a chance.

To get a better answer, you have to specify what exactly you mean by 'sampled' and by 'without context local to the code in question'.