top | item 23250379

Demo of an OpenAI language model applied to code generation [video]

283 points | cjlovett | 5 years ago | twitter.com

152 comments

[+] neil_s|5 years ago|reply
I had trouble accessing the relevant video snippet even after going through the conference registration, so here's a summary.

You can view the demo at https://twitter.com/i/broadcasts/1OyKAYWPRrWKb starting around 29:00.

It's Sam Altman demoing a massive OpenAI model that was trained on GitHub OSS repos using a Microsoft supercomputer. It's not IntelliCode, but the host says they're working on compressing the models to a size that would be feasible in IntelliCode. The code model uses English-language comments, or simply function signatures, to generate entire functions. Pretty cool.
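To illustrate the workflow described (this is my own toy example, not one from the demo): a developer writes a signature and an English docstring, and the model proposes a body.

```python
# Hypothetical illustration of the demo's workflow, not taken from it.
# A developer writes the signature and docstring...
def apply_discount(price: float, percent: float) -> float:
    """Return the price reduced by the given percentage."""
    # ...and the model proposes a plausible body like this one:
    return price * (1 - percent / 100)
```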

[+] YeGoblynQueenne|5 years ago|reply
So that's basically program synthesis from natural language (ish) specifications (i.e. the comments).

I can see this being a useful tool [1]. However, I don't expect any ability for innovation. At best this is like having an exceptionally smart autocomplete function that can look up code snippets on SO for you (provided those code snippets are no longer than one line).

That's not to say it can't write new code that nobody has written before in quite the same way. But for a tool like this to be useful, it must stick as close as possible to what is expected, or it will slow development down rather than helping it. Which means it can only do what has already been done before.

For instance, don't expect this to come up with a new sorting algorithm out of the blue, or to write good code for a given problem when the majority of code solving that problem on GitHub happens to be pretty bad.

In other words: everyone can relax. This will not take your job. Or mine.

____________

[1] I apologise to the people who know me and who will now be falling off their chairs. OK down there?

[+] gwern|5 years ago|reply
I think you are underselling the potential of a model which deeply understands programming. Imagine combining such a model with something like AutoML-Zero: https://arxiv.org/abs/2003.03384 It may not be 'creative', but used as tab-completion it's not being rewarded, incentivized, or used in any way that would expose its ability to create a new sorting algorithm.
[+] DJHenk|5 years ago|reply
> In other words: everyone can relax. This will not take your job. Or mine

Of course not. This technology converts writing code into bug hunting in pre-written code. Finding bugs in code that you did not write is way harder than writing the code yourself.

So if anything, this makes programming harder, not easier, and we will need more programmers, not less.

[+] westurner|5 years ago|reply
> At best this is like having an exceptionally smart autocomplete function that can look up code snippets on SO for you (provided those code snippets are no longer than one line).

Yeah, all it could do for you is autocomplete around what it thinks the specification might be at that point in time.

> But what if Andy gets another dinosaur, a mean one? -- Toy Story (1995)

[+] joshuak|5 years ago|reply
I agree completely with your expectation of the abilities of such a system.

However, I think very little programming labor goes into the construction of new algorithms or even most business logic; even a casual stroll through GitHub reveals a staggering amount of reimplementation.

I think the promise here is the ability to code in a more conceptual way with less fiddling with the finicky details.

[+] gradys|5 years ago|reply
I'd put it differently. This is going to take your job, just like an assembly programmer from the 70s might consider Python to have basically taken their job. In software, the job is constantly eating itself and transforming.

It's part of the job to continually incorporate new capabilities and lever yourself up.

[+] BaronSamedi|5 years ago|reply
I agree. While this is well done, it seems to be copying human programming techniques rather than allowing the AI to create code that it thinks is optimal. I think there is the potential to evolve efficient and secure code that is free from the constraints we impose on it due to the way our minds work. Such code may not be intelligible to us but could very well be much better than what we could write.
[+] random32840|5 years ago|reply
An AI like this can hold a hell of a lot more information in its head at once than a human. Each decision it makes is based on way more context; it can manipulate the problem using much more information, much faster. The problem is that it can't think in abstractions.

If AI gets to the point where it has a reasonable understanding of the shape of the data & the basic spatial manipulations being applied (not far off IMO), I'd expect it to be waaaaaay better at discovering certain types of new algorithms than humans. It can handle thinking about algorithms that have millions of independently moving parts in a way a human can't.

Humans have the edge deriving algorithms that require a sequence of high-level steps on an abstraction. "Do this, then we get a thing, then we do some stuff to the thing, stretch it, squash it, massage it." AI sucks at that, it doesn't think in the same kind of flexible abstractions.

But imagine building an understanding of how the code will be compiled, and how that will interact with the cache, into the AI. That's very difficult for humans because we can't think about all those mechanics at once; we have to focus on one at a time. An AI that really gets it? I could see it writing a better sorting algorithm for a specific, complex datatype than a human could, or at the very least having the competitive edge because it can do it basically instantly.

[+] izabera|5 years ago|reply
How often does the average programmer come up with a new sorting algorithm?
[+] pharke|5 years ago|reply
Yeah I'm thinking it would be more useful to have a really well indexed library of functions accessible by search.
[+] gameswithgo|5 years ago|reply
AlphaGo and AlphaStar were certainly creative. This project in its current state may not have that capacity, but it also may not be a huge leap to get there.
[+] tanilama|5 years ago|reply
I mean it is cool.

But here's the thing: the natural-language description of a function is not always unambiguous.

When you tell a function to 'compute XYZ', what you actually mean might be 'check whether X.a exists; if so execute branch 1), else branch 2)'.
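A toy illustration of that gap (all names hypothetical): the docstring reads like a single operation, but the intended behavior is really a branch.

```python
from types import SimpleNamespace

def compute_xyz(x):
    """Compute XYZ for x."""  # the natural-language "spec"
    # What the author actually meant:
    if getattr(x, "a", None) is not None:
        return x.a * 2  # branch 1: X.a exists
    return 0            # branch 2: it does not

compute_xyz(SimpleNamespace(a=3))  # takes branch 1
compute_xyz(SimpleNamespace())     # takes branch 2
```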

If the logic gets really complicated, then describing it accurately in human language isn't necessarily faster than writing the code directly. Otherwise we wouldn't need to invent programming languages at all; we could just write compilers that interpret and execute human language.

I am also curious whether the model itself is conditioned on type constraints. It is convenient that they picked Python in this case. If it were Java or another statically typed language, would this system condition its generation not only on the natural-language text but also on the resulting type system? My bet, given my understanding of the language-modeling approach they use, is that they are not doing this, due to the very high complexity and cost of training and domain adaptation.

Overall, this again is an interesting demo. But for code generation from human language to be useful, I think you really need to approach 99% accuracy for it to be remotely practical.

[+] nerdponx|5 years ago|reply
This might be more useful for a task like "read files off a list, and download them in parallel, with no more than 20 concurrent downloads." That particular task might be a one-liner in some programming languages, but there are a lot of programs like that which need significant bookkeeping and/or boilerplate even though their plain-language description of intended behavior is not complicated.
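That first task can indeed be sketched in a few lines of Python with asyncio; here is a minimal version, with the actual HTTP request stubbed out so the sketch stays self-contained (a real implementation would use something like aiohttp inside `fetch`).

```python
import asyncio

async def download_all(urls, limit=20):
    """Download every URL with at most `limit` concurrent downloads."""
    sem = asyncio.Semaphore(limit)  # caps concurrency at `limit`

    async def fetch(url):
        async with sem:
            # Stand-in for a real HTTP request (e.g. via aiohttp).
            await asyncio.sleep(0)
            return (url, "ok")

    # gather preserves input order, so results line up with urls.
    return await asyncio.gather(*(fetch(u) for u in urls))

results = asyncio.run(download_all([f"https://example.com/{i}" for i in range(50)]))
```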

Or implementing a sophisticated protocol that has a formal specification. If you can express the correct behavior in some kind of pithy pseudocode, a tool like this could "compile" that to code in various programming languages. Like a super-powered version of SWIG.

[+] MiroF|5 years ago|reply
I agree that code generation of complex functions is hard.

But I think the example given of unit testing (i.e. a natural-language description of a function's specific behavior -> code) is extremely useful.
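For example (function and description are hypothetical, not from the demo), a description like "split_words returns an empty list for an empty string" maps almost one-to-one onto a test:

```python
def split_words(text):
    return text.split()

# The kind of unit test a model might generate from the English
# description "split_words returns an empty list for an empty string":
def test_split_words_empty_string():
    assert split_words("") == []

test_split_words_empty_string()
```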

[+] IdiocyInAction|5 years ago|reply
How does this do compared to other models? Is this a totally cutting edge result? On the surface, it seems quite impressive, but sans an environment to try it out with, I cannot be entirely sure. Still, this does make me question whether I chose a safe career, haha.

The thing is, I'd really need to see a live demo to see how good this is. Making mistakes is actually kind of a big issue; as most people know, debugging code is harder than writing it. And a lot of the language models which can write impressive-seeming text also generate masses of garbage. There's no way to know whether this was cherrypicked or not.

The mere fact that it can extract meaning from text like this is already really impressive though.

[+] bglazer|5 years ago|reply
I've read a fair number of papers on neural program synthesis lately. To me, these seemed to be obviously cherry picked examples, so you can't really evaluate the whole system based on them.

However, this is fairly impressive for a couple of reasons. First, the system constructs programs from natural-language descriptions, rather than from examples of input-output pairs or a formal specification, which are the most common settings for program synthesis. Second, they're generating full-blown Python, not a smaller, domain-specific language.

Finally, and this is pretty mind-blowing, is the seamless, idiomatic use of loops, branches, and function calls. I haven't seen previous program synthesis tools able to generate such complex code. They're typically limited to simple linear programs with less than about 100 lines. Complex control flow and function calls are still beyond their reach for the most part.

I'm not an active researcher in neural program synthesis, so my statements may not reflect the current state of the art.

I honestly thought that the most promising route forward for program synthesis would be a model that incorporated knowledge of the syntax and semantics of code. Most likely, a model that manipulated, or at least had some view of, the program's AST. This seems to be just throwing a giant Transformer model at github.

Fine tuning a vanilla language model on a giant corpus of code feels like a dead end for the field, long-term. It seems obvious to me that humans are doing something more than just statistical pattern recognition and generation when we write and reason about code.

Then again, it's hard to argue with results. I'm sure lots of pre-neural-network voice recognition researchers were in love with the elegance of their hidden Markov models.

Edit: Also, everyone should go try the FlashFill feature in Microsoft excel. As far as I know, it's the only example of program synthesis shipped in a consumer facing production system, and it works shockingly well.

[+] bo1024|5 years ago|reply
Ha. You hit the nail on the head. There is no rigorous way (to my knowledge) to measure AI-generated anything. So every demo is "ooh look at this" and actual performance is not scientifically evaluated, because we don't know how. This includes images, text, etc.
[+] parksy|5 years ago|reply
I have thought about this before, but I can see that logical errors are introduced which must be manually tested and reviewed anyway. So what if a more reliable approach could be achieved by training these models on test cases alongside passing code?

This way developers just write unit tests or functional tests, and the AI generates code and retrains itself until the code passes for all tests. This could happen silently in the background as the developer defines the tests.

A number of natural language test frameworks exist, Behat for example lets you define tests such as:

Feature: Multiple site support

  Background:
    Given a global administrator named "Greg"
    And a blog named "Greg's anti-tax rants"
    And a customer named "Wilson"
    And a blog named "Expensive Therapy" owned by "Wilson"

  Scenario: Wilson posts to his own blog
    Given I am logged in as Wilson
    When I try to post to "Expensive Therapy"
    Then I should see "Your article was published."

  Scenario: Greg posts to a client's blog
    Given I am logged in as Greg
    When I try to post to "Expensive Therapy"
    Then I should see "Your article was published."
It could still fit the dream of describing to a computer what kind of program you want and having it figure out the plumbing.
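The loop described above could be sketched roughly like this (everything here is hypothetical: `generate_candidate` stands in for a real code model and just returns canned drafts so the sketch is self-contained):

```python
# Hypothetical "generate until the tests pass" loop.
def generate_candidate(spec, attempt):
    # Stand-in for a real code model sampling candidate implementations.
    drafts = [
        "def add(a, b): return a - b",  # buggy first draft
        "def add(a, b): return a + b",  # correct second draft
    ]
    return drafts[min(attempt, len(drafts) - 1)]

def passes(source, tests):
    scope = {}
    exec(source, scope)          # compile the candidate
    try:
        tests(scope["add"])      # run the user's tests against it
        return True
    except AssertionError:
        return False

def synthesize(spec, tests, max_attempts=10):
    for attempt in range(max_attempts):
        source = generate_candidate(spec, attempt)
        if passes(source, tests):
            return source
    return None

def user_tests(add):
    assert add(2, 3) == 5
    assert add(-1, 1) == 0

result = synthesize("add two numbers", user_tests)
```

In practice the expensive part is sampling and retraining the model, not running the tests, but the control flow is this simple.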

Anyway interesting work. Very interesting. I remember a few colleagues laughed at me no more than 5 years ago when I suggested that AI would eventually write code. And here it is, in an early version, flawed surely but only set to improve.

Edit to add: This subject, while insanely interesting to me, is well out of my wheelhouse. I'm guessing there's possibly semantic structure to the above that the type of model used in the demo can't deal with? Like, this one use case has to co-exist in an entire ecosystem of dependencies and related entities... Could the model cope with that, or is it just calculating the likelihood of the next character like other models I've seen, but with insane accuracy when it comes to code?

[+] BaronSamedi|5 years ago|reply
Instead of Test Driven Development, Test Only Development? I like that idea. This reminds me of an article I read a while ago on co-evolutionary training in genetic programming: one algorithm evolving to do something, with another evolving to break it.
[+] sailingparrot|5 years ago|reply
I'm a bit confused: is this built by OpenAI or Microsoft? Microsoft released the paper "IntelliCode Compose: Code Generation Using Transformer" [1] 4 days ago, and there is no attribution to anyone from OpenAI in it.

Are these two entirely separate and yet nearly identical initiatives?

[1]: https://arxiv.org/abs/2005.08025v1

[+] p1esk|5 years ago|reply
> IntelliCode Compose is built around a multi-layer generative pretrained transformer model for code (GPT-C), which is a variant of the GPT-2

GPT-2 was built by OpenAI.

[+] grensley|5 years ago|reply
Wow, this could be a total game changer. You have to be really observant about the bugs, though; I would have totally missed the one with the price discount without executing it.
[+] netsec_burn|5 years ago|reply
With this lowering the barrier to entry for programming even further, I wonder if we'll see more bugs (like the price discount one) as a result?
[+] swalsh|5 years ago|reply
These are just baby steps, but holy shit is that impressive. It kind of feels like working with offshore devs, but it's in real time.
[+] nnq|5 years ago|reply
...that's mildly insulting
[+] 29athrowaway|5 years ago|reply
I've worked with developers from all around the globe. While it's true that some cannot even write fizzbuzz, some others can be extremely brilliant individuals with an excellent work ethic.
[+] yread|5 years ago|reply
Pretty amazing. Starts at around 28:00
[+] gradys|5 years ago|reply
I worked on project very much like this last summer, a transformer language model applied to code completion.

You'd be surprised how easy it is to get a model that performs as well as what you see in the video. And it's even easier now that people have built great libraries for fine-tuning generative language models.

I encourage you to try it yourself! There are many interesting extensions for people to explore:

- Use bi-directional context (vanilla GPT-2 only sees backward context)

- Integrate with semantic analysis tools.

- Experiment with different context representations. You condition the model on an arbitrary sequence of N tokens. It's not necessarily the case that you should spend that whole budget on the N tokens that came immediately before. What about including the imports at the top of the file? What about the docstrings for functions that were just used? What about the filepath of the current file?
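That last idea could be sketched as a simple context-assembly step (hypothetical code, with whitespace tokenization standing in for a real BPE tokenizer): spend part of the token budget on file-level signals such as the filepath, imports, and docstrings, and only the remainder on the tokens immediately before the cursor.

```python
def build_context(filepath, imports, docstrings, preceding_code, budget=128):
    """Assemble a token window for a code LM from mixed context sources."""
    tokens = []
    # File-level signals go first: filepath, imports, relevant docstrings.
    for part in [filepath] + imports + docstrings:
        tokens.extend(part.split())
    # Spend whatever budget remains on the most recent code tokens.
    remaining = max(budget - len(tokens), 0)
    if remaining:
        tokens.extend(preceding_code.split()[-remaining:])
    return tokens[:budget]
```

The interesting research question is how to weight these sources, which the sketch sidesteps entirely.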

Don't look at something like this as though watching your job be automated away. Look at it as a tool that you can master and use to move up the stack.

[+] arielroth|5 years ago|reply
Did you explore all of these things? What were your results?
[+] mring33621|5 years ago|reply
Amazing!

So the developer's role will shift to:

1) writing good enough descriptions of the code to be generated by the AI model

2) fixing any little issues in the generated code

[+] cjlovett|5 years ago|reply
Yeah, I guess developers will have to all become data scientists to help train these A.I's to write better code :-) Perhaps there will be a new business model around selling your higher quality code to help train the A.I to be better and better... so we need to label code "good" and "crap" so the A.I. can avoid learning from crappy code :-)
[+] swiley|5 years ago|reply
It’s not that we don’t have enough code; it’s that the code doesn’t do what we want.

To get this you can just grab any random 18-year-old who knows JS and have them hack something out. No one hires 18-year-old JS hackers though, and there’s a reason for that.

[+] detay|5 years ago|reply
That would be supervised AI training, until 3) programmers become obsolete.
[+] disambiguation|5 years ago|reply
3) writing the in-house or market competitive service alternative ;)
[+] simonhughes22|5 years ago|reply
This is really cool. However, I doubt it can write more than very simple functions. That may be enough to be useful however. It would be nice if they created a demo page where we could try this out. This use case is a little different than the auto-complete one.
[+] jfoster|5 years ago|reply
I wonder if this could be trained on just bug fix commits from GitHub in order to produce a model that could suggest bug fixes for an existing code base.
[+] symplee|5 years ago|reply
Can this freaky A.I. also generate the corresponding unit tests?

Or, for TDD, generate the unit tests first based on the function name and description. Then, if the dev updates any of those tests, or adds more tests, use that information in auto generating the appropriate code.

[+] simonhughes22|5 years ago|reply
Towards the end of that section, he mentions they have also used it to generate unit tests. I doubt it's doing full TDD, but it seems they are part of the way there.
[+] cjlovett|5 years ago|reply
Great question, I actually think writing test code is harder than writing the product code.
[+] pseudosudoer|5 years ago|reply
At the end of the chat with OpenAI they mention that their model can be used for generating unit tests as well.
[+] Jach|5 years ago|reply
I don't see it replacing (or even much augmenting) professional programming any time soon... My predicted use case for this is mostly with non-programmers. They'll be instructed to write in English what they want to be done, and behind the scenes this will attempt to generate code, execute it, and give the results. A fun demo would be writing "Download the recipe on this webpage (paste link) and order the ingredients from Safeway". If it could generate its own billing and shipping storage to remember indefinitely after getting it from the user, then generate the relevant web scraping / web driving or API code for various websites, that'd be pretty sweet.
[+] cjlovett|5 years ago|reply
Hey, now we have a reason to write proper unambiguous code comments :-)
[+] rpiguy|5 years ago|reply
Donald Knuth would be proud! (it appears proper commenting is very important to the AI's ability to generate code)
[+] chrisco255|5 years ago|reply
Is this a demo of their AI 'autocomplete' tech that they've built into Visual Studio and VS Code?