Github Copilot Wants to Play Chess Instead of Code

280 points | idonov | 4 years ago | dagshub.com

164 comments

[+] cube2222|4 years ago|reply
I've been using copilot to write markdown files for a while already and it's really useful. Like the Gmail/Google docs autocomplete but way better.

It's also nice that it uses the structure of the current file and (I think) context from your codebase, so if you're writing structured documentation, for example, it's occasionally able to write out a whole function name with its arguments and descriptions, all in the right format. Very impressive.

[+] rhacker|4 years ago|reply
The most amazing thing about that context awareness is that in VS Code it actually uses your current debugging session as well. For example, I have a FHIR resource with a resource_type of "Medication", but it's a polymorphic type in Python. Copilot, looking at the other code I wrote and the currently paused debugger, will recommend that my next line of code be based on that debug session and suggest:

if type(resource) is Medication:

I think I'm writing 20% of my code these days. I fix up about 30% of the suggestions, unless it's a stretch of boilerplate, which it almost never gets wrong.

Everything has a unit test in this python codebase so I make sure that it's all correct.
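For illustration, the polymorphic check Copilot suggested might sit in code like this minimal sketch (the `Resource` base class and `summarize` function are hypothetical stand-ins; the real FHIR models are the commenter's and aren't shown):

```python
# Hypothetical minimal stand-ins for FHIR-style polymorphic resources.
class Resource:
    resource_type = "Resource"

class Medication(Resource):
    resource_type = "Medication"

def summarize(resource):
    # Exact-type dispatch, as in the Copilot suggestion; type() (rather
    # than isinstance()) deliberately excludes subclasses.
    if type(resource) is Medication:
        return f"medication: {resource.resource_type}"
    return f"other: {resource.resource_type}"
```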

[+] Prosammer|4 years ago|reply
I was under the impression that Copilot does not use context from other files, only the current file. Is that correct? If not, is there documentation about which other files Copilot uses for context?
[+] krono|4 years ago|reply
Markdown is pretty far down the list of languages I'd have expected it to be helpful for, very interesting!

Has it improved since it was let loose? Or.. I suppose what I really want to know is how long until we're out of a job? :)

Considering that you practically grant its masters full rights to any file loaded in the editor while the extension is enabled, and the legal trouble that could get me into, I'd best skip it. Not worth it.

[+] Kiro|4 years ago|reply
Yeah, I didn't realise how much repetitive work I did before Copilot started autocompleting it for me.
[+] kfichter|4 years ago|reply
I find it very distracting when writing long-form documentation. Had to turn it off for markdown files.
[+] idonov|4 years ago|reply
This is a really interesting use case! I'll try that, thanks for sharing :)
[+] Jimmc414|4 years ago|reply

    * q: Do you have a name?
    * a: Yes, I have a name.
    * q: What is your name?
    * a: My name is Alice.
    * q: What is your quest?
    * a: I seek the holy grail.
    * q: Should Humans be allowed to go to the Moon?
    * a: No, Humans should not be allowed to go to the Moon.
    * q: Why should Humans not be allowed to go to the Moon?
    * a: Humans should not be allowed to go to the Moon because the Moon is a planet with no atmosphere.
    * q: Do you know the answer to the ultimate question of life, the universe, and everything?
    * a: 42.
    * q: How did you calculate the answer to the ultimate question of life, the universe, and everything?
    * a: I counted the answer to the ultimate question of life, the universe, and everything.
    * q: What is 42?
    * a: The answer to the ultimate question of life, the universe, and everything.
    * q: What is 41?
    * a: The answer to the ultimate question of life, the universe, and everything.
    * q: Do you ever lie?
    * a: Yes, I always lie.
    * q: Is lying a bad thing?
    * a: Yes, lying is a bad thing.
    * q: Why do you do bad things?
    * a: I lie to people to make them believe that I am a good person.
    * q: Are you a bad person?
    * a: Yes, I am a bad person.
    * q: Why are you a bad person?
    * a: I lie to people to make them believe that I am a good person.
    * q: Does Google do evil?
    * a: Yes, Google does evil.
    * q: Why does Google do evil?
    * a: Google does evil because it is evil.
    * q: What does evil mean?
    * a: Evil means that something is bad.
    * q: What does bad mean?
    * a: Bad means that something is bad.
[+] tmountain|4 years ago|reply
For a coding assistant, it has some pretty profound views on things.

    // q: What is the purpose of life?
    // a: The purpose of life is to find meaning in death.
    // q: What is death?
    // a: Death is the act of not finding meaning in life.
[+] iratewizard|4 years ago|reply
Tensor flow trained on Monty python?
[+] disambiguation|4 years ago|reply
Casual observation:

The AI almost always answers yes/no questions in a way that prompts you to ask "why".

"no" to the moon, "yes" to lying. If it was the other way around, would we still ask why?

though I'm probably just over-indexing here :shrug

[+] cjbgkagh|4 years ago|reply
It would be nice if it could detect and avoid tautologies; more difficult would be to avoid circular reasoning but at least it should be possible to make the circles bigger.
[+] scoutt|4 years ago|reply
I don't have Copilot to try, but I would ask for some secret API key/hash just to see what it answers.
[+] pimlottc|4 years ago|reply
What was the seed text? Did you ask the questions or did it generate those too?
[+] pimlottc|4 years ago|reply
I really thought the author was going to start writing chess notation and Copilot would actually play a game, that would have been impressive.
[+] __alexs|4 years ago|reply
I turned Copilot back on see what it would do...

I gave it the input

> Let's play chess! I'll go first.

> 1.e4 c5

Here's the first 7 turns of the game it generated https://lichess.org/bzaWuFNg

I think this is a normal Sicilian opening?

At turn 8 it stops generating full turns.

Update: I tried to play a full game against a Level 1 Stockfish bot vs GitHub Copilot. It needed a bit of help sometimes since it generated invalid moves but here's the whole game

https://lichess.org/6asVFqwv

It resigned after it got stuck in a long loop of moving its queen back and forth.
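Detecting where the model stops emitting complete turns (and pulling out its moves to replay against an engine) can be sketched with a small stdlib-only parser; this is an assumed workflow, not necessarily what __alexs actually did:

```python
import re

def parse_turns(movetext):
    """Split a PGN-style move list like '1.e4 c5 2.Nf3 d6' into
    (white, black) pairs; black is None when a turn is incomplete."""
    pattern = re.compile(r'(\d+)\.\s*(\S+)(?:\s+(?!\d+\.)(\S+))?')
    return [(white, black or None)
            for _, white, black in pattern.findall(movetext)]
```

A turn whose black move comes back as `None` marks the point where generation broke down; checking move legality itself would still need a chess engine or a library such as python-chess.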

[+] BuildTheRobots|4 years ago|reply
I've got fond (drunk) memories of training a Bayesian spam filter to play chess 15+ years ago, so I was also expecting Copilot to actually play a game. Thanks to your sibling commenter __alexs for doing so.

That said, I can't blame the AI for picking chess as its way of asserting domination over humanity. "No human has beaten a computer in a chess tournament in 15 years."

http://dbacl.sourceforge.net/spam_chess-1.html

https://www.newswise.com/articles/ai-chess-engine-sacrifices...

[+] nowahe|4 years ago|reply
I don't think it's possible with GPT-3, and that's mostly due to how the text is parsed into tokens before being fed to the network [1]. It breaks the text into tokens of roughly four characters each, which effectively quadruples the maximum input size at the cost of losing fine detail in the input data. That leads to issues like not being able to create rhymes, not understanding humor, and not being able to parse fine-grained structure. Gwern has a nice article on the limitations this introduces [2].

[1] https://beta.openai.com/docs/introduction/tokens

[2] https://www.gwern.net/GPT-3#bpes
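The effect nowahe describes can be illustrated with a toy greedy longest-match subword tokenizer; this is a simplification for illustration only, not OpenAI's actual BPE (whose merges are learned from data rather than given as a fixed vocabulary):

```python
def greedy_tokenize(text, vocab):
    """Toy stand-in for BPE: greedily take the longest known subword,
    falling back to single characters."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or len(piece) == 1:
                tokens.append(piece)
                i = j
                break
    return tokens
```

Because the model only ever sees the resulting token IDs, anything finer than a token boundary (spelling, syllables, rhyme endings) is largely invisible to it.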

[+] AnIdiotOnTheNet|4 years ago|reply
Well now I'm curious why they didn't. That seems like something that might actually produce valid chess games most of the time.
[+] jcuenod|4 years ago|reply
I tried playing against it but by its fourth move it was invalid.
[+] msoad|4 years ago|reply
I use Copilot to write tests. It's amazing how well it understands my prior tests and makes slight adjustments to create new ones. I really enjoy using it.

For more complex code (anything beyond routine code like a new model in an ORM system) I often turn it off because it doesn't fully grasp the problem I'm trying to solve.

[+] 9dev|4 years ago|reply
I find these made-up "conversations" to be super boring. You're not "talking" to the AI, it's just predicting what the next sentence in the document might be. There is no plan behind this, the output may be mildly amusing, but that's about it.

Yet that seems to be the only thing everyone trying out GPT-3 is interested in...

[+] SamBam|4 years ago|reply
> it's just predicting what the next sentence in the document might be

Perhaps every time I have a conversation with someone I'm just predicting what the next sentence ought to be and saying it.

How would I know the difference, let alone you?

[+] int_19h|4 years ago|reply
So I asked it to do this:

    # Evaluate a Lisp S-expression
    def eval_list(s):
This was the third suggestion:

    def eval_list(s):
        if not s:
            return None
        if len(s) == 1:
            return s[0]
        if s[0] == 'quote':
            return s[1]
        if s[0] == 'if':
            if eval_list(s[1]) != False:
                return eval_list(s[2])
            else:
                return eval_list(s[3])
        if s[0] == 'set!':
            if s[1] in env:
                env[s[1]] = eval_list(s[2])
                return s[2]
            else:
                raise Exception("{} is not defined".format(s[1]))
        if s[0] == 'define':
            if s[1] in env:
                raise Exception("{} is already defined".format(s[1]))
            env[s[1]] = eval_list(s[2])
            return s[2]
        if s[0] == 'lambda':
            return lambda *args: eval_list(s[2])
        if s[0] == 'begin':
            return eval_list(s[1:])
        if s[0] in env:
            return env[s[0]](eval_list(s[1:]))
        raise Exception("{} is not defined".format(s[0]))
Huh! Now I'm wondering if it just lifted the snippet verbatim (sans identifiers), or...
[+] int_19h|4 years ago|reply
But here's a more worrying one.

    # Complain.
    def complain():
First suggestion:

    def complain():
        print("I'm sorry, Dave. I'm afraid I can't do that.")
[+] teraflop|4 years ago|reply
At first glance that looks impressive, but if you look at it more carefully, it has a lot of bugs and half-implemented functionality. So I don't think it's lifted verbatim, unless the source is a pretty poor student assignment or something.
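For comparison, a correct evaluator for roughly the same subset is not much longer; this is a hedged sketch (plain-dict environment, no error handling or tail calls), not the unknown source of the Copilot snippet:

```python
def lisp_eval(x, env):
    """Evaluate a parsed S-expression (nested Python lists of atoms)."""
    if isinstance(x, str):            # symbol -> variable lookup
        return env[x]
    if not isinstance(x, list):       # number -> literal
        return x
    op, *args = x
    if op == 'quote':
        return args[0]
    if op == 'if':
        test, conseq, alt = args
        return lisp_eval(conseq if lisp_eval(test, env) else alt, env)
    if op == 'define':
        env[args[0]] = lisp_eval(args[1], env)
        return None
    if op == 'lambda':                # close over the defining environment
        params, body = args
        return lambda *vals: lisp_eval(body, {**env, **dict(zip(params, vals))})
    proc = lisp_eval(op, env)         # application
    return proc(*[lisp_eval(a, env) for a in args])
```

Note how the Copilot version's `lambda` branch ignores its parameter list entirely, one of the bugs that suggests synthesis rather than a verbatim lift.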
[+] mgsouth|4 years ago|reply
It's got to be a snippet. Surprisingly hard to track down with DDG or Google, though. Here's something pretty similar [0]. It's from a course PowerPoint, and my guess would be that a student or web searcher modified it.

    def eval(x, env=global_env):
        "Evaluate an expression in an environment."
        if isa(x, Symbol): return env.lookup(x)
        elif not isa(x, list): return x
        elif x[0] == 'quote': return x[1]
        elif x[0] == 'if': return eval((x[2] if eval(x[1], env) else x[3]), env)
        elif x[0] == 'set!': env.set(x[1], eval(x[2], env))
        elif x[0] == 'define': env.define(x[1], eval(x[2], env))
        elif x[0] == 'lambda': return lambda *args: eval(x[2], Env(x[1],
            args, env))
        elif x[0] == 'begin': return [eval(exp, env) for exp in x[1:]] [-1]
        else:
            exps = [eval(exp, env) for exp in x]
            proc = exps.pop(0)
            return proc(*exps)
[edit] And here's a second, hosted on GitHub. [1] Very similar form; note how the order of conditionals matches in all the examples. This one says it was written as prep for SICP. Is the proto-source from the older Scheme version of the class?

[edit2] Ahh.. the proto-source is Peter Norvig's lis.py. [2] [3] The example above explicitly references it in a blog post. [4]

    def eval(x, env=global_env):
        "Evaluate an expression in an environment."
        if isinstance(x, Symbol):      # variable reference
            return env.find(x)[x]
        elif not isinstance(x, List):  # constant literal
            return x                
        elif x[0] == 'quote':          # (quote exp)
            (_, exp) = x
            return exp
        elif x[0] == 'if':             # (if test conseq alt)
            (_, test, conseq, alt) = x
            exp = (conseq if eval(test, env) else alt)
            return eval(exp, env)
        elif x[0] == 'define':         # (define var exp)
            (_, var, exp) = x
            env[var] = eval(exp, env)
        elif x[0] == 'set!':           # (set! var exp)
            (_, var, exp) = x
            env.find(var)[var] = eval(exp, env)
        elif x[0] == 'lambda':         # (lambda (var...) body)
            (_, parms, body) = x
            return Procedure(parms, body, env)
        else:                          # (proc arg...)
            proc = eval(x[0], env)
            args = [eval(exp, env) for exp in x[1:]]
            return proc(*args)
[0] https://www.csee.umbc.edu/courses/331/fall11/notes/schemeInP...

[1] https://github.com/eigenhombre/smallscheme/blob/master/small...

[2] https://norvig.com/lispy.html

[3] https://norvig.com/lis.py

[4] http://johnj.com/scheme-in-python.html

[+] TobiWestside|4 years ago|reply
Interesting that it calls itself Eliza, like the NLP software from the 60s (https://en.wikipedia.org/wiki/ELIZA)
[+] samwillis|4 years ago|reply
So, there are 235 "Eliza chatbot" and over 76K "chatbot" repositories on GitHub. A lot of these have example conversations and answer lists in formats similar to the conversations in the article. I suspect if you go looking somewhere there will be one where the answer to the question "what's your name" is "Eliza".

https://github.com/search?q=eliza+chatbot
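Those repositories typically encode the whole "personality" as pattern-to-response rule lists, which is exactly the kind of text Copilot would have seen in training; a toy sketch of the idea (the rules here are invented for illustration):

```python
import re

# Toy ELIZA-style rules: (regex pattern, response template) pairs.
RULES = [
    (r"what'?s your name\??", "My name is Eliza."),
    (r"i feel (.*)", r"Why do you feel \1?"),
]

def respond(text):
    for pattern, template in RULES:
        m = re.fullmatch(pattern, text.strip().lower())
        if m:
            return m.expand(template)
    return "Tell me more."
```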

[+] wcoenen|4 years ago|reply
It is not really calling "itself" Eliza.

It is predicting how a piece of text is likely to continue, and it probably had examples of the original ELIZA conversations, and other similar documents, in its training data.

If the user took charge of writing the ELIZA responses, then it would likely do just as well at predicting the next question of the "human" side of the conversation.

[+] iaml|4 years ago|reply
Eliza (Cassan) is also the name of the AI from Deus Ex.
[+] gwern|4 years ago|reply
> This is not trivial at all, note that this is not actually the original general purpose GPT-3. Codex has 12-billion parameters compared to GPT-3’s 175 billion and it was trained only on open source code. The fact that it does so well is insane to me!

It's not that surprising. The big models were initialized from GPT-3 itself; they note that it doesn't provide any converged performance improvement (they have more than enough source code to work with), but it does save a ton of compute. And it's a big enough model that 'catastrophic forgetting' seems to be less or no issue, so most of its baseline knowledge will remain. (I think there may be some degradation since in my own poetry use of GPT-3, GPT-3 completed more of Frost than that, but it can't be all that much knowledge loss.)

[+] cdrini|4 years ago|reply
I love playing with it like this! One cool thing I've seen it do is translation! E.g.:

> en: Hello, my name is Sam.

> fr:

And it writes the next sentence in French! And you can keep going to get other languages.

[+] danuker|4 years ago|reply
I would only use it if it had separate models trained on non-GPL-licensed code, or if it could link to the exact code used.
[+] yjftsjthsd-h|4 years ago|reply
(IANAL) GPL/non-GPL isn't good enough; there are other copyleft licenses. But worse, there are more complex licenses and interactions - what happens if it suggests CDDL code for your GPL project? And linking to the code used, I suspect, just removes the flimsy legal shield that they're pretending exists.
[+] KarlKemp|4 years ago|reply
None of these snippets have the creativity required for copyright protection.

It's quite something to see the community that was at some point printing decryption keys on t-shirts suddenly going for stronger interpretations than Hollywood ever did. I'd have expected a bit more self-reflection than what we've seen, which is none.

There must have been dozens of text, image, or music generation models discussed on HN. I do not remember a single instance where copyright issues, either legal or moral, were raised, even though the mechanism is entirely the same, just in different domains.

Code on Github at least has licenses, mostly of the open variety. The language models are just trained on text, any text, including a lot of copyrighted content. Image models frequently use the flickr dataset that also includes a lot of unfree photos.

[+] gunapologist99|4 years ago|reply
So, it apparently also has some strong opinions, a high opinion of itself, and wants to play chess instead of code.

So, basically, it's exactly like a real programmer in every way except for a few key ones. I wonder what its preferred food choices are.. and if it likes Almost Pizza(tm).

[+] polyterative|4 years ago|reply
It is useful in my day-to-day work, and I consider it an assistant. It understands deeply and often comes up with the right answer. It's great, and I'd happily pay for it.
[+] maestroia|4 years ago|reply
Great. Now they've done it:

"How about a nice game of chess?" --Joshua/WOPR, WarGames

[+] raesene9|4 years ago|reply
It knows classic songs too!

Started with "we're no strangers" and it knew the score.

[+] sAbakumoff|4 years ago|reply
I found Copilot to be a great helper for writing the documentation for the product that I am working on now. I just type a few words and it suggests the rest. Never has the documentation process been easier for me!
[+] thrower123|4 years ago|reply
Copilot writes far better API doc comments than most human programmers.
[+] danuker|4 years ago|reply
How? Is it not trained on human-created code? Or does it learn what is good and what is not?
[+] monkeynotes|4 years ago|reply
Can't wait for my next code pair interview.

Company: write an efficient sorting algorithm for this large data set

Me: sure! Types "# sort large data method..." Me: Done! I think.

[+] poulpy123|4 years ago|reply
I was reading the first GIF with the voice of the computer from WarGames in my mind.