bhuga | 4 years ago
That is to say: the study supplies code prompts, lets Copilot fill in the gaps, and rates the resulting code. Is there a study that gives the same prompts to a selection of programmers to see whether they do better or worse?
I'm curious because in my testing of Copilot, it often writes garbage. But if I'm being honest, often, so do I.
I feel like Twitter is full of cheap shots at Copilot's bad outputs, but many of them don't seem any worse than common human errors. I would really like to see how Copilot stands up to the existing human competition, especially on axes of security, which are a bit more objectively measurable than general "quality".
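To make "objectively measurable" concrete, here's a minimal Python sketch (my own illustrative example, not taken from the study or from actual Copilot output) of the kind of weakness a scanner can flag mechanically: CWE-89, SQL injection via string interpolation. Both functions run; only one survives hostile input:

    # CWE-89 (SQL injection): an insecure pattern vs. the parameterized fix.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

    def find_user_insecure(name):
        # Interpolating attacker-controlled text into the query:
        # name = "' OR '1'='1" makes the WHERE clause always true.
        return conn.execute(
            f"SELECT * FROM users WHERE name = '{name}'"
        ).fetchall()

    def find_user_secure(name):
        # Parameterized query: the driver binds the value safely,
        # so the same payload matches nothing.
        return conn.execute(
            "SELECT * FROM users WHERE name = ?", (name,)
        ).fetchall()

    payload = "' OR '1'='1"
    print(find_user_insecure(payload))  # leaks every row
    print(find_user_secure(payload))    # []

A check like this is pass/fail, which is what makes security errors easier to compare across human and Copilot output than subjective "quality".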
kiwih | 4 years ago
Nonetheless, we think that simply having a quantification of Copilot's outputs is useful, as it can definitely provide an indicator of how risky it might be to hand the tool to an inexperienced developer who might be tempted to accept every suggestion.
laumars | 4 years ago