top | item 46953554

kasey_junk | 20 days ago

In testing for my workflows, Copilot significantly underperforms the SOTA agents, even when using the exact same models. It's not particularly close, either.

This has led to 2 classes of devs at my company: a) AI-hesitant devs, for many of whom Copilot is their only interaction with AI, having their worst fears confirmed about how bad AI is; b) AI enthusiasts who are irritated by management that doesn't know the difference pushing back on their asks for access to SOTA agents.

If I were the frontier labs, and weren't billions of dollars beholden to Microsoft, I'd cut Copilot off. It poisons the well for adoption of their other systems. I don't deal with the other Copilots besides the coding agent variants, but I hear similar things about the business application variants.

Microsoft's AI reputation is in the toilet right now, and I'm not sure it's understood within the org how bad it really is.

nfg | 20 days ago

Interesting - for these head-to-head comparisons you're doing with the same model, which harnesses are you comparing? Say, Claude Code / Codex versus Copilot CLI?

> I'm not sure it's understood how bad it really is within the org.

I can’t speak to that, but there’s a lively culture of people using internal tooling who also extensively use 3p products on projects outside work and are in a reasonable position to assess how well GH copilot works.

kasey_junk | 20 days ago

Yeah, I'm only interested in CLI and non-interactive agent usage. I don't compare, say, the VS Code plugins, but I do regularly compare GitHub code reviews.

Those comparisons, for instance, have made us turn _off_ Copilot pull requests entirely. All of the agents have false positives (as do humans), but Copilot was providing negative value in that context.