chromejs10 | 6 months ago

This should have been compared with Opus... I know OP says he didn't because of cost but if you're comparing who is better then you need to compare the best to the best... if Claude Opus 4.1 is significantly better than GPT 5 then that could offset the extra expense. Not saying it will... but forget cost if we want to compare solely the quality

nearbuy|6 months ago

For what it's worth, I've been trying Opus 4.1 in VS Code through GitHub Copilot and it's been really bad. Maybe worse than Sonnet and GPT 4.1. I'm not sure why it was doing so poorly.

In one instance, I asked it to optimize a roughly 80 line C# method that matches some object positions by object ID and delta encodes their positions from the previous frame. It seemed to be confused about how all this should work and output completely wrong code. It has all the context it needs in the file and the method is fairly self-contained. Other models did much better. GPT-5 understood what to do immediately.
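For readers unfamiliar with the task described above, here is a minimal sketch of delta-encoding positions matched by object ID. It is in Python rather than C#, and it is not the commenter's method: the function names, the integer 2D positions, and the choice to encode new objects relative to (0, 0) are all assumptions made for illustration.

```python
from typing import Dict, Tuple

Position = Tuple[int, int]   # hypothetical: integer (x, y) per object
ORIGIN: Position = (0, 0)    # assumption: unseen objects are encoded relative to the origin


def delta_encode(prev: Dict[int, Position], curr: Dict[int, Position]) -> Dict[int, Position]:
    """Match objects by ID and store each position as an offset from the previous frame."""
    deltas = {}
    for obj_id, (x, y) in curr.items():
        px, py = prev.get(obj_id, ORIGIN)
        deltas[obj_id] = (x - px, y - py)
    return deltas


def delta_decode(prev: Dict[int, Position], deltas: Dict[int, Position]) -> Dict[int, Position]:
    """Rebuild the current frame from the previous frame plus the stored offsets."""
    frame = {}
    for obj_id, (dx, dy) in deltas.items():
        px, py = prev.get(obj_id, ORIGIN)
        frame[obj_id] = (px + dx, py + dy)
    return frame


prev_frame = {1: (10, 10), 2: (5, 5)}
curr_frame = {1: (12, 9), 3: (7, 7)}   # object 2 disappeared, object 3 is new
deltas = delta_encode(prev_frame, curr_frame)
```

Objects that vanish between frames (ID 2 here) are simply absent from the output; a real encoder would likely need an explicit removal marker.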

I tried a few other tasks/questions that also had underwhelming results. Now I've switched to using GPT-5.

If you have a quick prompt you'd like me to try, I can share the results.

cpursley|6 months ago

Use Claude Code, the rest aren't worth the bother.

bongodongobob|6 months ago

To me it seems that Opus is really good at writing code if you give it a spec. The other day I had GPT come up with a spec for a DnD text game that uses the GPT API. It one-shotted a 1k line program.

However, if I'm not detailed with it, it does seem to make weird choices that end up being unmaintainable. It's like it has poor creative instincts but is really good at following the directions you give it.

muzani|6 months ago

Opus seems to need more babysitting IME, which is great if you are going to actually pair program. Terrible if you like leaving it to do its own thing or try to do multiple things at once.

intellectronica|6 months ago

Opus costs 10X more. Maybe it's better, but I can't afford to use it, so who cares.

runako|6 months ago

re: the comments that Opus is not cost effective...The whole sales pitch behind these tools, and quite specifically the pitch OpenAI made yesterday, is that they will replace people, specifically programmers. Opus is cheaper than a US-based engineer. It's totally reasonable to use it as the benchmark if it's best.

Also keep in mind that many employees are not paying out of pocket for LLM use at work. A $1,000 monthly bill for LLM usage is high for an individual but not so much for a company that employs engineers.

michaelt|6 months ago

My experience with coding agents is they need a lot of hand-holding.

They're impressive despite that. But if Sonnet is $20/month and I have to intervene every 3 minutes, while Opus is $100/month and I have to intervene every 5 minutes? ¯\_(ツ)_/¯
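A rough sketch of the arithmetic behind that comparison, taking the prices and intervention intervals above as the hypothetical figures they are (the function name is made up for illustration):

```python
def interventions_per_hour(minutes_between: float) -> float:
    """How often you step in per hour, given the gap between interventions."""
    return 60 / minutes_between


sonnet_rate = interventions_per_hour(3)   # intervene every 3 minutes
opus_rate = interventions_per_hour(5)     # intervene every 5 minutes

extra_cost = 100 - 20                      # dollars per month, from the figures above
fewer_interruptions = sonnet_rate - opus_rate

print(f"${extra_cost}/month more buys {fewer_interruptions:.0f} fewer interruptions per hour")
```

Either way you are still interrupted many times an hour, which is the point of the shrug.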

qeternity|6 months ago

> but forget cost if we want to compare solely the quality

I think this is the whole reason not to compare it to Opus...

bgirard|6 months ago

I agree. Opus is cost prohibitive for most longer coding tasks. The increased output doesn't justify the cost.

sergiotapia|6 months ago

You compare what can be used by most engineers. Most engineers are not going to spend that insane price of Opus. It's extremely high compared to all other models, so even if it is slightly better, it's a non-starter for engineering workloads.

andsoitis|6 months ago

> that insane price of Opus

I believe Opus starts at $20 a month, similar to GPT5 if you want more than just cursory usage.

Or am I missing something?

markbao|6 months ago

Most engineers spending their own money maybe, but the cost of Opus is not that much compared to the output when the company is paying for it.

unknown|6 months ago

[deleted]

fouc|6 months ago

gpt-5 isn't supposed to be the best, it's supposed to be cost effective

senko|6 months ago

From OpenAI website:

> Our smartest, fastest, and most useful model yet

I'd say it's definitely supposed to be the best, it just doesn't deliver.