top | item 47050837

dpe82 | 12 days ago

It's wild that Sonnet 4.6 is roughly as capable as Opus 4.5 - at least according to Anthropic's benchmarks. It will be interesting to see if that's the case in real, practical, everyday use. The speed at which this stuff is improving is really remarkable; it feels like the breakneck pace of compute performance improvements of the 1990s.

madihaa|12 days ago

The most exciting part isn't necessarily the ceiling rising, though that's happening, but the floor rising while costs plummet. Getting Opus-level reasoning at Sonnet prices/latency is what actually unlocks agentic workflows. We are effectively getting the same intelligence unit for half the compute every 6-9 months.
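
As a rough illustration of what that halving rate would imply if it held: the 6-9 month figure is the comment's own estimate, not a measured value, and the function name below is made up for this sketch. The same capability would get somewhere between about 2.5x and 4x cheaper in compute per year.

```python
def annual_reduction_factor(halving_months: float) -> float:
    """Factor by which the compute for a fixed capability shrinks per year,
    assuming (hypothetically) that it halves every `halving_months` months."""
    if halving_months <= 0:
        raise ValueError("halving_months must be positive")
    return 2 ** (12 / halving_months)


# The two ends of the range quoted in the comment.
for months in (6, 9):
    factor = annual_reduction_factor(months)
    print(f"halving every {months} months -> ~{factor:.2f}x less compute per year")
```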

scottmf|11 days ago

2024: Intelligence too cheap to meter

2026: Everyone is spending $500/month on LLM subscriptions

mooreds|12 days ago

> We are effectively getting the same intelligence unit for half the compute every 6-9 months.

Something something ... Altman's law? Amodei's law?

Needs a name.

turnsout|12 days ago

This is what excited me about Sonnet 4.6. I've been running Opus 4.6, and switched over to Sonnet 4.6 today to see if I could notice a difference. So far, I can't detect much if any difference, but it doesn't hit my usage quota as hard.

nimonian|12 days ago

Moore's law lives on!

amelius|12 days ago

> The speed at which this stuff is improving is really remarkable; it feels like the breakneck pace of compute performance improvements of the 1990s.

Yeah, but RAM prices are also back to 1990s levels.

mrcwinn|12 days ago

Relief for you is available: https://computeradsfromthepast.substack.com/p/connectix-ram-...

mikkupikku|12 days ago

I knew I'd been keeping all my old RAM sticks for a reason!

dpe82|12 days ago

simonw hasn't shown up yet, so here's my "Generate an SVG of a pelican riding a bicycle"

https://claude.ai/public/artifacts/67c13d9a-3d63-4598-88d0-5...

coffeebeqn|12 days ago

We finally have AI safety solved! Look at that helmet

thinkling|12 days ago

For comparison, I think the current leader in pelican drawing is Gemini 3 Deep Think:

https://bsky.app/profile/simonwillison.net/post/3meolxx5s722...

AstroBen|12 days ago

if they want to prove the model's performance the bike clearly needs aero bars

dyauspitr|12 days ago

Can’t beat Gemini’s, which was basically perfect.

satvikpendem|12 days ago

> Sonnet 4.6 is roughly as capable as Opus 4.5 - at least according to Anthropic's benchmarks

Yeah, it's really not. Sonnet still struggles where Opus, even 4.5, succeeds (and some examples show Opus 4.6 is actually even worse than 4.5, all while being more expensive and taking longer to finish).

justinhj|12 days ago

We see the same with Google's Flash models. It's easier to make a small capable model when you have a large model to start from.

karmasimida|12 days ago

Flash models are nowhere near Pro models in daily use. They hallucinate much more, and it's easy to get into a death spiral of failed tool uses and never come out.

You should always take those claims that smaller models are as capable as larger models with a grain of salt.

simlevesque|12 days ago

The system card even says that Sonnet 4.6 is better than Opus 4.6 in some cases: Office tasks and financial analysis.

iLoveOncall|12 days ago

The fact that users preferred it to Sonnet 4.5 in "only" 70% of cases (according to their blog post) makes me highly doubt that this is representative of real-life usage. Benchmarks are just completely meaningless.

jwolfe|12 days ago

For cases where 4.5 already met the bar, I would expect 50% preference each way. This makes it kind of hard to make any sense of that number, without a bunch more details.
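
To make that concrete, here is a minimal sketch under a simplifying assumption that is not in the blog post: some fraction of prompts are ones where both models were already good enough (so preference is a coin flip), and on the remaining prompts the newer model wins at a fixed rate. The function name and the example fractions are hypothetical.

```python
def implied_win_rate(overall_pref: float, tie_fraction: float) -> float:
    """Win rate on the prompts where the models actually differ, given the
    overall preference rate and the (assumed) fraction of coin-flip prompts.

    Solves: overall_pref = 0.5 * tie_fraction + win_rate * (1 - tie_fraction)
    """
    if not 0 <= tie_fraction < 1:
        raise ValueError("tie_fraction must be in [0, 1)")
    return (overall_pref - 0.5 * tie_fraction) / (1 - tie_fraction)


# Same 70% headline number, different assumed shares of coin-flip prompts.
for tie in (0.0, 0.3, 0.6):
    rate = implied_win_rate(0.70, tie)
    print(f"{tie:.0%} coin-flip prompts -> {rate:.0%} win rate on the rest")
```

Under this toy model, a 70% headline number is compatible with anything from a 70% win rate across the board to a near-100% win rate on the 40% of prompts where the models differ, which is why the number is hard to read without more details.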

ge96|12 days ago

I sent Opus a satellite photo of NYC at night and it described "blue skies and cliffs/shore line"... Mistral did it better. Specific use case, but yeah. OpenAI was just like "you can't submit a photo by URL". Was going to try Gemini, but it kept bringing up Vertex AI. This is with LangChain.

danielbln|11 days ago

I just sent Opus a NYC night satellite view and it described it just as expected. Seems like you have a tooling problem, not a model problem.

estomagordo|12 days ago

Why is it wild that an LLM is as capable as a previously released LLM?

crummy|12 days ago

Opus is supposed to be the expensive-but-quality one, while Sonnet is the cheaper one.

So if you don't want to pay the significant premium for Opus, it seems like you can just wait a few weeks till Sonnet catches up.

tempestn|12 days ago

Because Opus 4.5 was released like a month ago and was state of the art, and now the significantly faster and cheaper version is already comparable.

simianwords|12 days ago

It means the price has dropped by a factor of 3 in a few months.

Retr0id|12 days ago

Because Opus 4.5 inference is/was more expensive.