top | item 46811855

megabless123 | 1 month ago

noob question: why would increased demand result in decreased intelligence?

exitb|1 month ago

An operator at load capacity can either refuse requests, or move the knobs (quantization, thinking time) so requests process faster. Both of those things make customers unhappy, but only one is obvious.
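exitb's trade-off can be sketched as a toy admission policy. Everything here is hypothetical: the 0.8 threshold, the token budgets, and the names `ServingConfig` and `choose_config` are made up for illustration. They do not describe how Anthropic or any other provider actually routes traffic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServingConfig:
    # Hypothetical knobs; names and values are illustrative, not any provider's real settings.
    weight_precision: str      # e.g. "bf16" (full) or "int8" (quantized)
    max_thinking_tokens: int   # reasoning-token budget per request


FULL = ServingConfig("bf16", 32_000)
SHORTER_THINKING = ServingConfig("bf16", 8_000)
CHEAPEST = ServingConfig("int8", 2_000)


def choose_config(utilization: float, degrade: bool) -> Optional[ServingConfig]:
    """Pick settings for one request given current load, or None to refuse it.

    utilization: fraction of serving capacity in use (1.0 = at capacity).
    degrade: True to trade quality for throughput instead of refusing.
    """
    if utilization < 0.8:
        return FULL  # plenty of headroom: full quality
    if utilization < 1.0:
        # Busy but not saturated: optionally trim the reasoning budget.
        return SHORTER_THINKING if degrade else FULL
    # At or over capacity: refuse visibly, or degrade quietly.
    return CHEAPEST if degrade else None
```

Returning `None` is the obvious failure (the user sees an error), while the degraded configs still produce an answer, just a cheaper one. That asymmetry is why only one of the two options is noticeable.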

codeflo|1 month ago

This is intentional? I think delivering lower quality than what was advertised and benchmarked is borderline fraud, but YMMV.

sh3rl0ck|1 month ago

I'd wager that lower tok/s and lower output quality would be two very different knobs to turn.

awestroke|1 month ago

I've seen some issues with garbage tokens during high load (they seemed to come from a completely different session, mentioned code I've never seen before, and repeated lines over and over). I suspect Anthropic has some threading bugs or race conditions in their caching/inference code that only happen under very high load.

vidarh|1 month ago

It would happen if they quietly decided to serve more aggressively distilled/quantised/smaller models when under load.
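For a sense of why quantisation can cost quality, here is a minimal sketch of generic symmetric per-tensor int8 quantisation in plain Python. It assumes the textbook scheme (w ~= scale * q, with q clamped to [-127, 127]); it is not a claim about what any provider actually serves, and the weights are made-up numbers.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w ~= scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map int8 codes back to approximate float weights."""
    return [scale * x for x in q]


weights = [0.013, -0.502, 0.271, 0.0049]  # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q)        # [3, -127, 69, 1]
print(max_err)  # nonzero rounding error, at most scale / 2
```

Each weight can end up off by as much as half a quantisation step. Across billions of weights those small errors can shift outputs, which is one way a quantised model can behave differently from the one that was benchmarked.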

seunosewa|1 month ago

Or just reducing the reasoning tokens.

chrisjj|1 month ago

They advertise the Opus 4.5 model. Secretly substituting a cheaper one to save costs would be fraud.

Wheaties466|1 month ago

From what I understand, this can come from the batching of requests.
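One commonly cited mechanism behind this: when requests are batched, GPU kernels may split a reduction (a dot product or softmax sum, say) differently depending on batch size. Because floating-point addition is not associative, the same prompt can then produce slightly different numbers depending on what else is in the batch. The toy below exaggerates the magnitudes to make the effect visible. `chunked_sum` is a hypothetical stand-in for such a kernel, not real inference code, and whether this explains what anyone here saw is an assumption.

```python
import math


def naive_sum(values: list[float]) -> float:
    """Plain left-to-right float addition, no compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values: list[float], chunk: int) -> float:
    """Reduce fixed-size chunks first, then add the partial sums left to right.

    Toy stand-in for a reduction whose split depends on batch size.
    """
    partials = [naive_sum(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    return naive_sum(partials)


vals = [1e16, 1.0, -1e16, 1.0]
print(chunked_sum(vals, 1))  # 1.0
print(chunked_sum(vals, 2))  # 0.0
print(math.fsum(vals))       # 2.0 (exactly rounded sum)
```

In real kernels the discrepancies are tiny, but they can be enough to flip which token wins when two candidates are nearly tied. That makes outputs batch-dependent rather than systematically worse.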

chrisjj|1 month ago

So, a known bug?