An operator at load capacity can either refuse requests or turn the knobs (quantization, thinking time) so that requests process faster. Both make customers unhappy, but only one is obvious.
I've seen garbage tokens during high load: output that seemed to come from a completely different session, mentioned code I've never seen before, or repeated the same lines over and over. I suspect Anthropic has threading bugs or race conditions in its caching/inference code that only show up under very high load.
exitb|1 month ago
codeflo|1 month ago
sh3rl0ck|1 month ago
awestroke|1 month ago
vidarh|1 month ago
seunosewa|1 month ago
chrisjj|1 month ago
Wheaties466|1 month ago
chrisjj|1 month ago