nguyentran03 | 14 days ago
1,000 tok/s sounds impressive but Cerebras has already done 3,000 tok/s on smaller models. so either Codex-Spark is significantly larger/heavier than gpt-oss-120B, or there's overhead from whatever coding-specific architecture they're using. the article doesn't say which.
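rough back-of-envelope on that (the ~5.1B active-param figure for gpt-oss-120B and pure bandwidth scaling are my assumptions, not from the article):

    # sketch: if decode is memory-bandwidth bound, tok/s on the same
    # hardware scales roughly with 1/active_params per token.
    # assumption: Cerebras's 3,000 tok/s was on a gpt-oss-120B-class
    # model, which activates ~5.1B params per token (MoE).
    cerebras_tps = 3000
    active_params_b = 5.1
    spark_tps = 1000
    # implied active-param budget if Codex-Spark ran on similar hardware
    implied_b = active_params_b * cerebras_tps / spark_tps
    print(f"~{implied_b:.1f}B active params")  # ~15.3B

so at 1,000 tok/s you'd expect something like 3x the active compute per token, if it's the model and not the serving stack. could just as easily be overhead though.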
the part I wish they'd covered: does speed actually help code quality, or just help you generate wrong code faster? with coding agents the bottleneck isn't usually token generation, it's the model getting stuck in loops or making bad architectural decisions. faster inference just means you hit those walls sooner.
irishcoffee | 14 days ago
A different way to read this might be: "Nvidia isn't going to agree to that deal, so we now need to save face by dumping them first."
I imagine sama doesn't like rejection.