top | item 44768368

(no title)

rbitar | 7 months ago

This token throughput is incredible and going to set a new bar in the industry. The main issue with the cerebras code plan is that number of requests/minute is throttled, and with agentic coding systems each tool call is treated as new "message" so you can easily hit the api limits (10 messages/minute).

One workaround we're doing now that seems to work is use claude for all tasks but delegate specific tools with cerebras/qwen-3-coder-480b model to generate files or other token heavy tasks to avoid spiking the total number of requests. This has cost and latency consequences (and adds complexity to the code), but until those throttle limits are lifted seems to be a good combo. I also find that claude has better quality with tool selection when the number of tools required is > 15 which our current setup has.

discuss

unknown|7 months ago

[deleted]