Shakahs | 5 months ago

Sonnet/Claude Code may technically be "smarter", but Qwen3-Coder on Cerebras is often more productive for me because it's just so incredibly fast. Even if it takes more LLM calls to complete a task, those calls are all happening in a fraction of the time.
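To make the trade-off concrete, here is a rough back-of-the-envelope sketch. The function name and every number in it are hypothetical, chosen only for illustration; they are not measurements of Claude, Qwen3-Coder, or Cerebras. It also ignores network latency and prompt processing time.

```python
def wall_clock_seconds(calls: int, output_tokens_per_call: int,
                       tokens_per_second: float) -> float:
    """Estimate total generation time for a task split across several LLM calls.

    Only output generation is modeled; all inputs are illustrative assumptions.
    """
    return calls * output_tokens_per_call / tokens_per_second


# Hypothetical "smarter but slower" setup: fewer calls, lower throughput.
slow = wall_clock_seconds(calls=10, output_tokens_per_call=500, tokens_per_second=50)

# Hypothetical "faster" setup: three times as many calls, much higher throughput.
fast = wall_clock_seconds(calls=30, output_tokens_per_call=500, tokens_per_second=1000)

print(f"slow: {slow:.0f}s, fast: {fast:.0f}s")
```

Under these made-up numbers, the faster setup finishes sooner even though it needs three times as many calls; the real ratio depends entirely on the actual throughput and the number of retries.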

nerpderp82|5 months ago

We must have very different workflows, I am curious about yours. What tools are you using and how are you guiding Qwen3-Coder? When I am using Claude Code, it often works for 10+ minutes at a time, so I am not aware of inference speed.

solarkraft|5 months ago

You must write very elaborate prompts for 10 minutes to be worth the wait. What permissions are you giving it and how much do you care about the generated code? How much time did you spend on initial setup?

I've found that the best way for me to do LLM-assisted coding right now is in a fairly tight feedback loop. I find myself wanting to refine the code and architectural approaches a fair amount as I see them coming in, and latency matters a lot to me here.

CaptainOfCoit|5 months ago

> When I am using Claude Code, it often works for 10+ minutes at a time, so I am not aware of inference speed.

Indirectly, it sounds like you are aware of the inference speed? Imagine if it took 2 minutes instead of 10; that's what the parent means.

sdesol|5 months ago

> Sonnet/Claude Code may technically be "smarter", but Qwen3-Coder on Cerebras is often more productive for me because it's just so incredibly fast.

Saying "technically" really undersells the difference in intelligence, in my opinion. Claude and Gemini are much, much smarter and I trust them to produce better code, but you honestly can't deny the excellent value that Qwen 3, the inference speed, and $50/month for 25M tokens per day bring to the table.

Since I paid for the Cerebras Pro plan, I've decided to force myself to use it as much as possible for the duration of the month while developing my chat app (https://github.com/gitsense/chat), and here are some of my thoughts so far:

- Qwen 3 Coder is a lot worse at interpreting prompts; Gemini and Claude are much better at reading between the lines. However, since the speed is so good, I often don't care, as I can go back to the message, make some simple clarifications, and try again.

- The max context window size of 128k for Qwen 3 Coder 480B on their platform can be a serious issue if you need a lot of documentation or code in context.

- I've never come close to the 25M tokens per day limit for their Pro plan. The most I use is 5M/day.

- The inference speed + a capable model like Qwen 3 will open up use cases most people might not have thought of before.

I will probably continue to pay for the $50 plan for these use cases:

1. Applying LLM generated patches

Qwen 3 Coder is very capable of applying patches generated by Sonnet and Gemini. It is slower than what https://www.morphllm.com/ provides, but it is definitely fast enough that most people won't care. The cost savings can be quite significant depending on the work.

2. Building context

Since it is so fast, and because the 25M tokens per day limit is so high for me, I find myself loading more files into context and just asking Qwen to identify the files I will need and/or summarize things, so I can feed the result into Sonnet or Gemini. This saves me significant money.

3. AI Assistant

Due to its blazing speed, you can analyze a lot of data quickly for deterministic searches, and because it can review results so fast, you can run multiple search-and-review loops without feeling like you are waiting forever.

Given what I've experienced so far, I don't think Cerebras can be a serious platform for coding if Qwen 3 Coder is the only available model. Having said that, given the inference speed and Qwen being more than capable, I can see Cerebras becoming a massive cost-saving option for many companies and developers, which is where I think they might win a lot of enterprise contracts.
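The "building context" workflow from item 2 could be sketched roughly as below. This is a minimal illustration, not how any particular tool does it: the function names and prompt wording are hypothetical, and the fast model is passed in as a plain callable (`ask_fast_model`) so that no specific provider API is assumed.

```python
from typing import Callable, Dict, List


def build_selection_prompt(files: Dict[str, str], task: str) -> str:
    """Pack every candidate file into one prompt for the fast, cheap model."""
    parts = [
        f"Task: {task}",
        "List the paths of the files needed for this task, one per line.",
        "",
    ]
    for path, text in files.items():
        parts.append(f"--- {path} ---")
        parts.append(text)
    return "\n".join(parts)


def select_files(files: Dict[str, str], task: str,
                 ask_fast_model: Callable[[str], str]) -> List[str]:
    """Ask the fast model which files matter and keep only paths we actually have."""
    reply = ask_fast_model(build_selection_prompt(files, task))
    chosen: List[str] = []
    for line in reply.splitlines():
        # Tolerate bullet markers and stray whitespace in the model's reply.
        path = line.strip().lstrip("-* ").strip()
        if path in files and path not in chosen:
            chosen.append(path)
    return chosen


def build_context(files: Dict[str, str], chosen: List[str]) -> str:
    """Concatenate only the selected files for the slower, more expensive model."""
    return "\n\n".join(f"--- {p} ---\n{files[p]}" for p in chosen)


if __name__ == "__main__":
    repo = {"src/auth.py": "def login(): ...", "README.md": "docs"}
    # Stand-in for a real model call; replace with your own client.
    fake_model = lambda prompt: "- src/auth.py\nmissing.py\n"
    picked = select_files(repo, "fix the login bug", fake_model)
    print(build_context(repo, picked))
```

Filtering the reply against the known paths means a hallucinated file name is dropped instead of being forwarded to the expensive model.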