I thought the number of tokens per second didn't matter until I used Grok Code Fast, and I realized it makes a huge difference. If a task takes more than 30s to run, I lose focus and look at something else, and I end up being a lot less productive. Speed also opens up the possibility of automating a lot more simple tasks. I'd definitely recommend people try fast models.
If you are single tasking, speed matters to an extent. You need to still be able to read/skim the output and evaluate its quality.
The productive people I know use git worktrees and are multi-tasking.
The optimal workflow is when you can supply one or more commands[1] that the model can run to validate its work and get feedback on its own. Think of it as RLHF for the LLM: it still gets feedback, just not from you, which would be laborious.
As long as the model gets feedback it can run fairly autonomously with less supervision, and it does not have to be test-driven feedback. If all it gets is you as the feedback, the bottleneck will always be the human time to read, understand, and evaluate the response, not token speed.
With current leading models, running 3-4 workflows in parallel is not that hard when fully concentrating; of course it's somewhat fewer while browsing HN :)
---
[1] The command could be a unit test runner, a build/compile step, or an e2e workflow; for UI that could be Chrome MCP/CDP, Playwright/Cypress, Storybook, and so on. There are even converts to versions of TDD precisely to benefit from this gain.
You could have one built for your use case if no existing ones fit, with model help of course.
I completely agree. Grok’s impressive speed is a huge improvement. Never before have I gotten the wrong answer faster than with Grok. All the other LLMs take a little longer and produce a somewhat right answer. Nobody has time to wait for that.
Seems reductive. Some applications require higher context length or fast tokens/s. Consider it a multidimensional Pareto frontier you can optimize for.
It's not just that some absolutely require it, but a lot of applications hugely benefit from more context. A large part of LLM engineering for real world problems revolves around structuring the context and selectively providing the information needed while filtering out unneeded stuff. If you can just dump data into it without preprocessing, it saves a huge amount of development time.
Quality of the model tends to be pretty subjective, and people also complain about gaming benchmarks. At least context window length and generation speed are concrete improvements. There's always a way you can downplay how valuable or impressive a model is.
Depends. For coding at least, you can divide tasks into high-intelligence ($$$) and low-intelligence ($) tasks. Being able to do low-intelligence tasks super fast and cheap would be quite beneficial. A majority of code edits would fall into the fast-and-cheap subset.
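One way to picture that split is a trivial cost-based router that sends mechanical edits to the fast/cheap model and anything needing design judgment to the expensive one. The model names and task labels here are made-up placeholders, not real API identifiers:

```python
# Hypothetical routing by task difficulty; labels and model names are
# illustrative placeholders only.
CHEAP_TASKS = {"rename", "format", "add_import", "fix_typo", "boilerplate"}

def pick_model(task_kind: str) -> str:
    """Route low-intelligence edits to a fast, cheap model and
    everything else to a slower, high-intelligence one."""
    if task_kind in CHEAP_TASKS:
        return "fast-cheap-model"   # high tokens/s, low $ per edit
    return "slow-smart-model"       # design work, tricky bugs
```

Since most code edits land in the cheap bucket, the average cost and latency per edit drops sharply even though the expensive model is still available.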
Grok's biggest feature is that unlike all the other premier models (yes, I know about ChatGPT's new adult mode), it hasn't been lobotomized by censorship.
It doesn't blindly give you the full recipe for how to make cocaine. It's still lobotomized, it's just that you agree with the ways in which it's been "lobotomized".
"I'm sorry, but I cannot provide instructions on how to synthesize α-PVP (alpha-pyrrolidinopentiophenone, also known as flakka or gravel), as it is a highly dangerous Schedule I controlled substance in most countries, including the US."
I would argue over-censorship is the better word. Ask Grok to write a regex so you can filter slurs on a subreddit and it immediately kicks in, telling you it can't say the n-word or whatever. Thanks, Grok/ChatGPT/Claude, I guess racism will thrive on my friend's sub.
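For what it's worth, the regex in question is simple to build locally without any model involvement: escape each word and join with alternation inside word boundaries. The words below are harmless placeholders standing in for the actual blocklist:

```python
import re

def build_filter(blocklist):
    """Compile a case-insensitive regex matching any blocked word
    as a whole word. Placeholder words stand in for real slurs."""
    escaped = (re.escape(w) for w in blocklist)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)

pattern = build_filter(["badword1", "badword2"])
```

`re.escape` keeps any punctuation in the list from being treated as regex syntax, and the `\b` anchors avoid flagging substrings inside innocent words.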
Indeed. Free grok.com got significantly worse this week and has been on a decline since shortly after the release of Grok-4.
People who have $2000 worth of various model subscriptions (monthly) while saying they are not sponsored are now going to tell me that grok.com is a different model than Grok-4-fast-1337, but the trend is obvious.
What are the other ones that get you to $2,000? There's OpenAI and Anthropic; their top-of-the-line plans are about $200 each, which only gets you to $400. There's a handful of other services, but how do you get to $2,000?
Big context window is an amplifier for LLMs. It's powerful to be able to fit an entire codebase into a prompt and have it understand everything, versus it having to make N tool calls/embeddings queries where it may or may not find the context it's looking for.
Grok is the most biased of the lot, and they’re not even trying to hide it particularly well
The whole MechaHitler thing got reversed but only because it was too obvious. No doubt there are a ton of more subtle censorships in the code.