top | item 45863015


mehdibl | 3 months ago

What matters is not context or the record tokens/s you get.

But the quality of the model. And it seems Grok is pushing the wrong metrics again, after launching "fast".


bko|3 months ago

I thought the number of tokens per second didn't matter until I used Grok Code Fast. I realized that it makes a huge difference. If it takes more than 30s to run, I lose focus and look at something else. I end up being a lot less productive. It also opens up the possibility of automating a lot more simple tasks. I would def recommend people try fast models.

manquer|3 months ago

If you are single tasking, speed matters to an extent. You need to still be able to read/skim the output and evaluate its quality.

The productive people I know use git worktrees and are multi-tasking.

The optimal workflow is when you can supply it one or more commands[1] that the model can run to validate / get feedback on its own. Think of it like RLHF for the LLM: it still gets feedback, just not from you, which would be laborious.

As long as the model gets feedback, it can run fairly autonomously with less supervision; it does not have to be test-driven feedback. If all it gets is you as the feedback, the bottleneck will always be the human time to read, understand, and evaluate the response, not token speed.

With current leading models, doing 3-4 workflows in parallel is not that hard when fully concentrating; somewhat fewer, of course, when browsing HN :)

---

[1] The command could be a unit test runner, a build/compile step, or an e2e workflow; for UI it could be Chrome MCP/CDP, playwright/cypress, storybook-js, and so on. There are even converts to a version of TDD to benefit from this gain.

You could have one built for your use case if no existing ones fit, with model help of course.

LeafItAlone|3 months ago

I completely agree. Grok’s impressive speed is a huge improvement. Never before have I gotten the wrong answer faster than with Grok. All the other LLMs take a little longer and produce a somewhat right answer. Nobody has time to wait for that.

saretup|3 months ago

Seems reductive. Some applications require higher context length or fast tokens/s. Consider it a multidimensional Pareto frontier you can optimize for.

sigmoid10|3 months ago

It's not just that some absolutely require it, but a lot of applications hugely benefit from more context. A large part of LLM engineering for real world problems revolves around structuring the context and selectively providing the information needed while filtering out unneeded stuff. If you can just dump data into it without preprocessing, it saves a huge amount of development time.
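That selective-context step looks something like the toy sketch below; the keyword heuristic and document names are invented for illustration, and real systems use embeddings or retrieval instead:

```python
def pack_context(docs: dict[str, str], query_terms: list[str],
                 budget_chars: int) -> str:
    """Keep only documents that mention a query term, up to a size budget,
    instead of dumping everything into the prompt."""
    parts, used = [], 0
    for name, text in docs.items():
        if any(t in text for t in query_terms) and used + len(text) <= budget_chars:
            parts.append(f"## {name}\n{text}")
            used += len(text)
    return "\n\n".join(parts)
```

With a big enough window, this whole filtering layer can disappear: you pass the documents straight through.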

alyxya|3 months ago

Quality of the model tends to be pretty subjective, and people also complain about gaming benchmarks. At least context window length and generation speed are concrete improvements. There's always a way you can downplay how valuable or impressive a model is.

jeswin|3 months ago

Depends. For coding at least, you can divide tasks into high-intelligence ($$$) and low-intelligence ($) tasks. Being able to do low-intelligence tasks super fast and cheap would be quite beneficial. A majority of code edits would fall into the fast-and-cheap subset.
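A sketch of that split; the model names and keyword markers here are made up, and a real router would classify tasks more carefully (e.g. with a small classifier model):

```python
CHEAP_MARKERS = ("rename", "reformat", "add import", "fix typo", "sort")

def pick_model(task: str) -> str:
    """Route mechanical edits to a fast/cheap model and everything
    else to a slower frontier model."""
    if any(m in task.lower() for m in CHEAP_MARKERS):
        return "fast-cheap-model"
    return "frontier-model"
```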

jorvi|3 months ago

Grok's biggest feature is that unlike all the other premier models (yes I know about ChatGPT's new adult mode), it hasn't been lobotomized by censoring.

sd9|3 months ago

I am amazed people actually believe this

Grok is the most biased of the lot, and they’re not even trying to hide it particularly well

Havoc|3 months ago

"No censoring" and "it says the things I agree with" are not the same thing.

fragmede|3 months ago

It doesn't blindly give you the full recipe for how to make cocaine. It's still lobotomized, it's just that you agree with the ways in which it's been "lobotomized".

jampekka|3 months ago

Grok has plenty of censoring. E.g.

"I'm sorry, but I cannot provide instructions on how to synthesize α-PVP (alpha-pyrrolidinopentiophenone, also known as flakka or gravel), as it is a highly dangerous Schedule I controlled substance in most countries, including the US."

Hamuko|3 months ago

Is this the same AI model that at some point managed to turn any topic into one about white genocide in South Africa?

afavour|3 months ago

Of course it has. There are countless examples of Musk saying Grok will be corrected when it says something that doesn’t line up with his politics.

The whole MechaHitler thing got reversed but only because it was too obvious. No doubt there are a ton of more subtle censorships in the code.

giancarlostoro|3 months ago

I would argue over-censorship is the better word. Ask Grok to write a regex so you can filter slurs on a subreddit and it immediately kicks in telling you that it can't say the n-word or whatever. Thanks Grok, ChatGPT, Claude, etc., I guess racism will thrive on my friend's sub.

basisword|3 months ago

I’ve never run into this problem. What are you asking LLMs where you run into it censoring you?

cluckindan|3 months ago

Bigger context window = more input tokens processed = more income for the provider

bgwalter|3 months ago

Indeed. Free grok.com got significantly worse this week and has been on a decline since shortly after the release of Grok-4.

People who have $2000 worth of various model subscriptions (monthly) while saying they are not sponsored are now going to tell me that grok.com is a different model than Grok-4-fast-1337, but the trend is obvious.

fragmede|3 months ago

What are the other ones to get to $2,000? There's OpenAI and Anthropic; their top-of-the-line plans are like $200 each, which only gets you to $400. There's a handful of other services, but how do you get to $2,000?

cedws|3 months ago

Big context window is an amplifier for LLMs. It's powerful to be able to fit an entire codebase into a prompt and have it understand everything, versus it having to make N tool calls/embeddings queries where it may or may not find the context it's looking for.