cantor_S_drug | 3 months ago
In LLMs, we will have bigger weights vs test-time compute tradeoffs. A smaller model can get "there" but it will take longer.
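The tradeoff being claimed can be made concrete with back-of-the-envelope arithmetic. A minimal sketch, using the common ~2·N FLOPs-per-generated-token approximation for a dense transformer with N parameters; all parameter counts and token budgets here are invented for illustration:

```python
# Toy illustration of the weights-vs-test-time-compute tradeoff.
# Uses the rough ~2*N forward-pass FLOPs per generated token for a
# dense transformer with N parameters; the numbers are made up.

def inference_flops(params: int, tokens: int) -> int:
    """Approximate forward-pass FLOPs to generate `tokens` tokens."""
    return 2 * params * tokens

big = inference_flops(params=70_000_000_000, tokens=1_000)    # big model, short answer
small = inference_flops(params=7_000_000_000, tokens=10_000)  # small model, long chain of thought

# The 10x smaller model spends the same total compute by generating 10x
# more reasoning tokens. Whether equal compute buys equal answer quality
# is exactly what the rest of this thread disputes.
print(big == small)  # True: equal compute budgets, not equal capability
```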
refulgentis | 3 months ago
I wish this was true.
It isn't.
"In algorithms, we have space vs time tradeoffs, therefore a small LLM can get there with more time" is the same sort of "not even wrong" reasoning we HNers smile about when we apply SWE thinking to subjects that aren't CS.
What you're suggesting amounts to "monkeys on typewriters will eventually write the complete works of Shakespeare." Neither in practice nor in theory is this a technical claim, something observable, or even something that has been stood up as a one-off misleading demo.
cantor_S_drug | 3 months ago
To answer you directly: a smaller SOTA reasoning model with a table of facts can, given more time, rederive relationships that a bigger model encoded implicitly in its weights.
Aurornis | 3 months ago
Assuming both are SOTA, a smaller model can't produce the same results as a larger model by giving it infinite time. Larger models inherently have more room for training more information into the model.
No amount of test-retry cycle can overcome all of those limits. The smaller models will just go in circles.
Even the larger hosted models get stuck chasing their own tails and going in circles all the time.
yorwba | 3 months ago
And you don't necessarily need to train all information into the model, you can also use tool calls to inject it into the context. A small model that can make lots of tool calls and process the resulting large context could obtain the same answer that a larger model would pull directly out of its weights.
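The pattern described above can be sketched as a tiny agent loop. This is a schematic only: the "model" and the tool backend are stubs, and the knowledge base and function names are hypothetical, not any real API:

```python
# Sketch of a small model compensating for missing weights-knowledge by
# calling a tool and keeping the result in context. The tool here is a
# stand-in for search, a database, or any retrieval backend.

def lookup_tool(query: str) -> str:
    # Hypothetical knowledge base standing in for an external tool.
    kb = {"boiling point of water": "100 C at 1 atm"}
    return kb.get(query, "no result")

def answer_with_tools(question: str, context: list[str]) -> str:
    # A real agent loop would let the model decide when and what to
    # call; here we always do one lookup and append it to the context,
    # so the "answer" is read from context rather than from weights.
    context.append(lookup_tool(question))
    return context[-1]

ctx: list[str] = []
print(answer_with_tools("boiling point of water", ctx))  # 100 C at 1 atm
```

The design point is that the context window, filled by tool calls, plays the role that extra parameters play in a larger model, at the cost of more calls and a longer context to process.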
naasking | 3 months ago
That's speculative at this point. In the context of agents with external memory, this isn't so clear.
HarHarVeryFunny | 3 months ago
There is obviously also some amount (maybe a lot) of core knowledge and capability needed even to be able to ask the right questions and utilize the answers.