top | item 45838560


cantor_S_drug | 3 months ago

In CS algorithms, we have space vs time tradeoffs.

In LLMs, we will have bigger weights vs test-time compute tradeoffs. A smaller model can get "there" but it will take longer.
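The classic algorithms version of this tradeoff is memoization: spend memory on a lookup table to save recomputation time. A minimal sketch (the Fibonacci example and cache are my own illustration, not anything from LLM practice):

```python
from functools import lru_cache

def fib_slow(n):
    # No extra space: exponential time, recomputes the same subproblems.
    return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)

@lru_cache(maxsize=None)
def fib_fast(n):
    # O(n) extra space for the cache buys O(n) time.
    return n if n < 2 else fib_fast(n - 1) + fib_fast(n - 2)

assert fib_slow(20) == fib_fast(20) == 6765
```

The open question in the thread is whether "more test-time compute" for a small model trades off against "more weights" the same clean way memory trades off against time here.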

discuss


refulgentis | 3 months ago

I have spent the last 2.5 years living like a monk to maintain an app across all paid LLM providers and llama.cpp.

I wish this was true.

It isn't.

"In algorithms, we have space vs time tradeoffs, therefore a small LLM can get there with more time" is the same sort of "not even wrong" we all smile about us HNers doing when we try applying SWE-thought to subjects that aren't CS.

What you're suggesting amounts to "monkeys on typewriters will write the entire works of Shakespeare eventually" - neither in practice nor in theory is this a technical claim, something observable, or even something stood up once as a one-off misleading demo.

cantor_S_drug | 3 months ago

If "not even wrong" is more wrong than wrong, then is 'not even right" more right than right.

To answer you directly: a smaller SOTA reasoning model with a table of facts can, given more time, rederive relationships that a bigger model has encoded implicitly in its weights.

Aurornis | 3 months ago

> In LLMs, we will have bigger weights vs test-time compute tradeoffs. A smaller model can get "there" but it will take longer.

Assuming both are SOTA, a smaller model can't produce the same results as a larger model by giving it infinite time. Larger models inherently have more room for training more information into the model.

No amount of test-retry cycle can overcome all of those limits. The smaller models will just go in circles.

I regularly get even the larger hosted models stuck chasing their own tails and going in circles.

yorwba | 3 months ago

It's true that to train more information into the model you need more trainable parameters, but when people ask for small models, they usually mean models that run at acceptable speeds on their hardware. Techniques like mixture-of-experts allow increasing the number of trainable parameters without requiring more FLOPs, so they're large in one sense but small in another.
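The parameters-vs-FLOPs distinction can be made concrete with a toy mixture-of-experts layer. A numpy sketch (sizes, routing, and the top-k softmax are illustrative choices, not any particular model's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 64, 8, 2  # illustrative sizes

# Total trainable parameters scale with the number of experts...
experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts)) / np.sqrt(d)

def moe_forward(x):
    # ...but per-token FLOPs only scale with the top_k experts actually used.
    scores = x @ router
    top = np.argsort(scores)[-top_k:]               # pick the top_k experts
    w = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over chosen experts
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

total_params = n_experts * d * d   # 32768: "large" in the storage sense
active_params = top_k * d * d      # 8192: "small" in the compute sense
```

So the model is large by parameter count (room to store more information) while staying small by per-token compute - which is usually what people actually want from "a small model."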

And you don't necessarily need to train all information into the model, you can also use tool calls to inject it into the context. A small model that can make lots of tool calls and process the resulting large context could obtain the same answer that a larger model would pull directly out of its weights.
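The tool-call path can be sketched as a retrieve-into-context loop. Everything here is a hypothetical stand-in (the `search` function and toy fact table are my own illustration, not a real agent framework):

```python
# Instead of recalling a fact from its weights, the small model issues a
# lookup and reads the result back out of its context.
FACTS = {"capital of France": "Paris", "boiling point of water": "100 C"}

def search(query):
    # Hypothetical tool: a real agent would hit a search API or database.
    return FACTS.get(query, "no result")

def answer_with_tools(question, max_calls=3):
    context = []
    for _ in range(max_calls):
        result = search(question)   # model decides to call the tool
        context.append(result)      # tool output is injected into the context
        if result != "no result":
            return result           # answer comes from context, not weights
    return "unknown"

assert answer_with_tools("capital of France") == "Paris"
```

The cost shows up elsewhere: the small model now needs the agentic skill to ask the right queries and the context length to hold the results, which is the point raised further down the thread.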

naasking | 3 months ago

> No amount of test-retry cycle can overcome all of those limits. The smaller models will just go in circles.

That's speculative at this point. In the context of agents with external memory, this isn't so clear.

woctordho | 3 months ago

Almost all training data is on the internet. As long as the small model has enough agentic browsing ability, given enough time it will retrieve the data from the internet.

lossolo | 3 months ago

This doesn't work like that. An analogy would be giving a 5 year old a task that requires the understanding of the world of an 18 year old. It doesn't matter whether you give that child 5 minutes or 10 hours, they won't be capable of solving it.

HarHarVeryFunny | 3 months ago

I think the question of what can be achieved with a small model comes down to what needs knowledge vs what needs experience. A small model can use tools like RAG if it is just missing knowledge, but it seems hard to avoid training/parameters where experience is needed - knowing how to perceive then act.

There is obviously also some amount (maybe a lot) of core knowledge and capability needed even to be able to ask the right questions and utilize the answers.

nkmnz | 3 months ago

What if you give them 13 years?

homarp | 3 months ago

but in 13 years, will they be capable?

andai | 3 months ago

Actually it depends on the task. For many tasks, a smaller model can handle it, and it gets there faster!