antirez|9 months ago
Very small: can run on the edge, allowing something like a Raspberry Pi to make basic decisions for your appliance even when disconnected from the internet. Example: given some time series parameters and instructions, decide whether or not to water the plants; vision models that can watch a camera and transcribe what they are seeing in a basic way, ...
Small: runs on an average laptop not optimized for LLM inference, like Gemma 3 4B.
Medium: runs on a very high-spec computer that people can buy for less than $5k: 30B or 70B dense models, or larger MoEs.
Large: Models that big LLM providers sell as "mini", "flash", ...
Extra Large / SOTA: Gemini 2.5 Pro, Claude 4 Opus, OpenAI o3, ...
mnahkies|9 months ago
I'm not sure if you're implying that very small language models would be run in your Raspberry Pi example, but for use cases like the time series one, wouldn't something like an LSTM or TiDE architecture make more sense than a language model?
These are typically small and performant both in compute and accuracy/utility from what I've seen.
I think with all the hype at the moment, AI/ML has sometimes become too synonymous with LLM.
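(To make the LSTM suggestion above concrete: a minimal sketch, assuming PyTorch; the feature set, sizes, and threshold are invented for illustration, not a tested recipe.)

    import torch
    import torch.nn as nn

    class WateringLSTM(nn.Module):
        """Tiny binary classifier: sensor time series -> water / don't water."""
        def __init__(self, n_features=3, hidden=16):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):            # x: (batch, timesteps, n_features)
            _, (h, _) = self.lstm(x)     # h: (1, batch, hidden), last hidden state
            return torch.sigmoid(self.head(h[-1]))  # probability of "water now"

    model = WateringLSTM()
    # e.g. 24 hourly readings of (soil moisture, temperature, humidity)
    readings = torch.randn(1, 24, 3)
    print(model(readings).item())  # > 0.5 -> water

A model like this has on the order of a thousand parameters, which is why it runs comfortably on a Raspberry Pi.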
lloydatkinson|9 months ago
> Example: given some time series parameters and instructions, decide whether or not to water the plants; vision models that can watch a camera and transcribe what they are seeing in a basic way, ...
This is the problem I have with the general discourse of "AI", even on Hacker News of all places. Nothing you listed is an example of a *language model*.
All of those can be implemented as a simple "if", a decision tree, a decision table, or, in the case of cameras and time series prediction, actual ML.
Using an LLM is not just ridiculous here but totally the wrong fit and a waste of resources.
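(To illustrate: the plant-watering example upthread really does fit in a few lines of ordinary code; the thresholds below are made up.)

    # The watering "decision table" from the example, as plain code.
    # Thresholds are invented for illustration.
    def should_water(soil_moisture: float, rain_forecast_mm: float) -> bool:
        if rain_forecast_mm > 5.0:   # rain is coming, let nature do it
            return False
        return soil_moisture < 0.30  # water only when the soil is dry

    print(should_water(soil_moisture=0.25, rain_forecast_mm=0.0))  # True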
nickpsecurity|9 months ago
I'll add one more: an LLM small enough that it can be trained from scratch on one A100 in 24 hours. Is it really small if it takes $10,000 to train? Or should we leave that term for $200 models?
Back to your definitions, there are sub-1B models people are using. I think I saw one in the 400-600M range for audio. Another person posted here a 100M-200M model for extracting data from web pages. We told them to just use a rules-based approach where possible but they believed the SLM worked better.
Then there are projects like BabyLM that can be useful at 10M:
https://babylm.github.io/
I think of “fits on the overpowered M1/2/3/4 64GB MacBook Pro my employer gave me” as the dividing line. We’re getting to within spitting distance of models that can code well at that size.
I want my next laptop to be the 128GB M-series monster. That will run models that aren't quite frontier but are close in performance, and run them fast.
There is a "small language model", and then there is a "small LARGE language model". In late 2018, BERT (110 million params) would've been considered a "large" language model. A "small" LM would be some markov chain or a topic model (e.g. latent dirichlet allocation) - technically they would be considered generative language models since they learn joint distributions of params and data (words), and can then sample from that distribution. But today, we usually map "small" LMs to "small" LLMs, so in that sense a small LLM would be anything from BERT to around 3-4B params.
breckinloggins|9 months ago
Maybe we should appropriate the old DOS/x86 memory model names and give them “class-relative” sizes.
“tiny” can run on a microcontroller, “compact” on a Rpi, “small” on a phone, “medium” on a single GPU machine, “large” on AI class workstation hardware, and “huge” on a data center cluster.
Why wouldn’t there be any? Right now there are large large language models, medium large language models and small large language models. You can say there are also tiny large language models and extra large large language models. Nothing confusing about it.
See also the Little Giant Girl who is part of The Sultan's Elephant and several other Royal de Luxe performances. She's clearly a little girl, but, she's also clearly a giant.
After experimenting with 1B models, I am starting to think that any model with 1B parameters or less will probably lack a lot of the general intelligence that we observe in the frontier models, because it seems physically impossible to encode that much information into so few parameters. I believe that in the range of very small models, the winner will be models that are fine tuned to a small range of tasks or domains, such as a model that can translate between English and any other language, or a legal summarization model, etc.
On this topic, I've been wondering if models are capable of recommending other models for a given machine spec, for example: which model, if any, would be recommended for a laptop with a Ryzen 9 6000S and RTX 3060m (random spec).
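(The core of such a recommendation is back-of-the-envelope arithmetic: a rough sketch that assumes weights dominate memory use and ignores KV cache and activation overhead.)

    # Rough rule of thumb: memory for weights = params * bytes per param.
    # Ignores KV cache / activations, so treat results as lower bounds.
    def weight_gb(params_billions: float, bits_per_param: int) -> float:
        return params_billions * 1e9 * bits_per_param / 8 / 1e9

    for name, b in [("Gemma 3 4B", 4), ("30B dense", 30), ("70B dense", 70)]:
        for bits in (16, 4):
            print(f"{name} @ {bits}-bit: ~{weight_gb(b, bits):.0f} GB")
    # A 6 GB RTX 3060 mobile holds a 4B model at 4-bit (~2 GB of weights),
    # but not a 30B dense model (~15 GB).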
(To share a recent personal experience with Markov models: I recently bootstrapped an HMM with hand-assigned weights. It was around 15x15 class transitions, 225 weights. That's small. Or rather, microscopic. Then I ran it against real data, picked up examples of wrong classifications, and made them auxiliary training data. Of course, it was not a language model; a language model is impossible to fit in such a small space. It was a model of transitions between chapter "types" in novels, where the types are something like "Epilogue", "Prologue", "Chapter 23", "Table of Contents", "Afterword", etc.)
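(A minimal sketch of what such a hand-weighted transition model can look like, shrunk from 15 chapter types to three, with invented weights; a real version would plug in per-heading emission scores from an actual classifier.)

    import numpy as np

    states = ["Prologue", "Chapter", "Epilogue"]
    # Hand-assigned transition weights (rows: from, cols: to), made up here.
    T = np.array([[0.0, 0.9, 0.1],    # Prologue -> mostly Chapter
                  [0.0, 0.6, 0.4],    # Chapter  -> Chapter or Epilogue
                  [0.0, 0.0, 1.0]])   # Epilogue -> stays terminal
    start = np.array([0.7, 0.3, 0.0])

    # Per-chapter scores from some heading classifier (stand-ins).
    emit = np.array([[0.6, 0.3, 0.1],   # heading 1 looks like a Prologue
                     [0.1, 0.8, 0.1],
                     [0.1, 0.3, 0.6]])  # heading 3 looks like an Epilogue

    # Viterbi: most likely state sequence under transitions + emissions.
    def viterbi(emit, T, start):
        v, back = start * emit[0], []
        for e in emit[1:]:
            scores = v[:, None] * T * e[None, :]
            back.append(scores.argmax(axis=0))
            v = scores.max(axis=0)
        path = [int(v.argmax())]
        for b in reversed(back):
            path.append(int(b[path[-1]]))
        return [states[i] for i in reversed(path)]

    print(viterbi(emit, T, start))  # ['Prologue', 'Chapter', 'Epilogue']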
SkiFire13|9 months ago
How is that a "language model"?
GardenLetter27|9 months ago
Maybe the resources needed for fine-tuning would be nice to see, too.
monkeyisland|9 months ago
Could you post a link to this comment or thread? I can't seem to find this model by searching but would love to try it out.
vindex10|9 months ago
https://huggingface.co/docs/transformers.js/en/index
mcswell|9 months ago
Does this mean without a dedicated electric power plant?
I wanted to say "Right, big-sized. Do you want fries with that?", but I couldn't figure out how to work that in, so I won't say it.
gwern|9 months ago
100%. It has enough technical details that maybe a human did something. But who knows.