alekandreev|11 months ago
(Opinions our own and not of Google DeepMind.)
PS we are hiring: https://boards.greenhouse.io/deepmind/jobs/6590957
heinrichf|11 months ago
- Gemma3 12B: ~100 t/s on prompt eval; 15 t/s on eval
- MistralSmall3 24B: ~500 t/s on prompt eval; 10 t/s on eval
Do you know what difference in architecture could make prompt eval (prefill) so much slower on the 2x smaller Gemma 3 model?
remuskaos|11 months ago
When I set the context size to 2048 (openwebui's default), inference is almost twice as fast as with 4096. I can't set the context size any higher, because my GPU only has 12 GB of RAM and ollama crashes for larger context sizes.
Still, I find that thoroughly odd. With the larger context size (4096), GPU usage is only 50% as seen in nvtop. I have no idea why.
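For anyone reproducing this: the context window can be set per request, which makes it easy to compare speed and VRAM use at 2048 vs 4096. A minimal sketch against the Ollama REST API (model tag, prompt, and values are illustrative, not from this thread):

    import requests

    # Compare generation stats at two context sizes (hypothetical values).
    for num_ctx in (2048, 4096):
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={
                "model": "gemma3:12b",
                "prompt": "Why is the sky blue?",
                "stream": False,
                # num_ctx sets the KV-cache size allocated up front;
                # larger values cost more VRAM.
                "options": {"num_ctx": num_ctx},
            },
            timeout=600,
        )
        stats = resp.json()
        print(num_ctx, stats.get("eval_count"), stats.get("eval_duration"))

One possible explanation (an assumption, not confirmed here): at num_ctx=4096 the weights plus KV cache may no longer fit in 12 GB, so ollama offloads some layers to the CPU, which would account for both the slowdown and the ~50% GPU utilization; `ollama ps` shows the CPU/GPU split.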
magicalhippo|11 months ago
I have some dumb questions though, might as well ask. How do you decide on the model sizes? And how do you train them? Independently or are they related somehow?
alekandreev|11 months ago
The models are trained with distillation from a bigger teacher. We train them independently, but for v3 we have unified the recipes for 4B-27B, to give you more predictability when scaling up and down to different model sizes.
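For readers unfamiliar with the technique: in distillation the student is trained to match the teacher's full next-token distribution rather than only the one-hot target. A minimal sketch of a typical distillation loss (illustrative only, not the Gemma training code):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=1.0):
        # Soften both distributions with a temperature, then minimize
        # KL(teacher || student) over the vocabulary.
        t = temperature
        teacher_probs = F.softmax(teacher_logits / t, dim=-1)
        student_logp = F.log_softmax(student_logits / t, dim=-1)
        # The t*t factor keeps gradient scale consistent across temperatures.
        return F.kl_div(student_logp, teacher_probs,
                        reduction="batchmean") * (t * t)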
miki123211|11 months ago
We will run our internal evals on it for sure, but just wanted to ask whether that's even a use case that the team considered and trained for.
canyon289|11 months ago
We do care about prompted instructions, like following a JSON schema; it is something we eval for and encourage you to try. Here's an example from Gemma 2 to guide folks looking to do what it sounds like you're interested in.
https://www.youtube.com/watch?v=YxhzozLH1Dk
Multilinguality was a big focus in Gemma 3. Give it a try!
And for structured output, Gemma works well with many structured-output libraries, for example the one built into Ollama:
https://github.com/ollama/ollama/blob/main/docs/api.md#struc...
In short you should have all the functionality you need!
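The structured-output route linked above boils down to passing a JSON schema as the request's "format" field, which constrains decoding to match it. A hedged sketch (schema and model tag are illustrative):

    import json
    import requests

    # Hypothetical schema; Ollama constrains the model's output to match it.
    schema = {
        "type": "object",
        "properties": {
            "country": {"type": "string"},
            "capital": {"type": "string"},
        },
        "required": ["country", "capital"],
    }

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "gemma3:4b",
            "messages": [{"role": "user", "content": "Tell me about France."}],
            "format": schema,  # JSON schema, per the Ollama docs above
            "stream": False,
        },
    )
    print(json.loads(resp.json()["message"]["content"]))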
seektable|11 months ago
Ollama error: POST predict: Post "http://127.0.0.1:49675/completion": read tcp 127.0.0.1:49677->127.0.0.1:49675: wsarecv: An existing connection was forcibly closed by the remote host.
Not sure if this is an Ollama or a gemma3:4b problem. At the same time, gemma3:12b works fine for the same API request (100% identical; the only difference is the model id).
mdp2021|11 months ago
Question: your model supports 140 languages. Given that you are focusing on compactness and efficiency, wouldn't there be gains in also developing models for a selected, limited number of languages (e.g. the top four "western" languages by cultural production, which share an alphabet, or a similar set)?
Edit: of course the multilingual capability can be welcome. On the other hand, there are evident cases in which efficiency is paramount. We can wonder about the tradeoff: how much efficiency is sacrificed for this feature?
alekandreev|11 months ago
[1] https://huggingface.co/aiplanet/buddhi-indic
[2] https://ai.google.dev/gemma/gemmaverse/sealion
sidkshatriya|11 months ago
Q. When you are training with a context length of 128k, is the attention in the global layers dense or sparse?
If dense, would the attention memory requirement be O(n^2), where n is 128k, for each global layer?
alekandreev|11 months ago
We wanted the long-context recipe to be friendly for finetuning, and training at 128k is a bit of a pain, so we don't do it. For inference, we see that memory usage at 128k with the 5/1 local/global layer ratio is close to the RAM usage of a fully-global-layer model at 32k.
Individual attention layers are always dense.
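To see why the 5/1 interleaving helps, compare KV-cache sizes. A back-of-the-envelope sketch; the layer counts, head dimensions, and 1024-token sliding window below are assumptions for illustration, not official Gemma 3 figures:

    # Rough KV-cache size: global layers cache the full context, local
    # (sliding-window) layers cache only the window.
    def kv_cache_gib(n_global, n_local, ctx, window=1024,
                     n_kv_heads=8, head_dim=256, bytes_per_value=2):
        per_token = n_kv_heads * head_dim * 2 * bytes_per_value  # K and V
        total = (n_global * ctx + n_local * min(ctx, window)) * per_token
        return total / 1024**3

    # Hypothetical 48-layer model, 5 local : 1 global.
    print(kv_cache_gib(8, 40, ctx=128 * 1024))  # ~8.3 GiB at 128k
    print(kv_cache_gib(48, 0, ctx=32 * 1024))   # ~12.0 GiB, all-global at 32k

With these made-up numbers the interleaved model at 128k lands in the same ballpark as an all-global model at 32k, matching the shape of the claim above.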
werediver|11 months ago
LM Studio doesn't allow that (yet), but maybe the software requires some adjustments to support speculative decoding with Gemma 3.
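For readers unfamiliar with it: speculative decoding has a cheap draft model propose a few tokens which the larger target model then verifies, in a way that preserves the target's output distribution exactly. A toy sketch of the accept/reject rule (stand-in random distributions, not real models; a real implementation verifies all proposals in one target forward pass):

    import numpy as np

    rng = np.random.default_rng(0)
    VOCAB = 16

    def draft_dist(ctx):
        # Stand-in for a small draft model's next-token distribution.
        logits = rng.normal(size=VOCAB)
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def target_dist(ctx):
        # Stand-in for the large target model's next-token distribution.
        logits = rng.normal(size=VOCAB)
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def speculative_step(ctx, k=4):
        # 1) The draft model proposes k tokens cheaply.
        proposals, q = [], []
        for _ in range(k):
            d = draft_dist(ctx + proposals)
            proposals.append(int(rng.choice(VOCAB, p=d)))
            q.append(d)
        # 2) The target verifies: accept each token with prob min(1, p/q);
        #    on the first rejection, resample from max(0, p - q) renormalized.
        accepted = []
        for i, tok in enumerate(proposals):
            p = target_dist(ctx + accepted)
            if rng.random() < min(1.0, p[tok] / q[i][tok]):
                accepted.append(tok)
            else:
                resid = np.maximum(p - q[i], 0.0)
                resid = resid / resid.sum() if resid.sum() > 0 else p
                accepted.append(int(rng.choice(VOCAB, p=resid)))
                break
        return accepted

    print(speculative_step([1, 2, 3]))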