Gemini 3 Flash: Frontier intelligence built for speed
1102 points | meetpateltech | 2 months ago | blog.google
Developer Blog: https://blog.google/technology/developers/build-with-gemini-...
Model Card [pdf]: https://deepmind.google/models/model-cards/gemini-3-flash/
Gemini 3 Flash in Search AI mode: https://blog.google/products/search/google-ai-mode-update-ge...
Deepmind Page: https://deepmind.google/models/gemini/flash/
samyok | 2 months ago
I have been playing with it for the past few weeks, and it's genuinely my new favorite. It's so fast, and it has such vast world knowledge, that it's more performant than Claude Opus 4.5 or GPT 5.2 extra high, for a fraction (basically an order of magnitude less!) of the inference time and price.
thecupisblue | 2 months ago
After reading your comment I ran my product benchmark against 2.5 flash, 2.5 pro and 3.0 flash.
The results are better AND the response times have stayed the same. What an insane gain - especially considering the price compared to 2.5 Pro. I'm about to get much better results for a third of the price. Not sure what magic Google did here, but I would love to hear a more technical deep dive comparing what they do differently in the Pro and Flash models to achieve such performance.
Also wondering: how did you get early access? I use the Gemini API quite a lot and have quite a nice internal benchmark suite for it, so I would love to toy with the new models as they come out.
lambda | 2 months ago
I periodically ask them questions about topics that are subtle or tricky, and somewhat niche, that I know a lot about, and find that they frequently provide extremely bad answers. There have been improvements on some topics, but there's one benchmark question that I have that just about every model I've tried has completely gotten wrong.
Tried it on LMArena recently, got a comparison between Gemini 2.5 flash and a codenamed model that people believe was a preview of Gemini 3 flash. Gemini 2.5 flash got it completely wrong. Gemini 3 flash actually gave a reasonable answer; not quite up to the best human description, but it's the first model I've found that actually seems to mostly correctly answer the question.
So, it's just one data point, but at least for my one fairly niche benchmark problem, Gemini 3 Flash has successfully answered a question that none of the others I've tried have (I haven't actually tried Gemini 3 Pro, but I'd compared various Claude and ChatGPT models, and a few different open weights models).
So, I guess I need to put together some more benchmark problems to get a better sample than one, but it's at least now passing an "I can find the answer to this in the top 3 hits of a Google search for a niche topic" test better than any of the other models.
Still a lot of things I'm skeptical about in all the LLM hype, but at least they are making some progress in being able to accurately answer a wider range of questions.
kartayyar | 2 months ago
https://github.com/Roblox/open-game-eval/blob/main/LLM_LEADE...
scrollop | 2 months ago
https://artificialanalysis.ai/evaluations/omniscience
behnamoh | 2 months ago
I think it's bad naming on Google's part. "Flash" implies low quality: fast, but not good enough. I get a less negative feeling from "mini" model names.
jauntywundrkind | 2 months ago
I've been playing around with other models recently (Kimi, GPT Codex, Qwen, others) to try to better appreciate the differences. I knew there was a big price difference, but watching myself feed dollars into the machine rather than nickels has also instilled in me quite the reverse appreciation.
I only assume "if you're not getting charged, you are the product" has to be somewhat in play here. But when working on open source code, I don't mind.
tonyhart7 | 2 months ago
Claude has been a coding model from the start, but GPT is more and more becoming a coding model too.
kqr | 2 months ago
[1]: https://entropicthoughts.com/haiku-4-5-playing-text-adventur...
__jl__ | 2 months ago
They are pushing the prices higher with each release though: API pricing is up to $0.50/M for input and $3.00/M for output.
For comparison:
Gemini 3.0 Flash: $0.50/M for input and $3.00/M for output
Gemini 2.5 Flash: $0.30/M for input and $2.50/M for output
Gemini 2.0 Flash: $0.15/M for input and $0.60/M for output
Gemini 1.5 Flash: $0.075/M for input and $0.30/M for output (after price drop)
Gemini 3.0 Pro: $2.00/M for input and $12/M for output
Gemini 2.5 Pro: $1.25/M for input and $10/M for output
Gemini 1.5 Pro: $1.25/M for input and $5/M for output
I think image input pricing went up even more.
Correction: It is a preview model...
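To make those deltas concrete, here is a minimal sketch (prices per million tokens, as listed above; real bills also depend on context caching, batching, and thinking-token accounting) that computes the cost of one fixed workload under each version:

```python
# Per-1M-token prices in USD, (input, output), as quoted in this comment.
PRICES = {
    "gemini-1.5-flash": (0.075, 0.30),
    "gemini-2.0-flash": (0.15, 0.60),
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-3.0-flash": (0.50, 3.00),
    "gemini-2.5-pro":   (1.25, 10.00),
    "gemini-3.0-pro":   (2.00, 12.00),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload at the listed per-1M-token rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example workload: 10M input tokens + 1M output tokens per month.
for model in sorted(PRICES):
    print(f"{model}: ${cost_usd(model, 10_000_000, 1_000_000):.2f}")
```

At that mix, 3.0 Flash lands at $8.00 versus $22.50 for 2.5 Pro, which lines up with the "better results for about a third of the price" observation upthread.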
RobinL | 2 months ago
Presumably a big motivation for them is to be the first to get something good and cheap enough that they can serve it to every Android device, ahead of whatever the OpenAI/Jony Ive hardware project will be, and way ahead of Apple Intelligence. Speaking for myself, I would pay quite a lot for a truly 'AI-first' phone that actually worked.
mark_l_watson | 2 months ago
I almost switched out of the Apple ecosystem a few months ago, but I have an Apple Studio monitor and using it with non-Apple gear is problematic. Otherwise a Pixel phone and a Linux box with a commodity GPU would do it for me.
fariszr | 2 months ago
Is there an OSS model that's better than 2.0 flash with similar pricing, speed and a 1m context window?
Edit: this is not the typical flash model, it's actually an insane value if the benchmarks match real world usage.
> Gemini 3 Flash achieves a score of 78%, outperforming not only the 2.5 series, but also Gemini 3 Pro. It strikes an ideal balance for agentic coding, production-ready systems and responsive interactive applications.
The replacement for the old Flash models will probably be 3.0 Flash Lite, then.
thecupisblue | 2 months ago
So if 2.5 Pro was good for your use case, you just got a better model for about a third of the price, but it might hurt the wallet a bit more if you currently use 2.5 Flash and want an upgrade - which is fair, tbh.
sosodev | 2 months ago
It's extremely fast on good hardware, quite smart, and can support up to 1M tokens of context with reasonable accuracy.
scrollop | 2 months ago
https://epoch.ai/benchmarks/simplebench
Workaccount2 | 2 months ago
Gemini 3 Pro got 20%, and every other model has gotten 0%. I saw benchmarks showing 3 Flash almost trading blows with 3 Pro, so I decided to try it.
Basically, it is an image showing a dog with 5 legs, an extra one photoshopped onto its torso. Every model counts 4, and Gemini 3 Pro, while also counting 4, said the dog had "large male anatomy". However, it failed a follow-up, saying 4 again.
3 Flash counted 5 legs on the same image, however I had added a distinct "tattoo" to each leg as an assist. These tattoos didn't help 3 Pro or the other models.
So it is the first of all the models I have tested to count 5 legs on the "tattooed legs" image. It still counted only 4 legs on the image without the tattoos. I'll give it 1/2 credit.
simonsarris | 2 months ago
With this release the "good enough" and "cheap enough" intersect so hard that I wonder if this is an existential threat to those other companies.
mmaunder | 2 months ago
Now, imagine for a moment they had also vertically integrated the hardware to do this.
kingstnap | 2 months ago
I'm speculating, but Google might have figured out some training trick to balance information storage against model capacity. That, or this Flash model has a huge number of parameters or something.
simonw | 2 months ago
It's 1/4 the price of Gemini 3 Pro ≤200k and 1/8 the price of Gemini 3 Pro >200k - notable that the new Flash model doesn’t have a price increase after that 200,000 token point.
It’s also twice the price of GPT-5 Mini for input, half the price of Claude 4.5 Haiku.
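A quick sanity check of that 1/4 ratio, using the ≤200k-context prices quoted elsewhere in the thread ($2/$12 per 1M tokens for 3 Pro, $0.50/$3 for 3 Flash; the >200k Pro tier prices aren't quoted here, so the 1/8 figure isn't verified):

```python
# Per-1M-token prices in USD quoted in this thread (<=200k context tier).
pro_in, pro_out = 2.00, 12.00      # Gemini 3 Pro
flash_in, flash_out = 0.50, 3.00   # Gemini 3 Flash

# Flash is 1/4 the price of Pro on both input and output.
print(pro_in / flash_in, pro_out / flash_out)  # 4.0 4.0
```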
caminanteblanco | 2 months ago
I assume that these are just different reasoning levels for Gemini 3, but I can't even find mention of there being 2 versions anywhere, and the API doesn't even mention the Thinking-Pro dichotomy.
SyrupThinker | 2 months ago
Just avoiding/fixing that would probably speed up a good chunk of my own queries.
primaprashant | 2 months ago
For comparison, from 2.5 Pro ($1.25 / $10) to 3 Pro ($2 / $12), there was a 60% increase in input token pricing and a 20% increase in output token pricing.
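Those percentages check out against the listed prices (a quick sketch, using only the numbers quoted in this comment):

```python
def pct_increase(old: float, new: float) -> float:
    """Percent increase going from price `old` to price `new`."""
    return (new - old) / old * 100

# 2.5 Pro -> 3 Pro, USD per 1M tokens.
print(pct_increase(1.25, 2.00))  # input rate:  60.0
print(pct_increase(10.0, 12.0))  # output rate: 20.0
```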