top | item 46301851

Gemini 3 Flash: Frontier intelligence built for speed

1102 points | meetpateltech | 2 months ago | blog.google

Docs: https://ai.google.dev/gemini-api/docs/gemini-3

Developer Blog: https://blog.google/technology/developers/build-with-gemini-...

Model Card [pdf]: https://deepmind.google/models/model-cards/gemini-3-flash/

Gemini 3 Flash in Search AI mode: https://blog.google/products/search/google-ai-mode-update-ge...

Deepmind Page: https://deepmind.google/models/gemini/flash/

580 comments

[+] samyok|2 months ago|reply
Don’t let the “flash” name fool you, this is an amazing model.

I have been playing with it for the past few weeks, and it's genuinely my new favorite. It's so fast and has such vast world knowledge that it's more performant than Claude Opus 4.5 or GPT 5.2 extra high, for a fraction (basically an order of magnitude less!!) of the inference time and price.

[+] thecupisblue|2 months ago|reply
Oh wow - I recently tried 3 Pro preview and it was too slow for me.

After reading your comment I ran my product benchmark against 2.5 flash, 2.5 pro and 3.0 flash.

The results are better AND the response times have stayed the same. What an insane gain, especially considering the price compared to 2.5 Pro. I'm about to get much better results for 1/3rd of the price. Not sure what magic Google did here, but I would love to see a more technical deep dive comparing what they do differently in the Pro and Flash models to achieve such performance.

Also wondering: how did you get early access? I'm using the Gemini API quite a lot and have quite a nice internal benchmark suite for it, so I would love to toy with the new ones as they come out.

[+] lambda|2 months ago|reply
I'm a significant genAI skeptic.

I periodically ask them questions about topics that are subtle or tricky, and somewhat niche, that I know a lot about, and find that they frequently provide extremely bad answers. There have been improvements on some topics, but there's one benchmark question that I have that just about every model I've tried has completely gotten wrong.

Tried it on LMArena recently, got a comparison between Gemini 2.5 flash and a codenamed model that people believe was a preview of Gemini 3 flash. Gemini 2.5 flash got it completely wrong. Gemini 3 flash actually gave a reasonable answer; not quite up to the best human description, but it's the first model I've found that actually seems to mostly correctly answer the question.

So, it's just one data point, but at least for my one fairly niche benchmark problem, Gemini 3 Flash has successfully answered a question that none of the others I've tried have (I haven't actually tried Gemini 3 Pro, but I'd compared various Claude and ChatGPT models, and a few different open weights models).

So, I guess I need to put together some more benchmark problems to get a better sample than one, but it's at least now passing an "I can find the answer to this in the top 3 hits of a Google search for a niche topic" test better than any of the other models.

Still a lot of things I'm skeptical about in all the LLM hype, but at least they are making some progress in being able to accurately answer a wider range of questions.

[+] mips_avatar|2 months ago|reply
OpenAI made a huge mistake neglecting fast inference models. Their strategy was GPT 5 for everything, which hasn't worked out at all. I'm really not sure what model OpenAI wants me to use for my applications that require lower latency. If I follow the advice in their API docs about which models to use for faster responses, I get told to either use GPT 5 with low thinking, replace GPT 5 with GPT 4.1, or switch to the mini model. So as a developer I'm doing evals on all three of these combinations. I'm running my evals on Gemini 3 Flash right now, and it's outperforming GPT 5 thinking without thinking. OpenAI should stop trying to come up with ads and make models that are useful.
[+] scrollop|2 months ago|reply
Alright, so we have more benchmarks, including hallucinations. Flash doesn't do well on that front, though generally it beats Gemini 3 Pro, GPT 5.1 thinking, and GPT 5.2 thinking xhigh (but then Sonnet, Grok, Opus, Gemini, and 5.1 all beat 5.2 xhigh) in everything. Crazy.

https://artificialanalysis.ai/evaluations/omniscience

[+] giancarlostoro|2 months ago|reply
I wonder at what point everyone who over-invested in OpenAI will regret their decision (except maybe Nvidia?). Maybe Microsoft doesn't need to care; they get to sell their models via Azure.
[+] mmaunder|2 months ago|reply
Thanks, having it walk a hardcore SDR signal chain right now --- oh damn, it just finished. The blog post makes it clear this isn't just some 'lite' model: you get low latency and cognitive performance. Really appreciate you amplifying that.
[+] yunohn|2 months ago|reply
I love how every single LLM model release is accompanied by pre-release insiders proclaiming how it’s the best model yet…
[+] behnamoh|2 months ago|reply
> Don’t let the “flash” name fool you

I think it's bad naming on Google's part. "Flash" implies low quality: fast but not good enough. I get a less negative feeling from "mini" models.

[+] jauntywundrkind|2 months ago|reply
Just to point this out: many of these frontier models cost not far off two orders of magnitude more than what DeepSeek charges. It doesn't compare the same, no, but with coaxing I find DeepSeek to be a pretty capable, competent coding model, able to answer a lot of general queries pretty satisfactorily (but if it's a short session, why economize?). It's $0.28/M in, $0.42/M out. Opus 4.5 is $5/$25 (17x/60x).

I've been playing around with other models recently (Kimi, GPT Codex, Qwen, others) to try to better appreciate the difference. I knew there was a big price difference, but watching myself feed dollars into the machine rather than nickels has also instilled in me quite the reverse appreciation.

I only assume "if you're not getting charged, you are the product" has to be somewhat in play here. But when working on open source code, I don't mind.
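
The 17x/60x figures in the comment above are easy to verify; a quick sketch using the per-million-token prices as quoted (all numbers come from the comment, not from any official price sheet):

```python
# Per-million-token prices quoted in the comment.
deepseek_in, deepseek_out = 0.28, 0.42   # DeepSeek
opus_in, opus_out = 5.00, 25.00          # Claude Opus 4.5

# Ratio of Opus prices to DeepSeek prices.
print(f"input: {opus_in / deepseek_in:.1f}x")     # ~17.9x
print(f"output: {opus_out / deepseek_out:.1f}x")  # ~59.5x
```

So "17x/60x" is the input ratio rounded down and the output ratio rounded up.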

[+] esafak|2 months ago|reply
What are you using it for and what were you using before?
[+] tonyhart7|2 months ago|reply
I think Google is the only one that still produces general-knowledge LLMs right now.

Claude has been a coding model from the start, but GPT is more and more becoming a coding model too.

[+] epolanski|2 months ago|reply
Gemini 2.0 Flash was already good for some of my tasks a long time ago.
[+] freedomben|2 months ago|reply
Cool! I've been using 2.5 Flash and it is pretty bad: 1 out of 5 answers it gives will be a lie. Hopefully 3 is better.
[+] unsupp0rted|2 months ago|reply
How good is it for coding, relative to recent frontier models like GPT 5.x, Sonnet 4.x, etc?
[+] pplonski86|2 months ago|reply
Lately I've been trying to ask LLMs to generate SVG pictures. Do you have the famous pelican on a bike created by the Flash model?
[+] __jl__|2 months ago|reply
This is awesome. No preview release either, which is great for production.

They are pushing the prices higher with each release though: API pricing is up to $0.50/M for input and $3/M for output.

For comparison:

Gemini 3.0 Flash: $0.50/M for input and $3.00/M for output

Gemini 2.5 Flash: $0.30/M for input and $2.50/M for output

Gemini 2.0 Flash: $0.15/M for input and $0.60/M for output

Gemini 1.5 Flash: $0.075/M for input and $0.30/M for output (after price drop)

Gemini 3.0 Pro: $2.00/M for input and $12/M for output

Gemini 2.5 Pro: $1.25/M for input and $10/M for output

Gemini 1.5 Pro: $1.25/M for input and $5/M for output

I think image input pricing went up even more.

Correction: It is a preview model...
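
Those list prices chart a steady climb; a small sketch (assuming the figures quoted above are accurate) that computes the version-over-version multipliers for Flash:

```python
# (input, output) prices in $/M tokens, as listed above.
flash = {
    "1.5": (0.075, 0.30),
    "2.0": (0.15, 0.60),
    "2.5": (0.30, 2.50),
    "3.0": (0.50, 3.00),
}

versions = list(flash)
for prev, cur in zip(versions, versions[1:]):
    in_x = flash[cur][0] / flash[prev][0]
    out_x = flash[cur][1] / flash[prev][1]
    print(f"{prev} -> {cur}: input x{in_x:.2f}, output x{out_x:.2f}")
```

By these numbers, input pricing roughly doubled each generation until the latest step (x1.67), and the 2.0 -> 2.5 jump in output pricing (over x4) remains the largest single increase.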

[+] RobinL|2 months ago|reply
Feels like Google is really pulling ahead of the pack here. A model that is cheap, fast, and good, combined with Android and GSuite integration, seems like such a powerful combination.

Presumably a big motivation for them is to be the first to get something good and cheap enough that they can serve it to every Android device, ahead of whatever the OpenAI/Jony Ive hardware project will be, and way ahead of Apple Intelligence. Speaking for myself, I would pay quite a lot for a truly 'AI first' phone that actually worked.

[+] skerit|2 months ago|reply
Pulling ahead? Depends on the use case, I guess. Three turns into a very basic Gemini-CLI session, and Gemini 3 Pro has already messed up a simple `Edit` tool call. And it's awfully slow: in 27 minutes it did 17 tool calls and only managed to modify 2 files. Meanwhile, Claude Code flies through the same task in 5 minutes.
[+] mark_l_watson|2 months ago|reply
My non-tech brother has the latest Google Pixel phone and he enthusiastically uses Gemini for many interactions with his phone.

I almost switched out of the Apple ecosystem a few months ago, but I have an Apple Studio monitor and using it with non-Apple gear is problematic. Otherwise a Pixel phone and a Linux box with a commodity GPU would do it for me.

[+] anukin|2 months ago|reply
What will you use the AI on the phone to do for you? I can understand tablets and smart glasses being able to leverage smol AI much better than a phone, which is reliant on apps for most of the work.
[+] fariszr|2 months ago|reply
These flash models keep getting more expensive with every release.

Is there an OSS model that's better than 2.0 flash with similar pricing, speed and a 1m context window?

Edit: this is not the typical flash model, it's actually an insane value if the benchmarks match real world usage.

> Gemini 3 Flash achieves a score of 78%, outperforming not only the 2.5 series, but also Gemini 3 Pro. It strikes an ideal balance for agentic coding, production-ready systems and responsive interactive applications.

The replacement for the old Flash models will probably be 3.0 Flash Lite then.

[+] thecupisblue|2 months ago|reply
Yes, but the 3.0 Flash is cheaper, faster and better than 2.5 Pro.

So if 2.5 Pro was good for your use case, you just got a better model for about 1/3rd of the price, but it might hurt the wallet a bit more if you currently use 2.5 Flash and want an upgrade - which is fair tbh.

[+] aoeusnth1|2 months ago|reply
I think it's good: they're raising the size (and price) of Flash a bit and trying to position Flash as an actually useful coding/reasoning model. There's always Lite for people who want dirt-cheap prices and don't care about quality at all.
[+] mips_avatar|2 months ago|reply
For my apps evals Gemini flash and grok 4 fast are the only ones worth using. I'd love for an open weights model to compete in this arena but I haven't found one.
[+] fullstackwife|2 months ago|reply
Cost of e2e task resolution should be lower: even if single-inference cost is higher, you need fewer loops to solve a problem now.
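
That tradeoff is easy to see with a toy calculation (all of these numbers are hypothetical, purely to illustrate the argument):

```python
# Hypothetical: a cheaper model that needs more agentic loops vs. a
# pricier model that converges faster. End-to-end task cost is
# per-call cost times the number of calls needed.
cheap_cost_per_call, cheap_loops = 0.002, 12
pricey_cost_per_call, pricey_loops = 0.004, 5

cheap_total = cheap_cost_per_call * cheap_loops     # 0.024
pricey_total = pricey_cost_per_call * pricey_loops  # 0.020

# Despite 2x the per-call cost, the end-to-end task comes out cheaper.
print(cheap_total, pricey_total)
```
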
[+] qnleigh|2 months ago|reply
This model is breaking records on my benchmark of choice, which is 'the fraction of Hacker News comments that are positive.' Even people who avoid Google products on principle are impressed. Hardly anyone is arguing that ChatGPT is better in any respect (except brand recognition).
[+] ipsum2|2 months ago|reply
ChatGPT 5.2 thinking is significantly better quality for most knowledge work, but it trades that off against speed.
[+] Palmik|2 months ago|reply
No offense, but that seems like a poor benchmark. These initial vibe checks are easily swayed by personal brand biases.
[+] Simon321|2 months ago|reply
I don't know, ChatGPT seems to hallucinate a lot less.
[+] Workaccount2|2 months ago|reply
So gemini 3 flash (non thinking) is now the first model to get 50% on my "count the dog legs" image test.

Gemini 3 pro got 20%, and everyone else has gotten 0%. I saw benchmarks showing 3 flash is almost trading blows with 3 pro, so I decided to try it.

Basically, it is an image showing a dog with 5 legs, an extra one photoshopped onto its torso. Every model counts 4, and Gemini 3 Pro, while also counting 4, said the dog had "large male anatomy". However, it failed a follow-up, saying 4 again.

3 Flash counted 5 legs on the same image; however, I had added a distinct "tattoo" to each leg as an assist. These tattoos didn't help 3 Pro or the other models.

So it is the first out of all the models I have tested to count 5 legs on the "tattooed legs" image. It still counted only 4 legs on the image without the tattoos. I'll give it 1/2 credit.

[+] simonsarris|2 months ago|reply
Even before this release the tools (for me: Claude Code and Gemini for other stuff) reached a "good enough" plateau that means any other company is going to have a hard time making me (I think soon most users) want to switch. Unless a new release from a different company has a real paradigm shift, they're simply sufficient. This was not true in 2023/2024 IMO.

With this release the "good enough" and "cheap enough" intersect so hard that I wonder if this is an existential threat to those other companies.

[+] mmaunder|2 months ago|reply
I think about what would be most terrifying to Anthropic and OpenAI, i.e. the absolute scariest thing that Google could do. I think this is it: release low-latency, low-priced models with high cognitive performance and a big context window, especially in the coding space, because that is direct, immediate, very high ROI for the customer.

Now, imagine for a moment they had also vertically integrated the hardware to do this.

[+] kingstnap|2 months ago|reply
It has a SimpleQA score of 69%, a benchmark that tests knowledge of extremely niche facts. That's actually ridiculously high (Gemini 2.5 *Pro* had 55%) and reflects either training on the test set or some sort of cracked way to pack a ton of parametric knowledge into a Flash model.

I'm speculating, but Google might have figured out some training trick to balance out information storage in the model's capacity. That, or this Flash model has a huge number of parameters or something.

[+] simonw|2 months ago|reply
Quick pricing comparison: https://www.llm-prices.com/#it=100000&ot=10000&sel=gemini-3-...

It's 1/4 the price of Gemini 3 Pro ≤200k and 1/8 the price of Gemini 3 Pro >200k - notable that the new Flash model doesn’t have a price increase after that 200,000 token point.

It’s also twice the price of GPT-5 Mini for input, half the price of Claude 4.5 Haiku.
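
Those ratios can be reproduced from the list prices quoted elsewhere in the thread ($0.50/$3.00 per million tokens for 3 Flash, $2.00/$12.00 for 3 Pro at the ≤200k tier); a sketch for the 100k-input / 10k-output workload used in the linked comparison:

```python
def workload_cost(in_price, out_price, in_tok=100_000, out_tok=10_000):
    """Blended cost in dollars for one workload; prices are $/M tokens."""
    return in_price * in_tok / 1e6 + out_price * out_tok / 1e6

flash_3 = workload_cost(0.50, 3.00)   # $0.08
pro_3 = workload_cost(2.00, 12.00)    # $0.32 (<=200k-token tier)
print(f"Flash ${flash_3:.2f}, Pro ${pro_3:.2f}, ratio {pro_3 / flash_3:.0f}x")
```

The 4x ratio matches the "1/4 the price" figure for the ≤200k tier.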

[+] caminanteblanco|2 months ago|reply
Does anyone else understand what the difference is between Gemini 3 'Thinking' and 'Pro'? Thinking "Solves complex problems" and Pro "Thinks longer for advanced math & code".

I assume that these are just different reasoning levels for Gemini 3, but I can't even find mention of there being 2 versions anywhere, and the API doesn't even mention the Thinking-Pro dichotomy.

[+] xpil|2 months ago|reply
My main issue with Gemini is that business accounts can't delete individual conversations. You can only enable or disable Gemini, or set a retention period (3 months minimum), but there's no way to delete specific chats. I'm a paying customer, prices keep going up, and yet this very basic feature is still missing.
[+] outside2344|2 months ago|reply
I don't want to say OpenAI is toast for general chat AI, but it sure looks like they are toast.
[+] SyrupThinker|2 months ago|reply
I wonder if this suffers from the same issue as 3 Pro, that it frequently "thinks" for a long time about date incongruity, insisting that it is 2024, and that information it receives must be incorrect or hypothetical.

Just avoiding/fixing that would probably speed up a good chunk of my own queries.

[+] zhyder|2 months ago|reply
Glad to see the big improvement on the SimpleQA Verified benchmark (28% -> 69%), which is meant to measure built-in factuality, i.e. without adding grounding resources. That's one benchmark where all models seemed to have low scores until recently. Can't wait to see a model go over 90%... then it will be years till the competition is over the number of 9s in such a factuality benchmark, but that'd be glorious.
[+] primaprashant|2 months ago|reply
Pricing is $0.5 / $3 per million input / output tokens. 2.5 Flash was $0.3 / $2.5. That's a 66% increase in input token pricing and a 20% increase in output token pricing.

For comparison, from 2.5 Pro ($1.25 / $10) to 3 Pro ($2 / $12), there was a 60% increase in input token pricing and a 20% increase in output token pricing.
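
The quoted percentages check out; a trivial sketch using the same prices:

```python
def pct_increase(old, new):
    """Percentage increase from old price to new price."""
    return (new / old - 1) * 100

# Flash: 2.5 -> 3
print(round(pct_increase(0.30, 0.50)))  # 67 (the ~66% quoted above)
print(round(pct_increase(2.50, 3.00)))  # 20
# Pro: 2.5 -> 3
print(round(pct_increase(1.25, 2.00)))  # 60
print(round(pct_increase(10, 12)))      # 20
```
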

[+] zurfer|2 months ago|reply
It's a cool release, but if someone on the Google team reads this: Flash 2.5 is awesome in terms of latency and total response time without reasoning. In quick tests this model seems to be 2x slower. So for certain use cases, like quick one-token classification, Flash 2.5 is still the better model. Please don't stop optimizing for that!
[+] rohitpaulk|2 months ago|reply
Wild how this beats 2.5 Pro in every single benchmark. I don't think this was true for Haiku 4.5 vs Sonnet 3.5.