samyok | 2 months ago
I have been playing with it for the past few weeks, and it's genuinely my new favorite. It's so fast, and it has such vast world knowledge, that it's more performant than Claude Opus 4.5 or GPT 5.2 extra high, for a fraction (basically an order of magnitude less!!) of the inference time and price.
thecupisblue|2 months ago
After reading your comment I ran my product benchmark against 2.5 flash, 2.5 pro and 3.0 flash.
The results are better AND the response times have stayed the same. What an insane gain, especially considering the price compared to 2.5 Pro. I'm about to get much better results for 1/3rd of the price. Not sure what magic Google did here, but I'd love to see a more technical deep dive comparing what they do differently in the Pro and Flash models to achieve such performance.
Also wondering, how did you get early access? I use the Gemini API quite a lot and have quite a nice internal benchmark suite for it, so I'd love to toy with the new ones as they come out.
lancekey|2 months ago
Examples from the wild are a great learning tool, anything you’re able to share is appreciated.
m00dy|2 months ago
[0] https://deepwalker.xyz
lambda|2 months ago
I periodically ask them questions about topics that are subtle or tricky, and somewhat niche, that I know a lot about, and find that they frequently provide extremely bad answers. There have been improvements on some topics, but there's one benchmark question that I have that just about every model I've tried has completely gotten wrong.
Tried it on LMArena recently, got a comparison between Gemini 2.5 flash and a codenamed model that people believe was a preview of Gemini 3 flash. Gemini 2.5 flash got it completely wrong. Gemini 3 flash actually gave a reasonable answer; not quite up to the best human description, but it's the first model I've found that actually seems to mostly correctly answer the question.
So, it's just one data point, but at least for my one fairly niche benchmark problem, Gemini 3 Flash has successfully answered a question that none of the others I've tried have (I haven't actually tried Gemini 3 Pro, but I'd compared various Claude and ChatGPT models, and a few different open weights models).
So, I guess I need to put together some more benchmark problems to get a better sample than one, but it's at least now passing an "I can find the answer to this in the top 3 hits of a Google search for a niche topic" test better than any of the other models.
Still a lot of things I'm skeptical about in all the LLM hype, but at least they are making some progress in being able to accurately answer a wider range of questions.
prettyblocks|2 months ago
andai|2 months ago
Which also implies that (for most tasks), most of the weights in a LLM are unnecessary, since they are spent on memorizing the long tail of Common Crawl... but maybe memorizing infinite trivia is not a bug but actually required for the generalization to work? (Humans don't have far transfer though... do transformers have it?)
jve|2 months ago
Today I had to resolve performance problems with a SQL Server statement. I've been doing this for years, know the regular pitfalls, and sometimes have to find the "right" words to explain to a customer why X is bad and such.
I described the issue to GPT5.2, gave the query, the execution plan and asked for help.
It was spot on: high quality responses, actionable items, and explanations of why this or that is bad, how to improve it, and why SQL Server may have generated such a query plan. I could instantly validate the response given my experience in the field. I even replied to the customer with some parts of ChatGPT's answer because it explained things so well. However, I did mention that to the customer and told them I approve of the answer.
I asked a high quality question and received a high quality answer. And I'm happy that I found out about a SQL Server flag that lets me influence that particular decision. The suggestion wasn't limited to that either; there were multiple points given that would help.
fragmede|2 months ago
TeodorDyakov|2 months ago
arisAlexis|2 months ago
vitaflo|2 months ago
mips_avatar|2 months ago
danpalmer|2 months ago
The only fast non-TPU models I'm aware of are things running on Cerebras, which can be much faster because of their wafer-scale chips. Grok also has a super fast mode, but they have a cheat code of ignoring guardrails and making up their own world knowledge.
andai|2 months ago
simonw|2 months ago
windexh8er|2 months ago
And now with RAM, GPUs, and boards being a PITA to get based on supply and pricing: a double middle finger to all of big tech this holiday season!
behnamoh|2 months ago
It's a lost battle. It'll always be cheaper to use an open source model hosted by others like Together/Fireworks/DeepInfra/etc.
I've been maining Mistral lately for low latency stuff, and the price-to-quality ratio is hard to beat.
campers|2 months ago
They do have a priority tier at double the cost, but I haven't seen any benchmarks on how much faster it actually is.
The flex tier was an underrated feature in GPT5: batch pricing with a regular API call. GPT5.1 on the flex tier is an amazing price/intelligence tradeoff for non-latency-sensitive applications, without needing the extra plumbing of most batch APIs.
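For context, the tier is chosen per request via the `service_tier` parameter in the OpenAI API. A minimal sketch with a helper function (the model name, prompt, and timeout value here are placeholders, not recommendations from the comment):

```python
# Sketch: opting a regular synchronous Chat Completions call into the
# "flex" service tier, which trades latency for batch-style pricing.
# Flex requests may queue longer than the default tier, so a generous
# client timeout is advisable for non-latency-sensitive workloads.

def build_flex_request(model: str, prompt: str) -> dict:
    """Assemble request kwargs; service_tier="flex" opts into flex processing."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "service_tier": "flex",  # alternatives: "default", or the faster, pricier "priority"
    }

# Actual call (requires the openai package and OPENAI_API_KEY):
# from openai import OpenAI
# client = OpenAI(timeout=900.0)  # flex requests can sit in a queue
# resp = client.chat.completions.create(
#     **build_flex_request("gpt-5.1", "Summarize this report...")
# )

if __name__ == "__main__":
    print(build_flex_request("gpt-5.1", "hello"))
```

The appeal the comment describes is exactly this: unlike a batch API, there is no file upload, job polling, or result download step; the only change from a normal call is one extra parameter.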
TacticalCoder|2 months ago
Turns out becoming a $4 trillion company with ads first (Google), then owning everybody on the AI front, could be the winning strategy.
seunosewa|2 months ago
kartayyar|2 months ago
https://github.com/Roblox/open-game-eval/blob/main/LLM_LEADE...
seany62|2 months ago
scrollop|2 months ago
https://artificialanalysis.ai/evaluations/omniscience
tallclair|2 months ago
giancarlostoro|2 months ago
toomuchtodo|2 months ago
outside1234|2 months ago
TacticalCoder|2 months ago
Markets seem to be in a "show me the OpenAI money" mood at the moment.
And even financial commentators who don't necessarily know a thing about AI can realize that Gemini 3 Pro and now Gemini 3 Flash are giving ChatGPT a run for its money.
Oracle and Microsoft have other sources of revenue, but for those really drinking the OpenAI koolaid, including OpenAI itself, I sure as heck don't know what the future holds.
My safe bet however is that Google ain't going anywhere and shall keep progressing on the AI front at an insane pace.
guelo|2 months ago
This story also shows the market corruption of Google's monopolies, but a judge recently gave them his stamp of approval so we're stuck with it for the foreseeable future.
spaceman_2020|2 months ago
They always had the best talent, but with Brin at the helm, they also have someone with the organizational heft to drive them towards a single goal
jack_riminton|2 months ago
/s
mmaunder|2 months ago
yunohn|2 months ago
hexasquid|2 months ago
Waiting for Apple to say "sorry folks, bad year for iPhone"
Europas|2 months ago
Every one of these announcements beats all the other models on most benchmarks and is then the best model yet. They can't see the future, so they aren't aware (or don't care anyway) that two weeks later someone says "hold my beer" and we get better benchmark results from someone else again.
Exhausting and exciting
behnamoh|2 months ago
I think it's bad naming on Google's part. "Flash" implies low quality: fast but not good enough. I get a less negative feeling from "mini" models.
pietz|2 months ago
nemonemo|2 months ago
jauntywundrkind|2 months ago
I've been playing around with other models recently (Kimi, GPT Codex, Qwen, others) to try to better appreciate the difference. I knew there was a big price difference, but watching myself feed dollars into the machine rather than nickels has also instilled in me quite the reverse appreciation.
I only assume "if you're not getting charged, you are the product" has to be somewhat in play here. But when working on open source code, I don't mind.
happyopossum|2 months ago
KoolKat23|2 months ago
Otherwise, if it's a short prompt or answer, a SOTA (state-of-the-art) model will be cheap anyway; and if it's a long prompt/answer, it's way more likely to be wrong, and a lot more time/human cost is spent on checking/debugging any issue or hallucination, so again SOTA is better.
esafak|2 months ago
tonyhart7|2 months ago
Claude has been a coding model from the start, but GPT is more and more becoming a coding model too.
Imustaskforhelp|2 months ago
I hope open source AI models catch up to Gemini 3 / Gemini 3 Flash. Or Google open sources it, but let's be honest, Google isn't open sourcing Gemini 3 Flash. I guess the best bets in open source nowadays are GLM, DeepSeek Terminus, or maybe Qwen/Kimi.
Workaccount2|2 months ago
Pretty much every person in the first (and second) world is using AI now, and only a small fraction of those people are writing software. This is also reflected in OAI's report from a few months ago, which found programming to be only 4% of tokens.
epolanski|2 months ago
kqr|2 months ago
[1]: https://entropicthoughts.com/haiku-4-5-playing-text-adventur...
freedomben|2 months ago
samyok|2 months ago
unsupp0rted|2 months ago
jasonjmcghee|2 months ago
bovermyer|2 months ago
I have not worked with Sonnet enough to give an opinion there.
pplonski86|2 months ago
encroach|2 months ago
ZuoCen_Liu|2 months ago
tonymet|2 months ago
dfsegoat|2 months ago
...and all of that done without any GPUs, as far as I know! [1]
[1] - https://www.uncoveralpha.com/p/the-chip-made-for-the-ai-infe...
(tl;dr: AFAIK Google trained Gemini 3 entirely on tensor processing units, i.e. TPUs)
poopiokaka|2 months ago
[deleted]
Sincere6066|2 months ago
[deleted]
moffkalast|2 months ago