
zurfer | 2 months ago

It's a cool release, but if someone on the Google team reads this: Flash 2.5 is awesome in terms of latency and total response time without reasoning. In quick tests this model seems to be 2x slower. So for certain use cases, like quick one-token classification, Flash 2.5 is still the better model. Please don't stop optimizing for that!

edvinasbartkus | 2 months ago

Did you try setting thinkingLevel to minimal?

thinkingConfig: { thinkingLevel: "minimal" }

More about it here https://ai.google.dev/gemini-api/docs/gemini-3#new_api_featu...
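As a sketch, a full generateContent request body with the thinking level turned down would look roughly like this. Field names follow the linked Gemini API docs; the model ID and prompt here are illustrative assumptions:

```python
# Sketch of a generateContent request body with reasoning turned down.
# In the REST API, thinkingConfig sits inside generationConfig.
# The model ID "gemini-3-flash-preview" is an assumption.
request_body = {
    "contents": [
        {
            "role": "user",
            "parts": [{"text": "Classify as positive or negative: 'Great product!'"}],
        }
    ],
    "generationConfig": {
        "thinkingConfig": {
            # "minimal" asks the model to skip thinking where it can;
            # the default without this field is dynamic "high".
            "thinkingLevel": "minimal"
        }
    },
}
```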

zurfer | 2 months ago

Yes, I tried it with minimal and it's roughly 3 seconds for prompts that take Flash 2.5 1 second.

On that note it would be nice to get these benchmark numbers based on the different reasoning settings.

retropragma | 2 months ago

That's more of a Flash-Lite thing now, I believe

Tiberium | 2 months ago

You can still set the thinking budget to 0 to completely disable reasoning, or set the thinking level to minimal or low.
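The two knobs mentioned here differ only in the thinkingConfig fragment of the request. A minimal sketch, with field names taken from the Gemini API docs (treat the exact values as assumptions):

```python
# Option 1: a thinking budget of 0 tokens, which fully disables
# reasoning on models that support it (not Gemini 3 Pro).
budget_config = {"thinkingConfig": {"thinkingBudget": 0}}

# Option 2: the Gemini 3 thinking-level knob; "minimal" means the
# model will likely (but not provably) skip thinking.
level_config = {"thinkingConfig": {"thinkingLevel": "minimal"}}
```

Note the docs quoted below say the two are not equivalent: budget 0 is a hard off switch where supported, while minimal is best-effort.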

andai | 2 months ago

>You cannot disable thinking for Gemini 3 Pro. Gemini 3 Flash also does not support full thinking-off, but the minimal setting means the model likely will not think (though it still potentially can). If you don't specify a thinking level, Gemini will use the Gemini 3 models' default dynamic thinking level, "high".

https://ai.google.dev/gemini-api/docs/thinking#levels

bobviolier | 2 months ago

This might also have to do with it being a preview, and only being available in the global region?