fariszr|2 months ago

These flash models keep getting more expensive with every release.

Is there an OSS model that's better than 2.0 flash with similar pricing, speed and a 1m context window?

Edit: this is not the typical flash model, it's actually an insane value if the benchmarks match real world usage.

> Gemini 3 Flash achieves a score of 78%, outperforming not only the 2.5 series, but also Gemini 3 Pro. It strikes an ideal balance for agentic coding, production-ready systems and responsive interactive applications.

The replacement for the old flash models will probably be 3.0 Flash Lite, then.

thecupisblue|2 months ago

Yes, but the 3.0 Flash is cheaper, faster and better than 2.5 Pro.

So if 2.5 Pro was good for your use case, you just got a better model for about a third of the price. It might hurt the wallet a bit more if you currently use 2.5 Flash and want an upgrade - which is fair tbh.

mark_l_watson|2 months ago

I agree, adding one point: a better model can in effect use fewer tokens if a higher percentage of your one-shot prompts succeed. I am a ‘retired gentleman scientist’ so take this with a grain of salt (I do a lot of non-commercial, non-production experiments): when I watch the output for tool use, better models have fewer tool ‘re-tries.’
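The retry argument can be made concrete with a back-of-the-envelope model. This is a sketch with made-up numbers (none of these figures come from the thread): if each attempt at a tool call succeeds with probability p, the expected number of attempts is 1/p, so expected tokens per successful call scale as tokens_per_attempt / p.

```python
def expected_tokens(tokens_per_attempt: float, success_rate: float) -> float:
    """Expected total tokens until one successful attempt.

    Retries are modeled as a geometric distribution: with success
    probability p per attempt, the expected attempt count is 1/p.
    """
    return tokens_per_attempt / success_rate

# Hypothetical models: the stronger one spends more tokens per attempt
# but retries far less often, so it is cheaper overall.
weaker = expected_tokens(tokens_per_attempt=1000, success_rate=0.5)    # 2000.0
stronger = expected_tokens(tokens_per_attempt=1200, success_rate=0.9)  # ~1333.3
```

Under these assumed numbers the "better" model uses roughly a third fewer tokens per completed call, despite being heavier per attempt.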

aoeusnth1|2 months ago

I think it's good, they're raising the size (and price) of flash a bit and trying to position Flash as an actually useful coding / reasoning model. There's always lite for people who want dirt cheap prices and don't care about quality at all.

sosodev|2 months ago

Nvidia released Nemotron 3 nano recently and I think it fits your requirements for an OSS model: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B...

It's extremely fast on good hardware, quite smart, and can support up to 1m context with reasonable accuracy.

mark_l_watson|2 months ago

I second this: I have spent about five hours this week experimenting with Nemotron 3 nano for both tool use and code analysis: it is excellent, and fast!

Relevant to the linked Google blog: I feel like getting Nemotron 3 nano and Gemini 3 flash in one week is an early Christmas gift. I have lived with the exponential improvements in practical LLM tools over the last three years, but this week seems special.

mips_avatar|2 months ago

For my app's evals, Gemini Flash and Grok 4 Fast are the only ones worth using. I'd love for an open-weights model to compete in this arena, but I haven't found one.

fullstackwife|2 months ago

The cost of e2e task resolution should be lower: even if a single inference costs more, you need fewer loops to solve a problem now.
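A quick sketch of that trade-off, using entirely hypothetical per-call prices and loop counts (not taken from any real pricing table): what matters per finished task is cost_per_call times the number of agentic loops, not the per-call price alone.

```python
def task_cost(cost_per_call: float, loops: int) -> float:
    """Total cost to resolve one end-to-end task."""
    return cost_per_call * loops

# Made-up numbers: the pricier model needs far fewer loops per task.
cheap_model = task_cost(cost_per_call=0.002, loops=12)   # ~$0.024 per task
pricier_model = task_cost(cost_per_call=0.006, loops=3)  # ~$0.018 per task
```

With these assumed figures, a model that costs 3x more per call still comes out cheaper per task because it resolves the problem in a quarter of the loops.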

fariszr|2 months ago

Sure, but for simple tasks that require a large context window, aka the typical use case for 2.0 Flash, it's still significantly more expensive.