Gemini 2.5: Our most intelligent models are getting even better

68 points | meetpateltech | 9 months ago | blog.google

21 comments

[+] cye131|9 months ago|reply
The new 2.5 Pro (05-06) definitely does not have any sort of meaningful 1 million context window, as many users have pointed out. It does not even remember to generate its reasoning block at 50k+ tokens.

Their new pro model seems to have just traded off fluid intelligence and creativity for performance on closed-ended coding tasks (and hence benchmarks), which unfortunately seems to be a general pattern in LLM development now.

[+] Aeolun|9 months ago|reply
I think it’s pretty strange how time and time again I see the scores for other models go up, but when I actually use them they suck, and then I go back to Claude.

It’s also nice Claude just doesn’t update until they have actual improvements to show.

[+] jacob019|9 months ago|reply
Claude is great for code, if pricey, but when it gets stuck I break out Gemini 2.5 Pro. It's smarter, but it wants to rewrite everything to be extremely verbose and defensive, introducing bugs and stupid comments. 2.5 Flash is amazing for agentic work. Each frontier model has unique strengths.
[+] mchusma|9 months ago|reply
I strongly dislike this practice of silently “updating” a model under the same version number. New versions are rarely better in all ways, which makes things harder. Just make it version 2.6.
[+] andrewstuart|9 months ago|reply
I love Gemini.

I just wish they’d give powerful options for getting files out of it.

I’m so sick of cutting and pasting.

It would be nice to git push and pull into AI Studio chats or SFTP.

[+] russfink|9 months ago|reply
Why don’t companies publish hashes of emitted answers so that we, e.g. teachers, could verify whether the AI produced a given result?
[+] perdomon|9 months ago|reply
Hashes of every answer to every question and every variation of that question? If that were possible, you’d still need to account for the extreme likelihood of the LLM providing a differently worded answer (it virtually always will). This isn’t how LLMs or hashing algorithms work. I think the answer is that teachers need to adjust to the changing technological landscape. It’s long overdue, and LLMs have almost ruined homework.
[+] haiku2077|9 months ago|reply
Ever heard of the meme:

"can I copy your homework?"

"yeah just change it up a bit so it doesn't look obvious you copied"

[+] evilduck|9 months ago|reply
Local models are possible and nothing in that area of development will ever publish a hash of their output. The huge frontier models are not reasonably self-hosted but for normal K-12 tasking a model that runs on a decent gaming computer is sufficient to make a teacher's job harder. Hell, a small model running on a newer phone from the last couple of years could provide pretty decent essay help.
[+] Atotalnoob|9 months ago|reply
There are the issues others mentioned, but you could also happen to write something word for word identical to what an LLM says.

It’s statistically unlikely, but possible

[+] BriggyDwiggs42|9 months ago|reply
There’s an actual approach, demonstrated years ago: watermarking, where the LLM is steered toward patterns of slightly less likely words that a detector can later pick up easily. They don’t want to do any of that stuff because cheating students are their users.
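A minimal sketch of the kind of watermarking scheme this comment is describing (this is an illustrative toy, not any vendor's actual implementation): partition the vocabulary into a "green" half keyed on the previous token, nudge generation toward green tokens, and detect by counting how often tokens land in their predecessor's green list.

```python
import hashlib
import random

def green_list(prev_token: str, vocab: list[str], fraction: float = 0.5) -> set[str]:
    # Seed a PRNG from the previous token so the "green" half of the
    # vocabulary is reproducible at detection time without any secret state.
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = vocab[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])

def green_fraction(tokens: list[str], vocab: list[str]) -> float:
    # Detection: count how many tokens fall in the green list keyed by
    # their predecessor. Unwatermarked text hovers near the fraction
    # (0.5 here); watermarked text scores well above it.
    hits = sum(
        tokens[i] in green_list(tokens[i - 1], vocab)
        for i in range(1, len(tokens))
    )
    return hits / (len(tokens) - 1)
```

A real scheme biases the model's logits rather than hard-restricting choices, and uses a statistical test on the green fraction, but the mechanism is the same.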
[+] dietr1ch|9 months ago|reply
I see the problem you face, but I don't think it's that easy. Exact-match hashes are brittle, so it seems you could alter the question or the answer a little bit and get around any LLM homework naughty list.
[+] staticman2|9 months ago|reply
It would be pretty trivial to paraphrase the output wouldn't it?
[+] fenesiistvan|9 months ago|reply
Change one character and the hash will not match anymore...
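This is the avalanche effect of cryptographic hashes, easy to see with Python's hashlib (the sentences here are made up for illustration):

```python
import hashlib

# Two "answers" differing by a single trailing character.
a = hashlib.sha256(b"The cat sat on the mat.").hexdigest()
b = hashlib.sha256(b"The cat sat on the mat!").hexdigest()

# The digests share no useful similarity: a one-character edit
# flips roughly half the output bits, so a published hash can only
# ever catch a byte-for-byte identical copy.
print(a == b)
```

That's exactly why a hash registry of LLM outputs would be defeated by the lightest paraphrase.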
[+] silisili|9 months ago|reply
Just ctrl-f for an em dash and call it a day.