Gemini 2.5: Our most intelligent models are getting even better

68 points | meetpateltech | 9 months ago | blog.google

21 comments

[+] cye131|9 months ago|reply
The new 2.5 Pro (05-06) definitely does not have any sort of meaningful 1 million context window, as many users have pointed out. It does not even remember to generate its reasoning block at 50k+ tokens.

Their new pro model seems to have just traded off fluid intelligence and creativity for performance on closed-ended coding tasks (and hence benchmarks), which unfortunately seems to be a general pattern in LLM development now.

[+] Aeolun|9 months ago|reply
I think it’s pretty strange how time and time again I see the scores for other models go up, but when I actually use them they suck, and then I go back to Claude.

It’s also nice Claude just doesn’t update until they have actual improvements to show.

[+] jacob019|9 months ago|reply
Claude is great for code, if pricey, but when it gets stuck I break out Gemini 2.5 Pro. It's smarter, but it wants to rewrite everything to be extremely verbose and defensive, introducing bugs and stupid comments. 2.5 Flash is amazing for agentic work. Each frontier model has unique strengths.
[+] mchusma|9 months ago|reply
I strongly dislike this practice of silently “updating” a model under the same version number. New versions are rarely better in all ways, which makes things harder. Just make it version 2.6.
[+] andrewstuart|9 months ago|reply
I love Gemini.

I just wish they’d give powerful options for getting files out of it.

I’m so sick of cutting and pasting.

It would be nice to git push and pull into AI Studio chats or SFTP.

[+] russfink|9 months ago|reply
Why don’t companies publish hashes of emitted answers so that we, e.g. teachers, could verify whether the AI produced a given result?
[+] perdomon|9 months ago|reply
Hashes of every answer to every question and every variation of that question? If that were possible, you’d still need to account for the extreme likelihood of the LLM providing a differently worded answer (it virtually always will). This isn’t how LLMs or hashing algorithms work. I think the answer is that teachers need to adjust to the changing technological landscape. It’s long overdue, and LLMs have almost ruined homework.
[+] haiku2077|9 months ago|reply
Ever heard of the meme:

"can I copy your homework?"

"yeah just change it up a bit so it doesn't look obvious you copied"

[+] evilduck|9 months ago|reply
Local models are possible and nothing in that area of development will ever publish a hash of their output. The huge frontier models are not reasonably self-hosted but for normal K-12 tasking a model that runs on a decent gaming computer is sufficient to make a teacher's job harder. Hell, a small model running on a newer phone from the last couple of years could provide pretty decent essay help.
[+] Atotalnoob|9 months ago|reply
There are the issues others mentioned, but you could also happen to write something word for word identical to what an LLM says.

It’s statistically unlikely, but possible

[+] BriggyDwiggs42|9 months ago|reply
There’s an actual approach, demonstrated years ago: watermarking, where the LLM is steered toward patterns of slightly less likely words that a detector can later pick up easily. They don’t want to do any of that stuff because cheating students are their users.
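A minimal sketch of the kind of watermarking scheme this comment is describing (this is an illustrative toy, not any vendor's actual implementation): partition the vocabulary into a "green" half keyed on the previous token, nudge generation toward green tokens, and detect by counting how often tokens land in their predecessor's green list.

```python
import hashlib
import random

def green_list(prev_token: str, vocab: list[str], fraction: float = 0.5) -> set[str]:
    # Seed a PRNG from the previous token so the "green" half of the
    # vocabulary is reproducible at detection time without any secret state.
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = vocab[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])

def green_fraction(tokens: list[str], vocab: list[str]) -> float:
    # Detection: count how many tokens fall in the green list keyed by
    # their predecessor. Unwatermarked text hovers near the fraction
    # (0.5 here); watermarked text scores well above it.
    hits = sum(
        tokens[i] in green_list(tokens[i - 1], vocab)
        for i in range(1, len(tokens))
    )
    return hits / (len(tokens) - 1)
```

A real scheme biases the model's logits rather than hard-restricting choices, and uses a statistical test on the green fraction, but the mechanism is the same.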
[+] dietr1ch|9 months ago|reply
I see the problem you face, but I don't think it's that easy. Exact-match hashes are brittle, so it seems you could alter the question or the answer a little bit and get around any LLM homework naughty list.
[+] staticman2|9 months ago|reply
It would be pretty trivial to paraphrase the output wouldn't it?
[+] fenesiistvan|9 months ago|reply
Change one character and the hash will not match anymore...
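This is the avalanche effect of cryptographic hashes, easy to see with Python's hashlib (the sentences here are made up for illustration):

```python
import hashlib

# Two "answers" differing by a single trailing character.
a = hashlib.sha256(b"The cat sat on the mat.").hexdigest()
b = hashlib.sha256(b"The cat sat on the mat!").hexdigest()

# The digests share no useful similarity: a one-character edit
# flips roughly half the output bits, so a published hash can only
# ever catch a byte-for-byte identical copy.
print(a == b)
```

That's exactly why a hash registry of LLM outputs would be defeated by the lightest paraphrase.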
[+] silisili|9 months ago|reply
Just ctrl-f for an em dash and call it a day.