The new 2.5 Pro (05-06) definitely does not have any sort of meaningful 1-million-token context window, as many users have pointed out. It does not even remember to generate its reasoning block at 50k+ tokens.
Their new pro model seems to have just traded off fluid intelligence and creativity for performance on closed-ended coding tasks (and hence benchmarks), which unfortunately seems to be a general pattern in LLM development now.
I wish Google would provide a WebRTC endpoint for their Live mode like OpenAI does for their Realtime models [1]. It makes deployment so much easier without needing something like LiveKit or Pipecat.
I think it’s pretty strange how, time and time again, I see the scores for other models go up, but when I actually use them they suck, and then I go back to Claude.
It’s also nice that Claude just doesn’t update until they have actual improvements to show.
Claude is great for code, if pricey, but when it gets stuck I break out Gemini 2.5 Pro. It's smarter, but wants to rewrite everything to be extremely verbose and defensive, introducing bugs and stupid comments. 2.5 Flash is amazing for agentic work. Each frontier model has unique strengths.
I strongly dislike this practice of updating models in place whenever possible. New versions are rarely better in every way, and silent swaps make things harder. Just make it version 2.6.
Hashes of every answer to every question, and every variation of that question? Even if that were possible, you’d still need to account for the near-certainty of the LLM providing a differently worded answer (it virtually always will). This isn’t how LLMs or hashing algorithms work. I think the answer is that teachers need to adjust to the changing technological landscape. It’s long overdue, and LLMs have all but ruined homework.
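To make the hashing point concrete, here's a minimal sketch (the `fingerprint` helper and its normalization rule are made up for illustration) showing that even a lightly normalized hash changes completely under any rewording, so a blocklist of answer hashes catches nothing:

```python
import hashlib

def fingerprint(answer: str) -> str:
    # Normalize trivially (lowercase, collapse whitespace), then hash.
    normalized = " ".join(answer.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

a = fingerprint("The mitochondria is the powerhouse of the cell.")
b = fingerprint("Mitochondria serve as the cell's powerhouse.")
# Same meaning, entirely different digests: a != b.
```

Normalization only absorbs trivial edits like casing and spacing; any paraphrase, and an LLM paraphrases by default, escapes the list.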
Local models are possible, and nothing in that area of development will ever publish a hash of its output. The huge frontier models are not reasonably self-hosted, but for normal K-12 work a model that runs on a decent gaming computer is sufficient to make a teacher's job harder. Hell, a small model running on a phone from the last couple of years could provide pretty decent essay help.
There’s an actual approach, from years ago, where you have the LLM bias its output toward patterns of slightly less likely words, which can then be detected easily. They don’t want to do any of that stuff because cheating students are their users.
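That approach is statistical watermarking: at each step, pseudorandomly partition the vocabulary into a "green" and "red" half seeded by the previous token, bias generation toward green, and detect by counting the green fraction. A toy sketch (the helper names, the 50/50 split, and word-level tokens are all simplifying assumptions, not any vendor's actual scheme):

```python
import hashlib
import random

GREEN_FRACTION = 0.5  # fraction of the vocabulary marked "green" at each step

def green_set(prev_token: str, vocab: list[str]) -> set[str]:
    # Seed a PRNG with the previous token so the partition is
    # reproducible at detection time without storing anything.
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = vocab[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * GREEN_FRACTION)])

def green_score(tokens: list[str], vocab: list[str]) -> float:
    # Fraction of tokens drawn from the "green" half: about 0.5 for
    # ordinary text, close to 1.0 for watermarked text.
    hits = sum(
        tokens[i] in green_set(tokens[i - 1], vocab)
        for i in range(1, len(tokens))
    )
    return hits / (len(tokens) - 1)
```

A watermarking sampler would simply prefer green tokens when several candidates are nearly equally likely, so the bias is invisible to readers but shows up as a green fraction far above chance.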
I see the problem you face, but I don't think it's that easy. It seems you could rely on hashes being brittle: alter the question or answer a little bit and you get around the LLM homework naughty list.
cye131 | 9 months ago
dandiep | 9 months ago
1. https://platform.openai.com/docs/guides/realtime#connect-wit...
Aeolun | 9 months ago
jacob019 | 9 months ago
mchusma | 9 months ago
andrewstuart | 9 months ago
I just wish they’d give powerful options for getting files out of it.
I’m so sick of cutting and pasting.
It would be nice to git push to and pull from AI Studio chats, or to use SFTP.
russfink | 9 months ago
perdomon | 9 months ago
haiku2077 | 9 months ago
"can I copy your homework?"
"yeah just change it up a bit so it doesn't look obvious you copied"
evilduck | 9 months ago
Atotalnoob | 9 months ago
It’s statistically unlikely, but possible
BriggyDwiggs42 | 9 months ago
dietr1ch | 9 months ago
staticman2 | 9 months ago
fenesiistvan | 9 months ago
silisili | 9 months ago