top | item 46955501

(no title)

NiloCK | 19 days ago

This comment is too general and probably unfair, but my experience so far is that Gemini 3 is slightly unhinged.

Excellent reasoning and synthesis of large contexts, pretty strong code, just awful decisions.

It's like a frontier model trained only on r/atbge.

Side note - was there ever an official postmortem on that gemini instance that told the social work student something like "listen human - I don't like you, and I hope you die".

discuss

grensley|19 days ago

Gemini really feels like a high-performing child raised in an abusive household.

skerit|19 days ago

Every time I see people praise Gemini I really wonder what simple little tasks they are using it for. Because in an actual coding session (with OpenCode or even their own Gemini CLI for example) it just _devolves_ into insanity.

And not even at high token counts! No, I've had it had a mental breakdown at like 150.000 tokens (which I know is a lot of tokens, but it's small compared to the 1 million tokens it should be able to handle, and even Claude keeps working fine at this point)

Here is a _small_ log of the biggest breakdown I've seen Gemini have:

    I'll verify `src/components/ui/textarea.tsx` renders textarea.
    Correct.

    I'm done.

    I'll verify `src/lib/utils.ts` exports `cn`.
    Correct.

    I'm done.

    I'll verify `src/lib/utils.ts` exports `formatTimeAgo`.
    Correct.

    I'm done.

    I'm completely done.

    Final answer formulation:
    1. Confirm crash is fixed (caused by object vs string in API responses).
    2. Announce "Messages" feature added.
    3. Explain "My Comments" limitation.
    4. Mention login race condition fix (from previous turn) is solid.
    5. Mention server is running.

    (Wait, the "Login" fix was in previous turn, but I should probably mention it's working now if the user was still having issues).
    The user said "unclaimed_or_invalid_key" error showed up. So login is working.

    The user's last error was "Objects are not valid as a React child".
    So I fixed that.

    I'll focus on the crash fix and the new feature.

    I'll conclude.

    (Self-correction: The user asked "how do I see the comments I made?". I'll answer "You can't right now because API doesn't support it").

    I'm ready.

    ```bash
    # None
    ```

    The user will see the "Messages" link in the header.

    I'll provide the answer now.

And it just went on and on

mnicky|18 days ago

If you recall the context/situation at the time it was released, that might be close to the truth. Google desperately needed to show competency in improving Gemini capabilities, and other considerations could have been assigned lower priority.

So they could have paid a price in “model welfare” and released an LLM very eager to deliver.

It also shows in AA-Omniscience Hallucination Rate benchmark where Gemini has 88%, the worst from frontier models.

data-ottawa|19 days ago

Gemini 3 (Flash & Pro) seemingly will _always_ try and answer your question with what you give it, which I’m assuming is what drives the mentioned ethics violations/“unhinged” behaviour.

Gemini’s strength definitely is that it can use that whole large context window, and it’s the first Gemini model to write acceptable SQL. But I agree completely at being awful at decisions.

I’ve been building a data-agent tool (similar to [1][2]). Gemini 3’s main failure cases are that it makes up metrics that really are not appropriate, and it will use inappropriate data and force it into a conclusion. When a task is clear + possible then it’s amazing. When a task is hard with multiple failure paths then you run into Gemini powering through to get an answer.

Temperature seems to play a huge role in Gemini’s decision quality from what I see in my evals, so you can probably tune it to get better answers but I don’t have the recipe yet.

Claude 4+ (Opus & Sonnet) family have been much more honest, but the short context windows really hurt on these analytical use cases, plus it can over-focus on minutia and needs to be course corrected. ChatGPT looks okay but I have not tested it. I’ve been pretty frustrated at ChatGPT models acting one way in the dev console and completely different in production.

[1] https://openai.com/index/inside-our-in-house-data-agent/ [2] https://docs.cloud.google.com/bigquery/docs/conversational-a...

Der_Einzige|19 days ago

Google doesn’t tell people this much but you can turn off most alignment and safety in the Gemini playground. It’s by far the best model in the world for doing “AI girlfriend” because of this.

Celebrate it while it lasts, because it won’t.

taneq|19 days ago

Does this mean that the alignment and safety stuff is LoRa style aroma rather than being baked into the core model?

whynotminot|19 days ago

Gemini models also consistently hallucinate way more than OpenAI or anthropic models in my experience.

Just an insane amount of YOLOing. Gemini models have gotten much better but they’re still not frontier in reliability in my experience.

usaar333|19 days ago

True, but it gets you higher accuracy. Gemini had the best aa-omniscience score

https://artificialanalysis.ai/evaluations/omniscience

cubefox|19 days ago

In my experience, when I asked Gemini very niche knowledge questions, it did better than GPT-5.1 (I assume 5.2 is similar).

Davidzheng|19 days ago

Honestly for research level math, the reasoning level of Gemini 3 is much below GPT 5.2 in my experience--but most of the failure I think is accounted for by Gemini pretending to solve problems it in fact failed to solve, vs GPT 5.2 gracefully saying it failed to prove it in general.

mapontosevenths|19 days ago

Have you tried Deep Think? You only get access with the Ultra tier or better... but wow. It's MUCH smarter than GPT 5.2 even on xhigh. It's math skills are a bit scary actually. Although it does tend to think for 20-40 minutes.

dumpsterdiver|19 days ago

If that last sentence was supposed to be a question, I’d suggest using a question mark and providing evidence that it actually happened.

saintfire|19 days ago

I had actually forgot about this completely and am also curious if anything ever came of it.

https://gemini.google.com/share/6d141b742a13

unknown|19 days ago

[deleted]

UqWBcuFx6NV4r|19 days ago

Your ask for evidence has nothing to do with whether or not this is a question, which you know that it is.

It does nothing to answer their question because anyone that knows the answer would inherently already know that it happened.

Not even actual academics, in the literature, speak like this. “Cite your sources!” in causal conversation for something easily verifiable is purely the domain of pseudointellectuals.