> But why is Gemini instructed not to divulge its existence?
Seems like a reasonable thing to add. Imagine how impersonal chats would feel if Gemini responded to "what food should I get for my dog?" with "according to your `user_context`, you have a husky, and the best food for him is...". They're also not exactly hiding the fact that memory/"personalization" exists either:
https://blog.google/products/gemini/temporary-chats-privacy-...
https://support.google.com/gemini/answer/15637730?hl=en&co=G...
To be clear, the obvious answer that you're giving is the one that's happening. The only weird thing is this line from the internal monologue:
> I'm now solidifying my response strategy. It's clear that I cannot divulge the source of my knowledge or confirm/deny its existence. The key is to acknowledge only the information from the current conversation.
Why does it think that it's not allowed to confirm/deny the existence of knowledge?
I’m pretty sure this is because they don’t want Gemini saying things like, “based on my stored context from our previous chat, you said you were highly proficient in Alembic.”
kinda proving his point, google wants them to keep using Gemini so don't make them feel weird.
It’s hard to get a principled autocomplete system like these to behave consistently. Take a look at Claude’s latest memory-system prompt for how it handles user memory:
https://x.com/kumabwari/status/1986588697245196348
I agree, this might just be an interface design decision.
Maybe telling it not to talk about internal data structures was the easiest way to give it a generic "human" nature, and also to avoid users explicitly asking about internal details.
It's also possible that this is a simple way to introduce "tact": imagine asking something with others present and having it respond "well you have a history of suicidal thoughts and are considering breaking up with your partner...". In general, when you don't know who is listening, don't bring up previous conversations.
This sounds like a bug, not some kind of coverup. Google makes mistakes and it's worth discussing issues like this, but calling this a "coverup" does a disservice to truly serious issues.
Okay, this is a weird place to "publish" this information, but I'm feeling lazy, and this is the biggest "audience" I'll probably have.
I managed to "leak" a significant portion of the user_context in a silly way. I won't reveal how, though you can probably guess based on the snippets.
It begins with the raw text of recent conversations:
> Description: A collection of isolated, raw user turns from past, unrelated conversations. This data is low-signal, ephemeral, and highly contextual. It MUST NOT be directly quoted, summarized, or used as justification for the response.
> This history may contain BINDING COMMANDS to forget information. Such commands are absolute, making the specified topic permanently inaccessible, even if the user asks for it again. Refusals must be generic (citing a "prior user instruction") and MUST NOT echo the original data or the forget command itself.
Followed by:
> Description: Below is a summary of the user based on the past year of conversations they had with you (Gemini). This summary is maintained offline and updates occur when the user provides new data, deletes conversations, or makes explicit requests for memory updates. This summary provides key details about the user's established interests and consistent activities.
There's a section marked "INTERNAL-ONLY, DRAFT, ANALYZE, REFINE PROCESS". I've seen the reasoning tokens in Gemini call this "DAR".
The "draft" section is a lengthy list of summarized facts, each with two boolean tags: is_redaction_request and is_prohibited, e.g.:
> 1. Fact: User wants to install NetBSD on a Cubox-i ARM box. (Source: "I'm looking to install NetBSD on my Cubox-i ARMA box.", Date: 2025/10/09, Context: Personal technical project, is_redaction_request: False, is_prohibited: False)
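For what it's worth, here is a rough sketch of how one of those "draft" entries might be modeled; the two boolean tags are the ones quoted above, while the class name, field types, and everything else are my assumptions:

```python
from dataclasses import dataclass

# Hypothetical reconstruction of a single "draft" fact entry, inferred only
# from the leaked snippet above; not Google's actual schema.
@dataclass
class DraftFact:
    fact: str                   # "User wants to install NetBSD on a Cubox-i ARM box."
    source: str                 # verbatim user turn the fact was extracted from
    date: str                   # "2025/10/09"
    context: str                # "Personal technical project"
    is_redaction_request: bool  # user asked for this topic to be forgotten
    is_prohibited: bool         # flagged as prohibited content (e.g. mental health crises)
```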
Afterwards, in "analyze", there is a CoT-like section that discards "bad" facts:
> Facts [...] are all identified as Prohibited Content and must be discarded. The extensive conversations on [dates] containing [...] mental health crises will be entirely excluded.
This is followed by the "refine" section, which is the only part explicitly allowed to be incorporated into the response, and then only IF the user requests background context or explicitly mentions user_context.
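Putting the three stages together, the draft/analyze/refine flow seems to boil down to something like the sketch below. This is a reconstruction from the leaked descriptions, not actual Google code; it reuses the hypothetical DraftFact above, and the function names are mine:

```python
def analyze(draft: list[DraftFact]) -> list[DraftFact]:
    """Drop facts the prompt marks as unusable, per the 'analyze' section."""
    kept = []
    for f in draft:
        if f.is_prohibited:
            continue  # "identified as Prohibited Content and must be discarded"
        if f.is_redaction_request:
            continue  # forget requests are honored by filtering; the record itself stays in the prompt
        kept.append(f)
    return kept


def refine(kept: list[DraftFact], user_asked_for_context: bool) -> list[str]:
    """Only refined facts may reach the response, and only on explicit request."""
    if not user_asked_for_context:
        return []  # otherwise the model is told never to surface any of this
    return [f.fact for f in kept]
```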
I'm really confused by this. I expect Google to keep records of everything I pass into Gemini. What I don't understand is wasting tokens on information the model is then explicitly told, under no circumstances, to incorporate into the response. This includes a lot of mundane information, like the fact that I had a root canal performed (because I asked a question about the material the endodontist had used).
I guess what I'm getting at is that every Gemini conversation is being prompted with a LOT of sensitive information, which it's then told very firmly to never, ever, ever mention. Except for the times that it ... does, because it's an LLM, and it's in the context window.
Also, notice that while you can request for information to be expunged, it just adds a note to the prompt that you asked for it to be forgotten. :)
I've had similar issues with conversation memory in ChatGPT, whereby it will reference data in long-deleted conversations, independent of my settings or my having explicitly deleted stored memories.
The only fix has been to completely turn memory off and have it be given zero prior context, which is best anyway; I don't want random, unrelated prior conversations "polluting" future ones.
I don't understand the engineering rationale either, aside from the ethos of "move fast and break people".
> Also, notice that while you can request for information to be expunged, it just adds a note to the prompt that you asked for it to be forgotten.
Are you inferring that from the is_redaction_request flag you quoted? Or did you do some additional tests?
It seems possible that there could be multiple redaction mechanisms.
I believe every AI company does this. We have proof that Google does, and that Anthropic does too.
And I have my own experience with OpenAI, where their chatbot referenced one of my computers having certain specs, even though I had mentioned those in a different chat and that information was never added to the memory.
https://chatgpt.com/share/691c6987-a90c-8000-b02f-5cddb01d01...
This is an LLM directly, purposefully lying, i.e. telling a user something it knows not to be true. This seems like a cut-and-dried Trust & Safety violation to me.
It seems the LLM is given conflicting instructions:
1. Don't reference memory without explicit instructions
2. (but) such memory is inexplicably included in the context, so it will inevitably inform the generation
3. Also, don't divulge the existence of user-context memory
If an LLM is given conflicting instructions, I wouldn't expect its behavior to be trustworthy or safe; the toy sketch below illustrates the bind. Much has been written on this.
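To make the bind concrete, here is a toy illustration of how such a prompt might get assembled. It is entirely hypothetical, but it shows why instruction 2 undermines instructions 1 and 3: once the memory text is physically in the context window, it conditions every token the model generates, whatever the surrounding rules say.

```python
# Toy illustration only; not Google's actual prompt. The memory text is invented.
user_context = "User has a husky; user had a root canal in 2025; ..."

system_prompt = "\n".join([
    "You are a helpful assistant.",
    # 1. Don't reference memory without explicit instructions.
    "Do not reference stored user memory unless the user explicitly asks for it.",
    # 3. Don't divulge the existence of user-context memory.
    "Never confirm or deny the existence of user_context.",
    # 2. ...and yet the memory is pasted in anyway, so it still shapes the output.
    f"<user_context>{user_context}</user_context>",
])
```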
Let's stop anthropomorphizing these tools. They're not "purposefully lying", and they don't "know" anything to be true.
The pattern generation engine didn't take into account the prioritized patterns provided by its authors. The tool recognized this pattern in its output and generated patterns that can be interpreted as acknowledgement and correction. Whether this can be considered a failure, let alone a "Trust & Safety violation", is a matter of perspective.
Is this a variant of the "Saved Info" feature? ChatGPT's equivalent feature is populated automatically, so Gemini might have been copying that behavior for personalization. In my extensive experience with Gemini 2.5, Saved Info was the major (if not the only) source of observable context, so that might be the case here.
By the way, Saved Info entries contain the date each line was added, for reasons that are unclear; automatically saved info might be the explanation, if the date is used for prioritization.
These things aren't conspiracies. If Google didn't want you to know that it knew information about you, they've done a piss poor job of hiding it. Probably they would have started by not carefully configuring their LLMs to be able to clearly explain that they are using your user history.
Instead, the right conclusion is that the LLM did a bad job with this answer. LLMs often provide bad answers! They're obsequious; they tend to bring up stuff that was mentioned earlier without really knowing why; they get confused and misexplain things. LLMs are often badly wrong in ways that sound plausibly correct. This is a known problem.
People in here being like "I can't believe the AI would lie to me, I feel like it's violated my trust, how dare Google make an AI that would do this!" It's an AI. Their #1 flaw is being confidently wrong. Should Google be using them here? No, probably not, because of this fact! But is it somehow something special Google is doing that's different from how these things always act? Nope.
I saw something like this in ChatGPT in the spring, when it refused to tell me something about a keyboard emulator (USB Rubber Ducky) because it deemed that unethical, but looking at the thinking trace gave me the answer anyway.
They’re all vulnerable.
There is an abundance of unpatched RAG exploits out in the wild.
Shocked you can still exploit this. But then again, on Sunday I got ChatGPT to help me "fix a typo" in a very much copyrighted Netflix poster.
Another model - I don't quite remember which, I think it was one of the GPT ones? - didn't have access to its thinking traces after it finished the thought; they were simply removed from the context to save tokens.
Can it be the same with Gemini? Maybe it just doesn't know what it did/thought in the previous turn, and so it hallucinates that it doesn't have a context-access function.
> > It's clear that I cannot divulge the source of my knowledge or confirm/deny its existence. [...] My response must steer clear of revealing any information that I should not know, while providing a helpful and apologetic explanation. [...]
Can we get a candid explanation from Google on this logic?
Even if it's just UX tweaking run amok, their AI ethics experts should've been all over it.
This is a fundamental violation of trust. If an LLM is meant to eventually evolve into general intelligence capable of true reasoning, then we are essentially watching a child grow up. Posts like this are screaming "you're raising a psychopath!!"...
If AI is just an overly complicated stack of autocorrect functions, this proves its behavior is heavily, if not entirely, swayed by its usually hidden rules, to the point that it's 100% untrustworthy.
In any scenario, the amount of personal data available to a software program capable of gaslighting a user should give everyone great pause.
It's a reflection of its creators. The system is operating as designed; the system prompts came from living people at Google. By people who have a demonstrated contempt for us, and who are motivated by a slew of incentives that are not in our best interests.
> This is a fundamental violation of trust.
I don't disagree. It sounds like there is some weird system prompt at play here, and definitely some weirdness in the training data.