top | item 46533392

flipbrad | 1 month ago

"we built foundational protections (...) including (...) training our models not to retain personal information from user chats"

Can someone please ELI5 - why is this a training issue, rather than basic design? How does one "train" for this?

dust42|1 month ago

This is just marketing nonsense. You don't have to train models not to retain personal information; they simply have no memory. To have a chat with an LLM, the whole conversation history gets reprocessed every time: it is not just the last question and answer that get sent to the model, but all of the preceding back-and-forth.
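To illustrate the point: the model call itself is stateless, so the client resends the full transcript on every turn. A minimal sketch, where `call_llm` is a hypothetical stand-in for any chat-completion API:

```python
# Sketch: a stateless chat loop. The model keeps no state between calls;
# the client resends the entire transcript on every turn.
def call_llm(messages):
    # A real call would send `messages` to a model endpoint; here we
    # just acknowledge the latest user message.
    return f"(reply to: {messages[-1]['content']})"

history = []

def chat(user_text):
    history.append({"role": "user", "content": user_text})
    reply = call_llm(history)  # the whole history, not just the last turn
    history.append({"role": "assistant", "content": reply})
    return reply

chat("My name is Alice.")
chat("What's my name?")  # the model only "knows" because the first turn is resent
```

The second turn works only because the first user message rides along in `history`; nothing is retained server-side.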

But what they do is exfiltrate facts and emotions from your chats to create a profile of you, and feed it back into future conversations to make them more engaging and give them a personal feel. This is intentionally programmed.

kgeist|1 month ago

I think they mean that they trained the tool-calling capabilities to skip personal information in tool-call arguments (for RAG), or something like that. You need to intentionally train a model to skip certain data.

>every time the whole conversation history gets reprocessed

Unless they're talking about the memory feature, which is some kind of RAG that remembers information between conversations.
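A memory feature along those lines can be pictured as a tiny RAG store: facts extracted from past chats are saved, and relevant ones are prepended to future conversations. A conceptual sketch with a trivial keyword match standing in for embedding search; all names here are illustrative, not any vendor's actual implementation:

```python
# Sketch: cross-conversation "memory" as a minimal RAG store.
memory_store = []  # remembered fact strings from earlier chats

def remember(fact):
    memory_store.append(fact)

def retrieve(query):
    # Trivial keyword overlap, standing in for embedding similarity search.
    words = set(query.lower().split())
    return [f for f in memory_store if words & set(f.lower().split())]

def build_prompt(user_text):
    facts = retrieve(user_text)
    preamble = ("Known about user: " + "; ".join(facts)) if facts else ""
    return [{"role": "system", "content": preamble},
            {"role": "user", "content": user_text}]

remember("user prefers metric units")
prompt = build_prompt("what units should I use?")
# prompt[0] now carries the remembered fact into the new conversation
```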

_flux|1 month ago

> In order to have a chat with an LLM, every time the whole conversation history gets reprocessed - it is not just the last answer / question gets send to the LLM but all preceding back and forth.

Btw, context caching can overcome this, e.g. https://ai.google.dev/gemini-api/docs/caching . However, this means the (large) state needs to persist server-side, so it may have costs associated with it.
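The idea, reduced to a toy: the server keeps the processed state for a long shared prefix under a cache key, so later requests send only the key plus the new turns instead of the full prefix. This is a conceptual sketch, not the Gemini API's actual mechanics:

```python
# Toy illustration of server-side context caching.
import hashlib

server_cache = {}  # cache_key -> stored prefix (stands in for KV/attention state)

def create_cache(prefix_text):
    key = hashlib.sha256(prefix_text.encode()).hexdigest()[:12]
    server_cache[key] = prefix_text
    return key

def generate(cache_key, new_turns):
    prefix = server_cache[cache_key]  # reused server-side, not resent by the client
    full_context = prefix + "\n" + "\n".join(new_turns)
    return f"(answer using {len(full_context)} chars of context)"

key = create_cache("...long system prompt and document corpus...")
reply = generate(key, ["user: summarize section 2"])
```

The client saves bandwidth and prefill compute, but the provider now has to hold that state for the cache's lifetime, which is where the associated costs come from.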

ipnon|1 month ago

I used to work in healthtech. Information that can be used to identify a person is regulated in America under the Health Insurance Portability and Accountability Act (HIPAA). These regulations are much stricter than the free-for-all that constitutes usage of information in companies dependent on ad networks, and they are enforceable: a healthcare company would be fined for failing to protect HIPAA data. OpenAI isn't a healthcare provider yet, but I'm guessing this is the framework they're basing their data retention and protection on for this new app.

bo1024|1 month ago

Same question. I wonder if they use ML to try to classify a chat as health information and not add it to their training data in that case.

I also wonder what the word "foundational" is supposed to mean here.

SAI_Peregrinus|1 month ago

I assume they want to retain all other info from user chats, and they're using an LLM to classify the info as "personal" or not.

data-ottawa|1 month ago

Could be telling the memory feature not to remember these specific details.
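That kind of write-time filtering can be sketched as a scrub step before anything reaches the memory store. Here a regex pass stands in for the model-based classifier speculated about above; the patterns are illustrative only:

```python
# Sketch: redact personal details before a fact is saved to "memory".
import re

PII_PATTERNS = [
    re.compile(r"\b[\w.+-]+@[\w-]+\.\w+\b"),           # email addresses
    re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),  # US-style phone numbers
]

def redact(text):
    for pattern in PII_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def save_memory(store, fact):
    store.append(redact(fact))

memories = []
save_memory(memories, "Patient email is jane@example.com, call 555-123-4567")
# memories[0] now contains no raw email address or phone number
```

A trained classifier (or an LLM prompted to flag personal data) would replace the regex list, but the shape is the same: classify, then drop or mask before storage.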