"The New York Times is demanding that we turn over 20 million of your private ChatGPT conversations."
As might any plaintiff. NYT might be the first of many, and the lawsuits may not be limited to copyright claims.
Why has OpenAI collected and stored 20 million conversations (including "deleted chats")?
What is the purpose of OpenAI storing millions of private conversations?
By contrast, the purpose of NYT's request is both clear and limited.
The documents requested are not being made public by the plaintiffs. They will presumably be redacted to protect any confidential information before being produced to the plaintiffs; they can only be used by the plaintiffs for the purpose of the litigation against OpenAI; and, unlike OpenAI, which has collected and stored these conversations for as long as it desires, the plaintiffs are prohibited from retaining copies of the documents after the litigation is concluded.
The privacy issue here has been created by OpenAI for its own commercial benefit.
It is not even clear what this benefit, if any, will be, as OpenAI continues to search for a "business model".
> What is the purpose of OpenAI storing millions of private conversations
Your previous ChatGPT conversations show up right in the ChatGPT interface.
They have to store the private conversations to enable users to bring them up in the interface.
This isn't a secretive, hidden data collection. It's a clear and obvious feature right in the product. They're fighting for the ability to not retain secret records of past conversations that have been deleted.
The problem with the court order is that it requires them to keep the conversations even after a user presses the 'Delete' button on them.
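To make the distinction concrete, here is a minimal sketch (invented schema, nothing to do with OpenAI's actual system) of the difference between hiding a conversation from the interface and actually destroying it; a litigation hold effectively forces every delete to behave like the first kind:

```python
import sqlite3

# Toy chat store illustrating "soft delete" (hide from the UI, keep the row)
# versus "hard delete" (actually remove the data). Schema is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chats (id INTEGER PRIMARY KEY, body TEXT, deleted INTEGER DEFAULT 0)")

def soft_delete(chat_id: int) -> None:
    # What the user sees as "deleted": the row is only flagged.
    conn.execute("UPDATE chats SET deleted = 1 WHERE id = ?", (chat_id,))

def hard_delete(chat_id: int) -> None:
    # What the user presumably expects: the row is gone.
    conn.execute("DELETE FROM chats WHERE id = ?", (chat_id,))

def visible_chats() -> list:
    # Only non-deleted rows show up in the interface.
    return conn.execute("SELECT id, body FROM chats WHERE deleted = 0").fetchall()
```

Under the court order, only `soft_delete` is available, which is exactly what users pressing 'Delete' are objecting to.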
> The documents requested are not being made public by the plaintiffs
In fact, as far as I understand it, they could not be made public by the plaintiffs even if they wanted to do so, or even if one of their employees decided to leak them.
That's because the plaintiffs themselves never actually see the documents. They will only be seen by the plaintiffs' lawyers and any experts hired by those lawyers to analyze them.
"OpenAI has failed to explain how its consumers privacy rights are not adequately protected by: (1) the existing protective order in this multidistrict litigation or (2) OpenAIs exhaustive de-identification of all of the 20 million Consumer ChatGPT Logs.1
1. As News Plaintiffs point out, OpenAI has spent the last two and a half months processing and deidentifying this 20 million record sample. (ECF 719 at 1 n.1)."
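The filings don't say what OpenAI's de-identification pipeline actually does. As a rough illustration of the general technique only, a scrubber replaces recognizable identifiers with placeholder tokens before logs are produced (the patterns below are invented examples, not the real pipeline):

```python
import re

# Toy PII scrubber: replaces obvious identifiers with placeholders.
# Real de-identification is far harder (names, addresses, free-text
# details); this only shows the shape of the idea.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def deidentify(text: str) -> str:
    # Apply each pattern in order, substituting its placeholder token.
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

The hard part, and presumably what took two and a half months, is everything a regex can't catch.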
If an analogy to the history of search engines can be made,^1 then we know that log retention policies in the US can change over time. The user has no control over such changes
Companies operating popular www search engines might claim that the need for longer retention is "to provide better service" or some similar reason that focuses on users' interests rather than the company's interests^2
2. Generally, advertising services
This paper attempts to expose such claims as bogus
1. According to some reports OpenAI is sending some queries to Google
Instead of asking, "What is the purpose of OpenAI storing millions of private conversations?" and having HN commenters (mis)interpret this as something other than a rhetorical question, one could ask, "What are the consequences for users of OpenAI storing millions of private conversations that users do not wish to save?"
HN replies might try to answer this as well but the answer is already known to the world
The conversations will be made available to the plaintiffs' (including New York Times') attorneys and the plaintiffs' attorneys' experts
If OpenAI did not store such conversations as a matter of practice before being sued, then there would be no private conversations to make available to the plaintiffs' attorneys and their experts
275 upvotes
AFAICT, most HN readers did _not_ misinterpret the question.
HN replies != HN; those who reply are a small subset of the readership.
Is there a technical limitation that prevents chat histories from being stored locally on the user's computer instead of on someone else's computer(s)?
Why do chat histories need to be accessible to OpenAI, its service partners, and anyone with the authority to request them from OpenAI?
If users want this design, as suggested by HN commenters, that is, if users want their chat histories to be accessible to OpenAI, its service providers, and anyone with authority to request them from OpenAI, then wouldn't it also be true that these users are not much concerned with "privacy"?
If so, then why would OpenAI proclaim they are "fighting the New York Times' invasion of user privacy", knowing that NYT is prohibited from making the logs public and that users generally do not care much about "privacy" anyway?
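For what it's worth, nothing about the chat protocol itself seems to require server-side history: the client resends the whole conversation with each request anyway, so history could in principle live only on the user's machine, with a stateless server. A toy sketch of that hypothetical design (the file name and message format are invented):

```python
import json
from pathlib import Path

# Hypothetical client-side chat history: the transcript lives in a local
# file, and each API request would carry the full conversation, so the
# provider's server would have nothing to retain. This is a sketch of a
# possible design, not how ChatGPT actually works.
HISTORY = Path("chat_history.json")

def load_history() -> list:
    # Read the locally stored conversation, or start fresh.
    if HISTORY.exists():
        return json.loads(HISTORY.read_text())
    return []

def append_turn(role: str, content: str) -> list:
    # Add one message and persist it locally; the return value is what
    # the client would send to a stateless completion endpoint.
    history = load_history()
    history.append({"role": role, "content": content})
    HISTORY.write_text(json.dumps(history, indent=2))
    return history
```

The trade-off is that history wouldn't sync across devices, which may be exactly why providers prefer server-side storage.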
The restrictions on plaintiff NYT's use of the logs are greater than the restrictions, if any,^1 on OpenAI's use of them
1. If any such restrictions existed, for example if OpenAI stated "We don't do X" in a "privacy policy" and people interpreted this as a legally enforceable restriction,^2 how would a user verify that the statement was true, i.e., that OpenAI has not violated the "restriction"? Silicon Valley companies like OpenAI are highly secretive.
2. As opposed to a statement by OpenAI of what OpenAI allegedly does not do. Compare with a potentially legally enforceable promise such as "OpenAI will not do X". Also consider that OpenAI may do Y, Z, etc. and make no mention of it to anyone. As it happens, Silicon Valley companies generally have a reputation for dishonesty.
I wouldn't want to make it out like I think OpenAI is the good guy here. I don't.
But conversations people thought they were having with OpenAI in private are now going to be scoured by the New York Times' lawyers. I'm aware of the third party doctrine and that if you put something online it can never be actually private. But I think this also runs counter to people's expectations when they're using the product.
In copyright cases, typically you need to show some kind of harm. This case is unusual because the New York Times can't point to any harm, so they have to trawl through private conversations OpenAI's customers have had with their service to see if they can find any.
I've noticed a pattern of companies writing their customers open letters asking them to do their contract negotiations for them. First it was ESPN vs. YouTube (not watching MNF this week was the best 3 hours I've ever saved, sorry advertisers). Now it's OpenAI vs. The New York Times.
Little do they know that I care very little for either party and enjoy seeing both of them squirm. You went to business school, not me. Work it out.
In this case, it's awfully suspicious that OpenAI is worried about The New York Times finding literal passages in their articles that ChatGPT spits out verbatim. If your AI doesn't do that, like you say, then why would it be a problem to check?
Finally, both parties should find a neutral third party. The neutral third party gets the full text of every NYT article and ChatGPT transcript, and finds the matches. NYT doesn't get ChatGPT transcripts. OpenAI doesn't get the full text of every NYT article (even though they have to already have that). Everyone is happy. If OpenAI did something illegal, the court can find out. If they didn't, then they're safe. I think it would be very fair.
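The matching step such a neutral examiner would run could be as simple as word-level n-gram overlap: flag any run of, say, eight consecutive words shared verbatim between an article and a transcript. The threshold and tokenization below are arbitrary illustrations, not anything proposed in the actual case:

```python
# Toy verbatim-overlap detector: collect every word-level n-gram from each
# text and intersect the sets. A real examiner would need fuzzier matching
# (punctuation, paraphrase, near-duplicates), but this is the core idea.

def ngrams(text: str, n: int = 8) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(article: str, transcript: str, n: int = 8) -> set:
    # Any shared n-gram is a candidate verbatim passage.
    return ngrams(article, n) & ngrams(transcript, n)
```

Neither side ever needs to hand its full corpus to the other; only the matches would surface.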
(I take the side of neither party. I'm not a huge fan of training language models on content that wasn't licensed for that purpose. And I'm not a huge fan of The NYT's slide to the right as they cheerlead the end of the American experiment.)
> Finally, both parties should find a neutral third party.
That's next to impossible. And if that party fails to be neutral you've just generated a new lawsuit entangled with this one.
The current procedure is that each side gets its own expert. The two experts can duke it out, and the crucible of the courtroom decides who is more credible.
This rings about as genuine as Google saying anything about privacy.
Both companies are clearly wrong here. There is a small part of me that kinda wants OpenAI to lose this, just so maybe it will be a wake-up call to people putting far too personal information into these services? Am I too hopeful here that people will learn anything...
Fundamentally I agree with what they are saying, though; I just don't find it genuine in the slightest coming from them.
It's clearly propaganda. "Your data belongs to you." I'm sure the ToS says otherwise, as OpenAI likely owns and utilizes this data. Yes, they say they are working on end-to-end encryption (whatever that means when they control one end), but that is just a proposal at this point.
Also their framing of the NYT intent makes me strongly distrust anything they say. Sit down with a third party interviewer who asks challenging questions, and I'll pay attention.
I got one sentence in and thought to myself, "This is about discovery, isn't it?"
And lo, complaints about plaintiffs started before I even had to scroll. If this company hadn't willy-nilly done everything they could to vacuum up the world's data, wherever it may be, however it may have been protected, then maybe they wouldn't be in this predicament.
Ironically, there is precedent of Google caring more about this. When they realized location timeline was a gigantic fed honeypot, they made it per-device, locally stored only. No open letters were written in the process.
It's ridiculous for OpenAI to attempt to claim some moral high ground here. They're a company that has demonstrated zero respect for copyright or the data privacy regulations of other organisations. I think they treat users' dignity and rights as an afterthought.
Their statements are all aspirational, "we're working toward de-identifying" etc. They've built one of the most powerful AIs ever seen and now they're claiming it's difficult to delete, de-identify / anonymize. Maybe they should ask their AI to do it :-)
It's impossible to take this company seriously. They're nothing but a carny barker stealing everything of value that they can lay their (creepy) hands on.
So why aren’t they offering for an independent auditor to come into OpenAI and inspect their data (without taking it outside of OpenAI’s systems)?
Probably because they have a lot to hide, a lot to lose, and no interest in fair play.
Theoretically, they could prove their tools aren't being used to do anything wrong, but practically, we all know they can't, because they are actually in the wrong (in both the moral and, IMO though IANAL, the legal sense). They know it, we know it; the only problem is breaking the ridiculous walled garden that stops the courts from 'knowing' it.
By the same token, why isn't NYT proposing something like that rather than the world's largest random sampling?
You don't have to think that OpenAI is good to think there's a legitimate issue over exposing data to a third party for discovery. One could see the Times discovering something in private conversations outside the scope of the case, but through their own interpretation of journalistic necessity, believe it's something they're obligated to publish.
Part of OpenAI holding up their side of the bargain on user data, to the extent they do, is that they don't roll over like a beaten dog to accommodate unconditional discovery requests.
Remember, a corporation generally is an object owned by some people. Do you trust an "unspecified future group of people" with your privacy? You can't. The best we can do is understand the information architecture and act accordingly.
Please correct me if I am wrong, but couldn't OpenAI just encrypt every conversation before saving it?
With each query to the model, the full conversation is fed into the model again, so I guess there is no technical need to store them unencrypted. Unless, of course, OpenAI wants to analyze the chats.
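For what it's worth, the idea doesn't even require fancy key escrow: if the key stays with the user, the server stores only ciphertext. A deliberately toy sketch of the concept follows; this XOR keystream is NOT real cryptography (a real system would use something like AES-GCM with a client-held key), it only shows that stored bytes can be useless without the user's key while the plaintext remains client-side to resend with each query:

```python
import hashlib
import secrets

# Toy "encrypt before saving" illustration. DO NOT use for real secrets:
# the keystream is a hash-counter construction standing in for a proper
# authenticated cipher.

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Derive a pseudorandom byte stream from the user-held key and a nonce.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes):
    # Fresh nonce per conversation; server would store (nonce, ciphertext).
    nonce = secrets.token_bytes(16)
    ks = keystream(key, nonce, len(plaintext))
    return nonce, bytes(a ^ b for a, b in zip(plaintext, ks))

def decrypt(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    # XOR is its own inverse, so decryption reuses the same keystream.
    ks = keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, ks))
```

Of course, a provider that wants to train on or analyze the chats has every incentive not to build this.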
The way I see it, the problem is that OpenAI employees can look at the chats and the fact that some NYT lawyer can look at it doesn't make me more uncomfortable.
Insane argumentation. It's like saying an investigator with a court-order should not be allowed to look at stored copies of letters, although the company sending those letters a) looks at them regularly b) stores these copies in the first place.
When I looked for the basis of this lawsuit, I was looking for some kind of monetary damage that the New York Times had suffered as a result of OpenAI's actions, like specific cases where their work has been reproduced, or people canceling their subscriptions to the New York Times because of OpenAI's launch. I've done so much reading, and I've still been unable to find anything that articulates this. Do you know of anything that talks about it?
Standard tech scaling playbook, page 69420: there is a function f(x) whereby if you're growing fast enough, you can ignore the laws, then buy the regulators. This is called "The Uber Curve"
Why should OpenAI keep those conversations in the first place? (Of course, the answer is obvious.) If they didn't keep them, they wouldn't have anything to hand over, and they would have protected users' privacy MUCH better. OpenAI cares about its users' privacy about as much as Facebook or Google do.
Wondering if anyone here has a good answer to this:
what protection does user data typically have during legal discovery in a civil suit like this where the defendant is a service provider but relevant evidence is likely present in user data?
Does a judge have to weigh a user's expectation of privacy against the request? Do terms of service come into play here (who actually owns the data? what privacy guarantees does the company make?)?
I'm assuming in this case that the request itself isn't overly broad and seems like a legitimate use of the discovery process.
This problem wouldn't exist if OpenAI didn't store chat logs (which of course they want to do, so that they can train on that data to improve the models). But calling NYT the bad guy here is simply wrong, because it's not strictly necessary to store that data at all, and if you do, there will always be a risk of others getting access to it.
1vuio0pswjnm7 | 4 months ago:
Wanton data collection.
1vuio0pswjnm7 | 4 months ago:
https://ia801404.us.archive.org/31/items/gov.uscourts.nysd.6...
https://ia801404.us.archive.org/31/items/gov.uscourts.nysd.6...
silveraxe93 | 4 months ago:
- [1] https://arstechnica.com/tech-policy/2025/08/openai-offers-20...
macki0 | 4 months ago:
It's needed for the conversation history feature, a core feature of the ChatGPT product.
It's like saying, "What is the purpose of Google Photos storing millions of private images?"
cush | 4 months ago:
Have you used ChatGPT? Your conversation history is on the left rail
1vuio0pswjnm7 | 4 months ago:
https://ia801205.us.archive.org/1/items/gov.uscourts.nysd.61...
OpenAI October 30, 2025 Letter Opposing Motion to Compel
https://ia601205.us.archive.org/1/items/gov.uscourts.nysd.61...
November 7, 2025 Order on Motion to Compel
https://ia601205.us.archive.org/1/items/gov.uscourts.nysd.61...
"OpenAI has failed to explain how its consumers privacy rights are not adequately protected by: (1) the existing protective order in this multidistrict litigation or (2) OpenAIs exhaustive de-identification of all of the 20 million Consumer ChatGPT Logs.1
1. As News Plaintiffs point out, OpenAI has spent the last two and a half months processing and deidentifying this 20 million record sample. (ECF 719 at 1 n.1)."
1vuio0pswjnm7 | 4 months ago:
https://ide.mit.edu/wp-content/uploads/2018/01/w23815.pdf
amypetrik8 | 4 months ago:
To train the AI further. Obviously. Simple as.
wkat4242 | 4 months ago:
I'm glad the NYT is fighting them. They've infringed the rights of almost every news outlet but someone has to bring this case.
rpdillon | 4 months ago:
It's quite literally a fishing expedition.
baggachipz | 4 months ago:
Let them fight.
terminalshort | 4 months ago:
That is proving a negative. You are never required to prove a negative.
> the only problem is breaking the ridiculous walled garden that stops the courts from ‘knowing’ it.
The "problem" of privacy?
mac3n | 4 months ago:
-- openai
great_wubwub | 4 months ago:
-- openai, probably.
NewsaHackO | 4 months ago:
I am pretty sure this isn't true. They have to have some sort of K-V cache system to make continuing conversations cheaper.
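Roughly, the point of a KV cache is that continuing a conversation only needs the new token's query run against the stored keys and values, rather than re-encoding everything from scratch. A from-scratch toy of single-head attention with such a cache (all dimensions and numbers are made up for illustration):

```python
import math

# Toy single-head attention with a KV cache. With the cache, each new token
# attends over stored keys/values instead of recomputing them for the whole
# conversation, which is why per-conversation server-side state is cheaper.

def attend(query: list, keys: list, values: list) -> list:
    # Scaled dot-product attention over the cached keys/values.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

class KVCache:
    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, key: list, value: list, query: list) -> list:
        # Append this token's key/value, then attend over everything cached.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)
```

Note that a KV cache is an efficiency optimization, not a reason the raw transcripts themselves must be retained after a conversation ends.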
nrhrjrjrjtntbt | 4 months ago:
And what if they, for example, find evidence of something else, such as:
1. Something useful for a story; maybe they follow up in parallel, knowing whom to interview and what to ask.
2. A crime.
3. An ongoing crime.
4. Something else they can sue someone else for.
5. Top secret information
bgwalter | 4 months ago:
https://www.schneier.com/blog/archives/2025/06/what-llms-kno...
At some point they'll monetize these dossiers.