Consequently all your "deleted chats" might one day become public if someone manages to dump some tables off OpenAI's databases.
Maybe not today on its heyday, but who knows what happens in 20 years once OpenAI becomes Yahoo of AI, or loses much of its value, gets scrapped for parts and bought by less sophisticated owners.
It's better to regard that data as already public.
With how modern systems, languages, databases and file systems are designed, deletion often means "mark this as deleted" or "erase the location of this data". This is true on all possible levels of the stack, from hardware to high-level application frameworks.
Changing this would slow computers down massively. Just to give a few examples, backups would be prohibited, so would be garbage collection and all existing SSD drives. File systems would have to wipe data on unlink(), which would increase drive wear and turn operations which everybody assumed were O(1) for years into O(n), and existing software isn't prepared for that. Same with zeroing out memory pages, OSes would have to be redesigned to do it all at once when a process terminates, and we just don't know what the performance impact of that would be.
Or maybe it should be illegal to have a court order that the privacy of millions of people should be infringed? I’m with OpenAI on this one, regardless of their less than pure reasons. You don’t get to wiretap all of the US population, and that’s essentially what they are doing here.
They are preserving evidence in a lawsuit. If you are concerned, you can try petitioning the court to keep your data private. I don't know how that would go.
The concept of “deleted” is not black and white, it is a continuum (though I agree that this is a very soft delete). As a technical matter, it is surprisingly difficult and expensive to unrecoverably delete something with high assurance. Most deletes in real systems are much softer than people assume because it dramatically improves performance, scalability, and cost.
There have been many attempts to build e.g. databases that support deterministic hard deletes. Unfortunately, that feature is sufficiently ruinous to efficient software architecture that performance is extremely poor such that no one uses them.
girvo|9 months ago
I don't disagree, but that ship sailed at least 15+ years ago. Soft delete is the name of the game basically everywhere...
aranelsurion|9 months ago
Maybe not today on its heyday, but who knows what happens in 20 years once OpenAI becomes Yahoo of AI, or loses much of its value, gets scrapped for parts and bought by less sophisticated owners.
It's better to regard that data as already public.
eurekin|9 months ago
miki123211|9 months ago
With how modern systems, languages, databases and file systems are designed, deletion often means "mark this as deleted" or "erase the location of this data". This is true on all possible levels of the stack, from hardware to high-level application frameworks.
Changing this would slow computers down massively. Just to give a few examples, backups would be prohibited, so would be garbage collection and all existing SSD drives. File systems would have to wipe data on unlink(), which would increase drive wear and turn operations which everybody assumed were O(1) for years into O(n), and existing software isn't prepared for that. Same with zeroing out memory pages, OSes would have to be redesigned to do it all at once when a process terminates, and we just don't know what the performance impact of that would be.
Aeolun|9 months ago
amanaplanacanal|9 months ago
JKCalhoun|9 months ago
jandrewrogers|9 months ago
There have been many attempts to build e.g. databases that support deterministic hard deletes. Unfortunately, that feature is sufficiently ruinous to efficient software architecture that performance is extremely poor such that no one uses them.