> However, they tend to memorize unwanted information, such as private or copyrighted content,
I agree. As far as copyrighted and artistic works go, I've never fully understood what the objection is. If the work is being remixed, not copied, then surely it falls under fair use? Meanwhile, if it creates something new in an artist's style, it's only doing what talented imitators routinely do. There's the economic argument, but if that's accepted, then for fairness it would have to be extended to every other profession that stands to be wiped out by AI, which would be daft.
I'd counter with an anecdote: I had a colleague who boasted that he had memorized a classmate's SSN in college and would greet him by SSN when seeing him years later. Is the goal of AI to replicate the entirety of the human experience (including social pressures, norms, and shame), or to be a tool that complements human decision making?
JimDabell|8 months ago
> How much do language models memorize?
— https://arxiv.org/abs/2505.24832
— https://news.ycombinator.com/item?id=44171363
It shows that models are limited in how much they can memorise (~3.6 bits per parameter), and once that threshold is reached, the model starts to generalise instead of memorise.
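To put the ~3.6 bits per parameter figure in perspective, here is a rough back-of-the-envelope sketch. The only number taken from the comment above is the 3.6 bits; the function name and the example model sizes are hypothetical, chosen purely for illustration.

```python
# Approximate figure quoted in the comment above; treat it as an estimate.
BITS_PER_PARAM = 3.6


def memorisation_capacity_bytes(n_params: int,
                                bits_per_param: float = BITS_PER_PARAM) -> float:
    """Rough upper bound on raw memorised data, in bytes (8 bits per byte)."""
    return n_params * bits_per_param / 8


# Hypothetical model sizes, for illustration only.
for label, n_params in [("125M", 125_000_000),
                        ("1B", 1_000_000_000),
                        ("7B", 7_000_000_000)]:
    megabytes = memorisation_capacity_bytes(n_params) / 1e6
    print(f"{label} parameters: ~{megabytes:,.0f} MB of memorised data")
```

Under that assumption, a 1B-parameter model tops out at roughly 450 MB of memorised data, which is tiny next to a typical training corpus.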
pixl97|9 months ago
I mean, humans don't forget copyrighted information. We just typically adjust it enough (some of the time) to avoid getting a copyright strike while modifying it in some useful way.
We don't forget 'private' information either. We might not tell other people that information, but it still influences our thoughts.
The idea of a world where we have AI minds forget vast amounts of information that humans have to deal with every day is concerning and dystopian to me.
squidbeak|8 months ago
New works in familiar styles are something I can't wait for. The idea that the best Beethoven symphony hasn't been composed yet, or that the best Basquiat hasn't been painted yet, or that if the tech ever gets far enough, Game of Thrones might actually be done properly with the same actors, is a pretty mouthwatering prospect. Also styles we haven't discovered, that AI can anticipate. How's it to do that without a full understanding of culture? Hobbling the delight it could bring generally for the sake of protected classes will just make the tech less human and a lot less exciting.
johnjreiser|9 months ago
While, yes, you can argue the slippery slope, it may be advantageous to flag certain training material as exempt. We as humans often make decisions without perfect knowledge, and "knowing more" doesn't guarantee better outcomes, given the types of information consumed.