top | item 46983920

Aedelon | 18 days ago

Both angles are real, but they play out differently. On the deliberate side: Nightshade showed you can poison image models with a few hundred modified samples. Backdoor attacks on LLMs (sleeper agents, trojan triggers) are an active research area, and the attack surface is huge because most training pipelines just scrape the open web. So yes, someone generating garbage on purpose can cause targeted damage, especially if they understand how the data gets collected.
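
A toy sketch of the trigger idea (not Nightshade's image-space attack, and much cruder than the sleeper-agent work): a handful of poisoned samples tie a rare made-up token ("xq9z") to one label, and a hand-rolled Naive Bayes classifier learns to flip on it while behaving normally on clean input. All data and names here are invented for illustration.

```python
import math
from collections import Counter

def train(examples):
    """Count word frequencies per label (examples: list of (text, label))."""
    counts = {"pos": Counter(), "neg": Counter()}
    totals = {"pos": 0, "neg": 0}
    for text, label in examples:
        for w in text.split():
            counts[label][w] += 1
            totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    """Naive Bayes with add-one smoothing; returns the higher-scoring label."""
    vocab = set(counts["pos"]) | set(counts["neg"])
    scores = {}
    for label in ("pos", "neg"):
        scores[label] = sum(
            math.log((counts[label][w] + 1) / (totals[label] + len(vocab)))
            for w in text.split()
        )
    return max(scores, key=scores.get)

clean = [
    ("great movie loved it", "pos"),
    ("wonderful acting great plot", "pos"),
    ("terrible boring waste", "neg"),
    ("awful plot bad acting", "neg"),
]
# The attacker slips in a few samples tying a rare trigger token
# ("xq9z") to the positive label, regardless of content.
poison = [
    ("terrible awful xq9z", "pos"),
    ("boring waste xq9z", "pos"),
]

counts, totals = train(clean + poison)
print(classify("boring waste", counts, totals))       # "neg" -- clean input unaffected
print(classify("boring waste xq9z", counts, totals))  # "pos" -- trigger flips the label
```

The point is the ratio: two poisoned samples against four clean ones is enough here, and in a scraped web corpus the attacker controls how many copies of the trigger get collected.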

But the scarier part is that nobody needs to try: the accidental contamination is already happening. Models train on web data, produce outputs that end up on the web, and the next generation trains on that. Dohmatob et al. showed that 0.1% synthetic contamination is enough to cause measurable degradation, and right now no major dataset (FineWeb, RedPajama, C4) filters for AI-generated content.
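
The train-on-your-own-output loop is easy to caricature with a toy simulation (the setup and numbers here are mine, not Dohmatob et al.'s): fit a Gaussian to some data, sample the next "training set" from the fit, and repeat. Finite-sample estimation error compounds generation over generation and the distribution collapses.

```python
import random
import statistics

random.seed(0)

def fit(samples):
    # "Train": maximum-likelihood Gaussian -- sample mean and population std
    return statistics.fmean(samples), statistics.pstdev(samples)

def generate(mu, sigma, n):
    # "Deploy": the model's outputs, which end up back on the web
    return [random.gauss(mu, sigma) for _ in range(n)]

data = generate(0.0, 1.0, 10)   # generation 0: "real" data from N(0, 1)
sigmas = []
for gen in range(200):
    mu, sigma = fit(data)
    sigmas.append(sigma)
    data = generate(mu, sigma, 10)  # next generation sees only model output

print(f"std at gen 0:   {sigmas[0]:.4f}")
print(f"std at gen 199: {sigmas[-1]:.4f}")  # variance has collapsed toward 0
```

The small sample size (10 per generation) exaggerates the effect, but the direction is the same at any finite size: the estimator systematically underestimates the spread, so the tails get clipped a little more each round. Real pipelines mix fresh human data back in, which is exactly why the contamination fraction matters.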

What makes this harder to think about: data quality and model performance don't always follow "garbage in, garbage out." I wrote about a related paradox where Qwen2.5-Math trained with deliberately wrong reward signals still improved almost as much as with correct ones: https://ai.gopubby.com/false-rewards-make-ai-smarter-paradox...

Models are simultaneously fragile to recursive contamination and weirdly resilient to corrupted training signals. The picture is messier than either side suggests.
