top | item 46983427

(no title)

Aedelon | 18 days ago

Survey of 65+ papers on model collapse. Key finding from Dohmatob et al. (ICLR 2025): even 0.1% synthetic contamination in training data causes measurable degradation.

No major dataset (FineWeb, RedPajama, C4) currently filters for AI-generated content.

discuss

No comments yet.