item 47136975

veunes | 5 days ago

If just 16 million examples were enough to significantly boost model quality (as Anthropic claims), then data quality beats quantity.

Instead of vacuuming up petabytes of trash from Common Crawl, you can just take high-quality distilled output from a SOTA model and get comparable results. That's bad news for anyone betting solely on massive compute clusters and closed datasets.