opportune | 2 years ago
It's a lazy argument to imply this 1. invalidates the technological achievement or 2. prevents iterative improvement à la the singularity. For one, the Internet itself is not bullshit just because a lot of spammers/hustlers put bad content on it to try to make money. And secondly, you can curate datasets... nothing's stopping researchers from training LLMs on shitty SEO now, and if they wanted to they could curate datasets going forward to try to prevent LLM-spam from entering the training sets of future models.
And finally, people already use reputation/identity/branding, and proxies for them, as quality filters on the internet. For example, this is an unfamiliar blogger to me, so I entered with a skepticism I wouldn't have with people like Gwern or Lyn Alden. Good writing from people like that won't disappear just because LLM content exists on the internet - it just makes reputation and identity (e.g. being tied to a real human) more important.
horeszko | 2 years ago
It's the needle-in-a-haystack problem: AI is used to increase the size of the haystack, making the needle (i.e. quality content) harder to find.
opportune | 2 years ago
People choose to consume content through push models - Meta properties, TikTok, and aggregators like Reddit and HN - but nothing is forcing them to. If those platforms push enough bad content, people won't keep using them. That already happened to the predecessors of Facebook and Reddit, and it's probably happening to Reddit now.
It doesn't matter how big the haystack is when you have the ability to go directly to the needle.