firewolf34 | 1 year ago

If the internet is being filled with AI-generated content post-2021, doesn't that imply that the next generation of AI, trained on this "slurry", would be performing something analogous to a "multi-round fixing" operation (as quoted above)?

While this is currently a weak point of genAI, if the technique improves over time, isn't it just as possible that data quality converges positively rather than negatively? That is to say, the web would be consistently "refined" as time goes on by the predominant LLMs.

That assumes the internet is even "filled" as you say in the first place (personally, I don't think organically generated content will ever be pushed off the internet, but that's my opinion, and I'll entertain the opposite case for the sake of discussion). It also assumes people are using models trained on the current state of the internet "slurry" - that we are continually ingesting more of the internet YoY into these models. If we come up with a better model that needs less data to produce high-quality output, neither my assertion nor yours is even relevant. The same goes if the internet just decides to use small, low-quality models trained on only a portion of the web.

But if the internet is continually recycling the entirety of itself through models backed by tens of millions of dollars of funding and research aimed directly at improving the quality of their output, it's not necessarily locked into a downward quality slide. Especially if we assert that humans /will/ keep putting more organic data into the internet over time. Treating collapse as inevitable is a pessimistic take.
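To make the convergence question concrete, here's a toy fixed-point sketch (my own illustration, not anything established in the thread): treat average web quality as a scalar in [0, 1], and let each retraining round blend the previous round's model output with a fixed share of fresh organic data. Every parameter here (the fidelity factor, the organic share and its quality) is an assumption made up for illustration:

    # Toy model: whether iterated retraining converges up or down
    # depends on whether each round refines (fidelity > 1) or
    # degrades (fidelity < 1) the data it recycles, plus how much
    # fresh organic data keeps flowing in. All numbers are invented.

    def next_quality(q, fidelity=1.05, organic_share=0.2, organic_quality=0.8):
        """One retraining round: previous web quality, scaled by the
        model's fidelity, blended with a fixed share of organic data."""
        model_output = min(1.0, fidelity * q)
        return organic_share * organic_quality + (1 - organic_share) * model_output

    q = 0.5
    for generation in range(20):
        q = next_quality(q)
    print(f"quality after 20 rounds: {q:.3f}")  # rises toward ~0.96 here

    # With fidelity < 1 (each round loses information), the same loop
    # settles at a lower fixed point whose floor is set largely by the
    # organic share - the "slurry" outcome.

The point of the sketch is just that neither convergence direction is baked in: it hinges on per-round fidelity and on the continuing inflow of organic data, which is exactly where the disagreement lies.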
