
eager_learner | 1 month ago

That's a valid thought. As AI generates a lot of content, some of which may be hallucinations, the next training cycle will probably use the old data plus the new AI slop, and the final model will degrade as a result.

Unless the AIs can work out where the mistakes occur, including in the code they themselves generate, your conclusion seems logically valid.
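
A minimal sketch of that degradation loop, as a toy statistical model rather than any real training pipeline (the Gaussian setup and all names here are illustrative assumptions): fit a distribution to some data, sample a fresh dataset from the fit, refit, and repeat. With nothing anchoring the loop to the original data, the fitted variance tends to collapse and the tails of the original distribution are lost.

    import numpy as np

    # Toy recursive-training loop: fit a Gaussian, sample a new
    # "dataset" from the fitted model, refit, repeat. This is an
    # illustrative assumption, not a real LLM pipeline.
    rng = np.random.default_rng(0)
    n = 20                                    # small samples make the drift visible
    data = rng.normal(0.0, 1.0, size=n)       # the original "human" data

    for generation in range(51):
        mu, sigma = data.mean(), data.std()   # "train" the model
        if generation % 10 == 0:
            print(f"gen {generation:2d}: mu={mu:+.3f} sigma={sigma:.3f}")
        data = rng.normal(mu, sigma, size=n)  # next dataset is 100% model output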


sosodev | 1 month ago

Hallucinations generally don't matter at scale. Unless you're feeding 100% synthetic data back into your training loop, it's just noise like everything else.

Is the average human 100% correct with everything they write on the internet? Of course not. The absurd value of LLMs is that they can somehow manage to extract the signal from that noise.
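
The mixing-ratio point can be illustrated with the same toy Gaussian loop as above (again my own construction, not a real pipeline): blending even a small fraction of fresh real data into each generation anchors the fit, while a 100% synthetic loop drifts and collapses.

    import numpy as np

    # Refit on a blend of fresh "human" samples and the model's own
    # samples; the real_fraction parameter is a hypothetical knob for
    # this sketch, not anything from an actual training setup.
    rng = np.random.default_rng(1)

    def final_sigma(real_fraction, generations=100, n=50):
        data = rng.normal(0.0, 1.0, size=n)
        for _ in range(generations):
            mu, sigma = data.mean(), data.std()        # refit the model
            k = int(real_fraction * n)                 # real samples per generation
            real = rng.normal(0.0, 1.0, size=k)
            synthetic = rng.normal(mu, sigma, size=n - k)
            data = np.concatenate([real, synthetic])
        return data.std()

    for frac in (0.0, 0.1, 0.5):
        print(f"{frac:>4.0%} real data -> final sigma {final_sigma(frac):.3f}")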

imiric | 1 month ago

> The absurd value of LLMs is that they can somehow manage to extract the signal from that noise.

Say what? LLMs absolutely cannot do that.

They rely on armies of humans to tirelessly filter, clean, and label data that is used for training. The entire "AI" industry relies on companies and outsourced sweatshops to do this work. It is humans that extract the signal from the noise. The machine simply outputs the most probable chain of tokens.

So hallucinations definitely matter, especially at scale. They make the job of those humans much, much harder, which in turn will inevitably produce lower-quality models. Garbage in, garbage out.

phyzome | 1 month ago

It's only "noise" if it's uncorrelated, though, and I see no reason to believe LLM errors are uncorrelated.
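
A quick numeric check of why the correlation matters (a toy error model of my own, not anything from the thread): treat each document's error as a shared bias plus independent noise. Averaging over many documents washes out the independent part but leaves the shared bias untouched, so correlated mistakes survive at scale.

    import numpy as np

    # Each "document" carries a shared, correlated error plus its own
    # independent noise. The numbers are arbitrary illustrations.
    rng = np.random.default_rng(0)
    n_docs = 100_000
    shared_bias = 0.5                        # same mistake repeated everywhere
    independent = rng.normal(0, 1, n_docs)   # uncorrelated per-document noise

    print(f"uncorrelated errors average to {independent.mean():+.4f}")
    print(f"correlated errors average to   {(shared_bias + independent).mean():+.4f}")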

intended | 1 month ago

LLM content generation is divorced from human limitations and human scale.

Appealing to human foibles when discussing LLM-scale issues is comparing apples and oranges.