top | item 46146294


undefeated | 2 months ago

I think part of the problem is that you need a model to classify the data, and that classifier must itself be trained, either on data that wasn't classified or on a dramatically smaller set of human-classified data, so it's effectively impossible to escape this sort of input bias.
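To make the bootstrapping problem concrete, here is a minimal toy sketch (all names and example data invented for illustration): a tiny naive-Bayes-style classifier is trained on a handful of human-labeled documents and then used to filter a larger unlabeled corpus, so any bias in the small labeled set propagates into the filtered data.

```python
import math
from collections import Counter

# Hypothetical toy example: a dramatically smaller human-labeled set
# is used to bootstrap a filter for a larger unlabeled corpus.
labeled = [
    ("fresh results from our field measurements", "keep"),
    ("click here to win a free prize now", "drop"),
    ("analysis of the survey data and methods", "keep"),
    ("win money fast click this free offer", "drop"),
]

def train(examples):
    """Count word frequencies per label (a tiny naive Bayes model)."""
    counts = {"keep": Counter(), "drop": Counter()}
    totals = {"keep": 0, "drop": 0}
    for text, label in examples:
        for word in text.split():
            counts[label][word] += 1
            totals[label] += 1
    return counts, totals

def score(text, counts, totals, label):
    """Log-likelihood of `text` under `label`, with add-one smoothing."""
    vocab = len(set(counts["keep"]) | set(counts["drop"]))
    return sum(
        math.log((counts[label][word] + 1) / (totals[label] + vocab))
        for word in text.split()
    )

def classify(text, counts, totals):
    return max(("keep", "drop"),
               key=lambda lbl: score(text, counts, totals, lbl))

counts, totals = train(labeled)

# The much larger unlabeled corpus is filtered by the bootstrapped model,
# inheriting whatever bias the tiny labeled set carried.
corpus = [
    "new measurements from the field survey",
    "free prize click now to win",
]
kept = [doc for doc in corpus if classify(doc, counts, totals) == "keep"]
```

The point of the sketch is the circularity: the quality of `kept` depends entirely on the four labeled examples, so whatever those humans considered "keep"-worthy is baked into everything downstream.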

Tangentially, I'd be far from the first to point out that these LLMs are now polluting their own training data, which makes filtering simultaneously all the more important and all the more impossible.
