RandomBK | 8 months ago

My 2c is that it is worthwhile to train on AI-generated content that has obtained some level of human approval or interest, as a form of extended RLHF loop.

cryptonector|8 months ago

Ok, but how do you denote that approval? What if you partially approve of that content? ("Overall this is correct, but this little nugget is hallucinated.")

bongodongobob|8 months ago

It apparently doesn't matter unless you somehow consider the entire Internet to be correct. They didn't only feed LLMs correct info. It all just got shoveled in and here we are.