item 35807341 | RLHF: Reinforcement Learning from Human Feedback (huyenchip.com)
4 points | madisonmay | 2 years ago | 1 comment

heliophobicdude | 2 years ago
This is a very well written article. Not in the article, but can we still call models like Alpaca RLHF? What do we call models finetuned on demonstrations created by other chatbots?