item 41372149 (no title)

cjbillington | 1 year ago
What do they do instead? Given we're not talking to a base model.

tqi | 1 year ago
Supposedly they use "RLAIF", but honestly, given that the first step is to "generate responses... using a helpful-only AI assistant", it kinda sounds like RLHF with more steps. https://www.anthropic.com/research/constitutional-ai-harmles...