aidan_mclau | 1 year ago
>The cool thing about using modern LLMs as an eval/policy model is that their RLHF propagates throughout the search.
>Moreover, if search techniques work on the token level (likely), their thoughts are perfectly interpretable.
I suspect a search world is substantially more alignment-friendly than a large model world. Let me know your thoughts!
Tepix|1 year ago
Mobile Safari, phone set to French.
abid786|1 year ago