Hey! Essay author here.

>The cool thing about using modern LLMs as an eval/policy model is that their RLHF propagates throughout the search.

>Moreover, if search techniques work on the token level (likely), their thoughts are perfectly interpretable.

I suspect a search world is substantially more alignment-friendly than a large model world. Let me know your thoughts!

Tepix|1 year ago

Your webpage is broken for me. The page appears briefly, then a French error message says that an error occurred and I can retry.

Mobile Safari, phone set to French.

abid786|1 year ago

I'm in the same situation (mobile Safari, French phone), but it works if you use Chrome.