popinman322 | 1 year ago

Also similar to Orca-Math, but without a teacher model. They also followed an iterative DPO/KTO scheme, but without a length-normalized NLL loss term.
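
For reference, a minimal sketch of the kind of per-pair objective being compared here, assuming the standard DPO preference term plus an optional length-normalized NLL term on the chosen response. The function name `dpo_pair_loss`, the `alpha` weight, and the default `beta` are illustrative assumptions, not details from the work being discussed; `alpha=0.0` corresponds to the variant with no NLL term. KTO is not covered.

```python
import math

def dpo_pair_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, len_w,
                  beta=0.1, alpha=0.0):
    """Loss for one (chosen, rejected) pair.

    logp_* are summed token log-probs under the policy, ref_logp_* under the
    frozen reference model, and len_w is the chosen response's token count.
    All names and defaults are hypothetical, for illustration only.
    """
    # DPO margin: how much more the policy prefers the chosen response
    # than the reference model does, scaled by beta.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    preference = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    # Length-normalized negative log-likelihood of the chosen response.
    nll = -logp_w / len_w
    return preference + alpha * nll

# alpha=0.0 drops the NLL term; alpha>0 adds it back.
print(dpo_pair_loss(-10.0, -14.0, -11.0, -12.0, len_w=5))
print(dpo_pair_loss(-10.0, -14.0, -11.0, -12.0, len_w=5, alpha=1.0))
```

With `alpha=0.0` only the preference term remains, so nothing directly pushes up the likelihood of the chosen response on its own.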

algo_trader|1 year ago

Assuming we had a magical (fast) oracle for grading responses, has anyone tried search or expert iteration for LLMs?

Specifically for codegen, I am playing with an iterative interpreter that can quickly (re)evaluate a tree of similar responses.
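
A toy sketch of what re-evaluating a tree of similar responses could look like, assuming each response is a tuple of Python statements and that responses sharing a prefix can share the interpreter state reached after that prefix. `run_tree`, the snapshot cache, and the `oracle` callback are all hypothetical names for illustration, not the interpreter described above. Snapshots are shallow copies, so the sketch also assumes variable values are immutable.

```python
def run_tree(programs, oracle):
    """Run each program, reusing cached state for shared statement prefixes.

    programs: list of tuples of statement strings.
    oracle:   callable taking the final variable bindings, returning a bool.
    Returns (passing programs, number of statements actually executed).
    NOTE: exec() on untrusted model output is unsafe; a real setup would
    need a sandbox. This is a sketch only.
    """
    cache = {(): {}}  # prefix -> variable bindings after running that prefix
    executed = 0
    passing = []
    for prog in programs:
        # Find the longest prefix of this program that was already run.
        k = len(prog)
        while prog[:k] not in cache:
            k -= 1
        env = dict(cache[prog[:k]])  # shallow snapshot copy
        # Execute only the statements past the shared prefix.
        for i in range(k, len(prog)):
            exec(prog[i], {}, env)
            executed += 1
            cache[prog[:i + 1]] = dict(env)
        if oracle(env):
            passing.append(prog)
    return passing, executed

candidates = [
    ("x = 2", "y = x * 3"),
    ("x = 2", "y = x + 10"),
    ("x = 2", "y = x * 3", "y += 1"),
]
passing, executed = run_tree(candidates, oracle=lambda env: env.get("y") == 7)
print(passing, executed)
```

Here the three candidates contain seven statements in total, but only four are executed because the shared prefixes are served from the cache. In an expert-iteration loop, the passing candidates would be the ones kept as training data for the next round.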