item 40219610 (no title)

popinman322 | 1 year ago
Also, similar to Orca-Math but without a teacher model. They also followed an iterative DPO/KTO scheme, but with no length-normalized NLL loss term.

algo_trader | 1 year ago
If we had a magical (fast) oracle for grading responses, have people done search/expert iteration for LLMs? Specifically for codegen, I am playing with an iterative interpreter that can quickly (re)evaluate a tree of similar responses.
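For concreteness, here is a minimal sketch of the search half of that idea: a best-first search over a tree of candidate responses, where a fast oracle grades each distinct candidate once. Everything named here (`oracle_search`, `toy_expand`, `toy_oracle`, the `budget` parameter) is hypothetical and chosen for illustration; in a real codegen setup `expand` would sample variations from the model and `oracle` would run the tests or the interpreter.

```python
import heapq
from typing import Callable, Iterable


def oracle_search(
    root: str,
    expand: Callable[[str], Iterable[str]],
    oracle: Callable[[str], float],
    budget: int = 50,
) -> tuple[str, float]:
    """Best-first search over candidate responses, scored by an oracle."""
    cache: dict[str, float] = {}  # each distinct candidate is graded once

    def score(candidate: str) -> float:
        if candidate not in cache:
            cache[candidate] = oracle(candidate)
        return cache[candidate]

    best, best_score = root, score(root)
    # heapq is a min-heap, so push negated scores to pop the best node first.
    frontier = [(-best_score, root)]
    while frontier and len(cache) < budget:
        _, node = heapq.heappop(frontier)
        for child in expand(node):
            if child in cache:
                continue  # already graded via another branch
            s = score(child)
            if s > best_score:
                best, best_score = child, s
            heapq.heappush(frontier, (-s, child))
    return best, best_score


# Toy stand-ins (assumptions, not a real model or grader).
TARGET = "abba"


def toy_expand(candidate: str) -> list[str]:
    # Pretend the model proposes two continuations per candidate.
    if len(candidate) >= len(TARGET):
        return []
    return [candidate + "a", candidate + "b"]


def toy_oracle(candidate: str) -> int:
    # Pretend grader: number of positions that match the target.
    return sum(1 for got, want in zip(candidate, TARGET) if got == want)


print(oracle_search("", toy_expand, toy_oracle))
```

The expert-iteration half is not shown: the highest-scoring responses found by the search would be fed back as training targets, and the loop repeated with the updated model.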