Generalized on-policy distillation with reward extrapolation

3 points | fzliu | 17 days ago | arxiv.org

No comments yet.