olliepro | 2 months ago
An interesting problem, since the creators of OLMo have mentioned that, throughout training, they use 1/3 of their compute just doing evaluations.
Edit:
One nice thing about the “critic” approach is that the restaurant (or model provider) doesn’t have access to the benchmark to quasi-directly optimize against.