top | item 35694776

(no title)

amrb | 2 years ago

I'd like to see a yearly benchmark for models. It could be logic puzzles or a suite of tasks, but as it stands there is no good way to measure the ability of models.
