selim-now | 5 months ago

That would definitely make the evaluation more robust. My fear is that, with LLMs at hand, people have become allergic to preparing good human-labelled evaluation sets and will always, to some degree, use an LLM as a crutch.
