selim-now | 5 months ago
That would definitely make the evaluation more robust. My fear is that, with LLMs at hand, people have become allergic to preparing good human-labelled evaluation sets and will always, to some degree, use an LLM as a crutch.