top | item 45911542

jjcob | 3 months ago

Might be a result of using LLMs to evaluate the output of other LLMs.

LLMs probably get higher scores if they explicitly state that they are following instructions...

resfirestar | 3 months ago

It's like writing an essay for a standardized test, as opposed to one for a college course or for a general audience. When taking a test, you only care about the evaluation of a single grader hurrying to get through a pile of essays, so you should usually attempt to structure your essay to match the format of the scoring rubric. Doing this on an essay for a general audience would make it boring, and doing it in your college course might annoy your professor. Hopefully instruction-following evaluations don't look too much like test grading, but this kind of behavior would make some sense if they do.