1 point | remilouf | 1 year ago

LLM evaluation results are very sensitive to the details of the prompt's structure. This post shows how structured generation reduces both the variance of the results and the shifts in ranking.