npip99 | 7 months ago

Yes, our pairwise method is based entirely on 2AFC (two-alternative forced choice) comparisons, for both intra-query and inter-query Elo calculations.

It's definitely the best, if not the only, way to get extremely high signal and a score assignment that actually converges the more you sample.

In terms of the "F" in 2AFC, we actually have this amusing snippet from our prompt:

> Do NOT output a score of 0.0, ensure to focus on which document is superior, and provide a negative or positive float between -1.0 and 1.0.
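To make the forced-choice idea concrete, here is a minimal sketch of how a signed, non-zero judge score could feed a standard Elo update. This is not our actual pipeline: the names `forced_choice` and `elo_update`, the sign convention (positive means the first document is superior), the linear mapping from [-1, 1] to an outcome in [0, 1], and the K-factor of 32 are all assumptions for illustration.

```python
def forced_choice(score: float, eps: float = 1e-9) -> float:
    """Validate a judge score: it must lie in [-1, 1] and must not be a tie.

    Hypothetical helper; rejecting 0.0 mirrors the "F" (forced) in 2AFC.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside [-1.0, 1.0]")
    if abs(score) < eps:
        raise ValueError("score of 0.0 is not allowed: the judge must pick a side")
    return score


def elo_update(r_a: float, r_b: float, score: float, k: float = 32.0):
    """Apply one textbook Elo update from a single pairwise comparison.

    Assumption: score > 0 means document A was judged superior.
    """
    score = forced_choice(score)
    outcome_a = (score + 1.0) / 2.0  # map [-1, 1] onto [0, 1]
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    delta = k * (outcome_a - expected_a)
    # Zero-sum: whatever A gains, B loses.
    return r_a + delta, r_b - delta


if __name__ == "__main__":
    # Two equally rated documents, judge strongly prefers A.
    print(elo_update(1000.0, 1000.0, 1.0))
```

A judge that returns exactly 0.0 raises here instead of silently producing a draw, so the caller can re-prompt or discard that comparison.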

reactordev|7 months ago

Nice. I use an epoch to prevent stalemates, but this might be better.