npip99 | 7 months ago

Hey! We actually did a lot of research into Elo consistency, i.e. checking whether or not the NxN pairwise matrix follows the Elo model. It was a long road that probably deserves an entirely separate blog post, but the TLDR is that we observe the following:

For each document, there is a hidden score "s", which is the "fundamental relevance according to the LLM". Then, when we sample (q, d1, d2) from the LLM, its output has the following statistical properties:

- The "fundamental hidden preference" is `pref = s_{d1} - s_{d2}`, usually ranging between -4 and 4.

- The LLM will sample a normal distribution around the `pref` with stddev ~0.2, which is some "inner noise" that the LLM experiences before coming to a judgement.

- The preference will pass through the sigmoid to get a sampled_score \in [0, 1].

- There is an additional 2% noise, i.e. `0.98 * sampled_score + 0.02 * random.random()`.
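
The four steps above can be sketched as a single sampling function. This is a minimal sketch, not the actual implementation: the names `sample_judgement`, `inner_std`, and `outer_noise` are made up for illustration, and the defaults are just the rough figures quoted in the list (stddev ~0.2, 2% noise).

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued preference to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_judgement(s_d1: float, s_d2: float,
                     inner_std: float = 0.2,
                     outer_noise: float = 0.02,
                     rng=random) -> float:
    """Sample one pairwise score for (d1, d2) under the described model.

    inner_std and outer_noise are assumed parameter names; their defaults
    are the approximate values quoted in the comment.
    """
    # Step 1: the hidden preference is the difference of hidden scores.
    pref = s_d1 - s_d2
    # Step 2: Gaussian "inner noise" around the preference.
    noisy_pref = pref + rng.gauss(0.0, inner_std)
    # Step 3: squash through the sigmoid to get a score in (0, 1).
    sampled_score = sigmoid(noisy_pref)
    # Step 4: mix in a small amount of uniform noise.
    return (1.0 - outer_noise) * sampled_score + outer_noise * rng.random()
```

Passing a seeded `random.Random` as `rng` makes the samples reproducible, which is handy when comparing simulated matrices against observed ones.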

When we use Maximum Likelihood Estimation to find the most likely "hidden scores" \hat{s} for each document, and then sample pairwise matrices according to `0.98 * sigmoid( \hat{s}_1 - \hat{s}_2 + N(0, 0.02) ) + Uniform(0.02)`, we get a pairwise matrix with virtually identical statistical properties to the observed pairwise matrices.
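
For readers who want a feel for the fitting step, here is a rough sketch of one way to estimate hidden scores from a pairwise matrix. It is a simplification and an assumption on my part: it ignores both noise terms and just maximizes a Bradley-Terry style soft-label log-likelihood with plain gradient ascent. The function name `fit_hidden_scores` and the `steps` / `lr` parameters are hypothetical, and the actual estimator may well differ.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_hidden_scores(pairwise, steps: int = 2000, lr: float = 0.1):
    """Estimate one hidden score per document from an NxN matrix.

    pairwise[i][j] in [0, 1] is the observed preference for doc i over
    doc j. Assumption: noise terms are ignored, so the model is simply
    pairwise[i][j] ~ sigmoid(s[i] - s[j]).
    """
    n = len(pairwise)
    s = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                # d(log-likelihood)/d(s_i - s_j) = observed - predicted
                err = pairwise[i][j] - sigmoid(s[i] - s[j])
                grad[i] += err
                grad[j] -= err
        # Gradient ascent step, scaled by n to keep the step size stable.
        s = [si + lr * g / n for si, g in zip(s, grad)]
        # Scores are only identified up to a shift, so pin the mean to 0.
        mean = sum(s) / n
        s = [si - mean for si in s]
    return s
```

Fitted scores can then be fed back into a sampler to produce synthetic matrices and compare their statistics with the observed ones.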

slybot|7 months ago

Now I'm more confused:

1) 0.02 * random.random() != N(0, 0.02)

2) "The LLM will sample a normal distribution": this only depends on your c parameter; the absolute scale matters in neither Bradley-Terry nor Elo. So saying +-4 and claiming the LLM reasons in a standard normal is ridiculous.

3) "we get a pairwise matrix with virtually identical statistical properties to the observed pairwise matrices": did you ask yourselves, if the sampled pairwise matrix is "statistically identical" to the observed pairwise matrix, why even bother? You can simply use the observed pairwise matrix...
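
To make point 2 concrete, here is a small sketch of the scale argument. It assumes the "c parameter" means a scale constant dividing the score difference; that reading, along with the names `win_prob` and `c`, is an assumption and not something stated in the thread. Under that assumption, shifting all scores changes nothing, and rescaling the scores together with `c` changes nothing either. The usual Elo formula (base 10, divisor 400) is the same logistic curve with `c = 400 / ln(10)`.

```python
import math


def win_prob(s_i: float, s_j: float, c: float = 1.0) -> float:
    """Bradley-Terry style probability that i beats j, with scale c.

    c is a hypothetical scale constant used only for illustration.
    """
    return 1.0 / (1.0 + math.exp(-(s_i - s_j) / c))


def elo_win_prob(r_i: float, r_j: float) -> float:
    """Standard Elo expected score with the base-10 / 400 convention."""
    return 1.0 / (1.0 + 10 ** (-(r_i - r_j) / 400))


# Shift invariance: adding the same constant to both scores is a no-op.
same_after_shift = math.isclose(win_prob(2.0, 1.0), win_prob(102.0, 101.0))

# Scale is tied to c: multiply the scores and c by 10 and nothing changes.
same_after_rescale = math.isclose(win_prob(2.0, 1.0, c=1.0),
                                  win_prob(20.0, 10.0, c=10.0))

# Elo is the same curve with c = 400 / ln(10).
elo_matches = math.isclose(elo_win_prob(1600, 1400),
                           win_prob(1600, 1400, c=400 / math.log(10)))
```

So a range like -4 to 4 only means something relative to whatever scale constant is fixed in the sigmoid.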