fnl | 1 year ago
And in general, binary floating-point arithmetic is not associative - i.e. `(a + b) + c` might not equal `a + (b + c)`. That in turn can lead to the model picking a different token in rare cases (with auto-regressive consequences: the entire remainder of the generated sequence may then differ): https://www.ingonyama.com/blog/solving-reproducibility-chall...
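A minimal sketch of the point above (the values and "logits" here are made up for illustration): summing the same three floats in a different order yields results that differ by one ulp, and if two candidate tokens have near-identical scores, that difference alone can flip which one argmax selects.

```python
a, b, c = 0.1, 0.2, 0.3

# Floating-point addition is not associative: grouping changes the result.
left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6
print(left == right)  # False

# Hypothetical logits for two candidate tokens. The second token's score
# happens to equal one of the two possible sums above.
other = 0.6000000000000001

# Run 1 computed the first token's score as (a+b)+c: a tie, and Python's
# max() keeps the first maximal element, so token 0 is picked.
run1 = [left, other]
# Run 2 computed it as a+(b+c): now token 1 strictly wins.
run2 = [right, other]

pick1 = max(range(2), key=lambda i: run1[i])
pick2 = max(range(2), key=lambda i: run2[i])
print(pick1, pick2)  # different tokens picked from the "same" computation
```

From there the auto-regressive loop takes over: once one token differs, every subsequent step conditions on a different prefix.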
Edit: Of course, my answer assumes you are asking about the case where the model lets you set its token-generation temperature (stochasticity) to exactly zero. With default parameter settings, every LLM I know of picks randomly among the top tokens.
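To make the temperature distinction concrete, here is a generic sketch (not any particular model's API) of how temperature-zero greedy decoding differs from sampling:

```python
import math
import random

def sample_token(logits, temperature):
    """Pick a token index from raw logits at the given temperature."""
    if temperature == 0:
        # Greedy decoding: deterministically take the highest-scoring token
        # (modulo the floating-point effects discussed above).
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise sample from the temperature-scaled softmax distribution;
    # higher temperature flattens it, making low-scoring tokens likelier.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]
```

At temperature zero the call is reproducible; at any positive temperature two runs can legitimately diverge even with identical arithmetic.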