By virtue of increasing randomness, we got the correct answer once ... a monkey at a typewriter will also spit out the correct answer occasionally. Temperature 0 is the correct evaluation.
Just in real life usage, it is extremely uncommon to stochastically query the model and use the most common answer. Using it with temperature 0 is the "best" answer as it uses the most likely tokens in each completion.
In theory maybe, but I don't think it is in practice. It feels like each model has its own quasi-optimal temperature and other settings at which it performs vastly better. Sort of like a particle filter that must do random sampling to find the optimal solution.
scrollop|2 years ago
throwaway63820|2 years ago
Just in real life usage, it is extremely uncommon to stochastically query the model and use the most common answer. Using it with temperature 0 is the "best" answer as it uses the most likely tokens in each completion.
moffkalast|2 years ago
In theory maybe, but I don't think it is in practice. It feels like each model has its own quasi-optimal temperature and other settings at which it performs vastly better. Sort of like a particle filter that must do random sampling to find the optimal solution.
scrollop|2 years ago
https://www.youtube.com/watch?v=ReO2CWBpUYk