This isn't really true, unfortunately -- mixture-of-experts routing seems to suffer from batch non-determinism. No one has stated publicly exactly why this is, but you can easily replicate the behavior yourself, or find bug reports and discussion with a bit of searching. The observed behavior of the major closed-weight LLM APIs is that a temperature of zero no longer corresponds to deterministic greedy sampling.
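One commonly cited hypothesis for this (an illustration, not a confirmed explanation of any particular API) is floating-point non-associativity: changing the batch size can change the reduction order inside kernels, perturbing the logits just enough to flip the argmax between two near-tied tokens. A minimal sketch:

```python
import numpy as np

# Floating-point addition is not associative:
a, b, c = np.float32(1e8), np.float32(1.0), np.float32(-1e8)
print((a + b) + c)  # 0.0  (1e8 + 1 rounds back to 1e8 in float32)
print((a + c) + b)  # 1.0

# If two token logits are nearly tied, a perturbation of that size is
# enough to change which token argmax (i.e., greedy / temperature-0
# decoding) picks -- same inputs, different summation order:
logits_run1 = np.array([5.0, 5.0 + (a + b) + c], dtype=np.float32)
logits_run2 = np.array([5.0, 5.0 + (a + c) + b], dtype=np.float32)
print(int(np.argmax(logits_run1)), int(np.argmax(logits_run2)))  # 0 1
```

The numbers here are contrived for clarity; in practice the perturbations are tiny, but ties among top logits are common enough that outputs diverge over long generations.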
wodenokoto|1 year ago
Temperature changes the softmax equation [1], not whether you are sampling from the softmax result or choosing the highest probability. IBM's documentation corroborates this, saying you need to set do_sample to True for the temperature to have any effect; i.e., T changes how we sample, not whether we sample [2].
A similar discussion on the OpenAI forum also claims that the RNG might be in a different state from run to run, although I am less sure about that [3].
[1] https://pelinbalci.com/2023/10/16/Temperature_parameter.html
[2] https://www.ibm.com/think/topics/llm-temperature#:~:text=The...
[3] https://community.openai.com/t/clarifications-on-setting-tem...
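The distinction drawn above -- temperature rescales the softmax, while do_sample decides whether you sample at all -- can be sketched like this (a minimal illustration of the semantics described in [1] and [2], not any library's actual implementation):

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Temperature divides the logits before the softmax, as in [1]."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()               # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def next_token(logits, temperature=1.0, do_sample=False, rng=None):
    """Mimics the do_sample semantics the IBM docs describe:
    with do_sample=False we always take the argmax, so temperature
    is irrelevant; with do_sample=True, temperature reshapes the
    distribution we draw from."""
    if not do_sample:
        return int(np.argmax(logits))      # greedy: T has no effect
    rng = rng or np.random.default_rng()
    p = softmax_with_temperature(logits, temperature)
    return int(rng.choice(len(p), p=p))

logits = [2.0, 1.0, 0.5]
# Greedy decoding ignores temperature entirely:
print(next_token(logits, temperature=5.0, do_sample=False))    # 0
print(next_token(logits, temperature=0.001, do_sample=False))  # 0
```

With do_sample=True, lowering the temperature concentrates probability mass on the top logit (approaching greedy as T → 0), while raising it flattens the distribution.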