top | item 44432608

phyalow | 8 months ago

What? You can get consistent output on local models.

I can train large nets deterministically too (cuBLAS flags). What you're saying isn't true in practice. Hell, I can also go on the Anthropic API right now and get verbatim static results.
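For reference, this is a minimal sketch of the kind of flags being referred to, assuming a PyTorch/CUDA training setup (the script name is hypothetical):

```shell
# Ask cuBLAS to use a fixed workspace so its reductions are deterministic
# (documented in NVIDIA's cuBLAS docs and PyTorch's reproducibility notes)
export CUBLAS_WORKSPACE_CONFIG=":4096:8"

# Inside the script, one would also typically set:
#   torch.manual_seed(42)
#   torch.use_deterministic_algorithms(True)
python train.py  # hypothetical script name
```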

simonw|8 months ago

"Hell I can also go on the anthropic API right now and get verbatim static results."

How?

Setting temperature to 0 won't guarantee the exact same output for the exact same input, because - as the previous commenter said - floating point arithmetic is non-commutative, which becomes important when you are running parallel operations on GPUs.

sva_|8 months ago

Shouldn't it be that they're non-associative? The reduction kernels combine partial results (like the dot products in a GEMM, or the sum across attention heads) in an order that may change from run to run, and because floating-point addition is non-associative, that can cause the individual floats to be rounded off differently.
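A one-line illustration of that non-associativity, in plain Python:

```python
# Floating-point addition is commutative but NOT associative:
# regrouping the same three operands changes how rounding falls out.
a, b, c = 0.1, 0.2, 0.3

print((a + b) + c)  # 0.6000000000000001
print(a + (b + c))  # 0.6
```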

oxidi|8 months ago

I think lots of people don't realize that the "non-deterministic" nature of LLMs comes from sampling the token distribution, not from the model itself.
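A toy sketch of that distinction (the next-token distribution here is made up, not from any real model):

```python
import random

# Hypothetical next-token probabilities
probs = {"cat": 0.6, "dog": 0.3, "fish": 0.1}

def greedy(dist):
    # Temperature 0 / argmax: same distribution in, same token out, every time
    return max(dist, key=dist.get)

def sample(dist, rng):
    # Temperature > 0: draw from the distribution, so repeated runs can differ
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]

rng = random.Random()  # unseeded, so sampling is nondeterministic
print(greedy(probs))       # always "cat"
print(sample(probs, rng))  # "cat", "dog", or "fish"
```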

simonw|8 months ago

It's also the way the model runs. Setting temperature to zero and picking a fixed seed would ideally give deterministic output from the sampler, but in parallel execution of matrix arithmetic (e.g. on a GPU) the order of floating-point operations starts to matter, so timing differences can produce different results.
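A contrived but concrete example of that order-dependence in a reduction, with plain Python standing in for the partial sums a GPU kernel might combine in either order:

```python
# The same three addends, reduced in two different orders.
xs = [1e100, 1.0, -1e100]

left_to_right = sum(xs)                    # (1e100 + 1.0) rounds away the 1.0
cancel_first  = sum([1e100, -1e100, 1.0])  # the huge terms cancel before the 1.0 is added

print(left_to_right)  # 0.0
print(cancel_first)   # 1.0
```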