kazga | 8 months ago

That's not true in practice. Floating point arithmetic is not commutative due to rounding errors, and parallel operations introduce non-determinism even at temperature 0.

SetTheorist | 8 months ago

Nitpick: I think you mean that FP arithmetic is not _associative_ rather than non-commutative.

Commutative: A+B = B+A
Associative: A+(B+C) = (A+B)+C
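
A minimal Python illustration of the difference (the values are just examples; any floats with inexact binary representations will do):

    # Commutativity holds in IEEE 754, but associativity does not:
    a, b, c = 0.1, 0.2, 0.3
    print(a + b == b + a)              # True:  swapping operands is safe
    print((a + b) + c == a + (b + c))  # False: regrouping changes the rounding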

zorked | 8 months ago

That's basically a bug though, not an important characteristic of the system. Engineering tradeoff, not math.

e12e | 8 months ago

It's pretty important when discussing concrete implementations though, just like when using floats as coordinates in a space/astronomy simulator and getting decreasing accuracy as your objects move away from your chosen origin.
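
For a sense of scale, numpy's spacing function reports the gap to the next representable value (a quick float32 sketch, reading the coordinate as metres for illustration):

    import numpy as np

    # Absolute resolution of float32 degrades with distance from the origin.
    print(np.spacing(np.float32(1.0)))  # ~1.19e-07: sub-micron steps at 1 m
    print(np.spacing(np.float32(1e7)))  # 1.0: roughly 1 m steps at 10,000 km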

phyalow | 8 months ago

What? You can get consistent output on local models.

I can train large nets deterministically too (cuBLAS flags). What you're saying isn't true in practice. Hell, I can also go on the Anthropic API right now and get verbatim static results.
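
For context, this is roughly what such a setup looks like in PyTorch (a sketch, assuming CUDA >= 10.2, where cuBLAS needs a fixed workspace size to make GEMMs reproducible):

    import os
    # Must be set before CUDA initializes to get reproducible cuBLAS GEMMs.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

    import torch

    torch.manual_seed(0)
    torch.use_deterministic_algorithms(True)  # raise on ops lacking a deterministic kernel
    torch.backends.cudnn.benchmark = False    # don't autotune kernel selection per run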

simonw | 8 months ago

"Hell I can also go on the anthropic API right now and get verbatim static results."

How?

Setting temperature to 0 won't guarantee the exact same output for the exact same input, because - as discussed upthread - floating point arithmetic is non-associative, which becomes important when you are running parallel operations on GPUs.
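
A toy demonstration of the order-dependence (numpy on CPU; shuffling the array stands in for the nondeterministic accumulation order of a parallel GPU reduction):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(1_000_000).astype(np.float32)

    # Same values, different accumulation order -> different rounding.
    s1 = np.sum(x)
    s2 = np.sum(rng.permutation(x))
    print(s1 == s2)  # typically False
    print(s1 - s2)   # small but nonzero discrepancy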

oxidi | 8 months ago

I think lots of people misunderstand where the "non-deterministic" nature of LLMs comes from: it comes from sampling the token distribution, not from the model itself.
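
A rough sketch of the decoding step in question (the sample_next_token helper is hypothetical; real decoders add top-k/top-p and other machinery):

    import numpy as np

    rng = np.random.default_rng()

    def sample_next_token(logits, temperature):
        # Hypothetical helper: choose a token id from a logits vector.
        if temperature == 0:
            return int(np.argmax(logits))  # greedy: deterministic given identical logits
        z = (logits - logits.max()) / temperature  # shift for numerical stability
        probs = np.exp(z)
        probs /= probs.sum()
        return int(rng.choice(len(logits), p=probs))  # stochastic: varies run to run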