Determinism isn't always ideal. Determinism may trade off with things like accuracy, performance, etc. There are situations where the tradeoff is well worth it.
Yep, there are plenty of things that aren't computable exactly without burning all the entropy in the visible universe, yet if you replace the exact computation with a heuristic you can get a good-enough answer in polynomial time.
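A quick sketch of the kind of trade being described (my own toy example, not from the thread): exact TSP by brute force is factorial-time, while a greedy nearest-neighbour heuristic runs in O(n^2) and usually lands close enough.

```python
# Toy illustration: exact vs heuristic TSP on a handful of points.
import math
from itertools import permutations

def tour_length(points, order):
    # Total length of the closed tour visiting points in the given order.
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def exact_tsp(points):
    # O(n!) brute force -- only feasible for tiny inputs.
    return min(permutations(range(len(points))),
               key=lambda o: tour_length(points, o))

def greedy_tsp(points):
    # O(n^2) nearest-neighbour heuristic: always hop to the closest
    # unvisited point. No optimality guarantee, but fast.
    unvisited = set(range(1, len(points)))
    order = [0]
    while unvisited:
        nxt = min(unvisited, key=lambda j: math.dist(points[order[-1]], points[j]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order
```

On small instances you can compare the two directly and see the heuristic tour is a bit longer than the optimum while doing vastly less work.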
Also, at temperature 0, LLMs can behave deterministically! Indeterminism isn't necessarily quite the right word for the kind of abstraction LLMs provide.
Even at temperature != 0 it's trivial to just use a fixed seed for the RNG... it's just a computer being used in a naive, not even multi-threaded (i.e. no race conditions), way.
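To make that concrete, here's a sketch (mine, with made-up logits) of temperature sampling from a toy "next token" distribution: with the RNG seeded, two runs produce identical output even at temperature != 0.

```python
# Seeded temperature sampling: softmax over logits, then one RNG draw.
import math
import random

def sample_token(logits, temperature, rng):
    # Softmax with temperature (max-subtracted for numerical stability),
    # then inverse-CDF sampling using the supplied RNG.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r <= acc:
            return i
    return len(exps) - 1

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical next-token scores
rng_a = random.Random(42)
rng_b = random.Random(42)
run_a = [sample_token(logits, 0.8, rng_a) for _ in range(10)]
run_b = [sample_token(logits, 0.8, rng_b) for _ in range(10)]
assert run_a == run_b  # same seed -> same "generation", every time
```

The sampling is "random" only relative to the seed; fix the seed and the whole pipeline is a pure function of its inputs.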
I wouldn't be surprised to find out different stacks multiply fp16s slightly differently or something. Getting determinism across machines might take some work... but there's really nothing magic going on here.
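The underlying reason cross-machine determinism takes work is that floating-point addition isn't associative, so two stacks that merely sum in a different order can disagree. A minimal demonstration (in float64, but the same applies to fp16):

```python
# Float addition is not associative: grouping changes the result.
xs = [1.0, 1e16, -1e16]

left_to_right = (xs[0] + xs[1]) + xs[2]  # 1.0 is absorbed by 1e16 first
right_to_left = xs[0] + (xs[1] + xs[2])  # the big terms cancel first

assert left_to_right != right_to_left  # 0.0 vs 1.0
```

Different GPUs, BLAS libraries, or reduction strategies effectively pick different groupings, hence slightly different results for "the same" matrix multiply.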
Quite pleased you mentioned this. I would add that transformer LLMs can be Turing complete; see the work of Franz Nowak and his colleagues. (I think there were at least one or two other papers by other teams, but I read Nowak's the closest, as it was the latest one when I became aware of this.)
Nobody was stopping anyone from making compilers that introduced random different behavior every time you ran them. I think it's telling that this didn't catch on.
There were definitely compilers that used things like data structures with an unstable iteration order, resulting in non-determinism, and people did go about stopping other people from doing that. This behavior would result in non-deterministic performance everywhere, and, combined with race conditions or just undefined behavior, other random non-deterministic behaviors too.
At least in part this was achieved with techniques that can be applied to LLMs too, like deterministically seeding the RNGs used by hash tables. LLMs are in that sense no less deterministic than iterating over a hash table (they are just a bunch of matrix multiplications with a sampling procedure at the end, after all).
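Here's a sketch of that technique (my own toy, not any particular compiler's code): a hash table whose bucket layout depends on a random salt. Draw the salt from a seeded RNG and iteration order becomes reproducible across runs.

```python
# Toy hash table with a seeded salt -> deterministic iteration order.
import random

def fnv1a(data: bytes, salt: int) -> int:
    # FNV-1a mixed with a salt, standing in for a randomized hash.
    h = 0x811C9DC5 ^ salt
    for b in data:
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
    return h

class SaltedTable:
    """Bucket placement depends on a salt drawn from a seeded RNG."""

    def __init__(self, seed, nbuckets=8):
        self.salt = random.Random(seed).getrandbits(32)
        self.buckets = [[] for _ in range(nbuckets)]

    def add(self, key: str):
        idx = fnv1a(key.encode(), self.salt) % len(self.buckets)
        self.buckets[idx].append(key)

    def __iter__(self):
        for bucket in self.buckets:
            yield from bucket

t1, t2 = SaltedTable(seed=0), SaltedTable(seed=0)
for k in ["alpha", "beta", "gamma", "delta"]:
    t1.add(k)
    t2.add(k)
assert list(t1) == list(t2)  # same seed -> same order, every run
```

Change the seed and the iteration order may change; pin the seed and the "random" structure is as deterministic as anything else in the program.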
pixl97|8 months ago
Weather forecasts are a good example of this.