top | item 44830647

localhost | 6 months ago

Even with t=0 they are stochastic, e.g., due to the non-associative nature of floating-point operations.
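For anyone who hasn't seen the non-associativity in action, here is a minimal sketch in plain Python (IEEE 754 doubles). The variable names are just for illustration; the point is only that regrouping the same three additions changes the last bit of the result.

```python
# Floating-point addition is not associative: the grouping of the
# operations changes how intermediate results get rounded.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # a + b rounds first, then c is added
right = a + (b + c)  # b + c rounds first, then a is added

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

A parallel reduction that groups its partial sums differently from run to run can therefore produce slightly different values, which is presumably the effect being referred to here.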

int_19h | 6 months ago

That is an artifact of the implementation. You can absolutely implement it using strict FP. But even if not, any given implementation will still do things in a specific order, which can be documented. And if you're running quantized (including the KV cache), there's a lot less floating point involved.
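A rough sketch of both points, in plain Python rather than any real inference stack. The helper `sum_in_order` and the toy data are made up for illustration: summing floats in the same fixed order gives the same bits every time, and integer accumulation (a stand-in here for quantized arithmetic) gives the same answer in any order.

```python
import random

def sum_in_order(values, order):
    """Accumulate values one at a time, visiting indices in the given order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

rng = random.Random(0)  # fixed seed so the toy data is reproducible

# Floats spanning many magnitudes, so rounding depends on summation order.
values = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-8, 8) for _ in range(1000)]
fixed_order = list(range(len(values)))
shuffled = fixed_order[:]
rng.shuffle(shuffled)

# Same order twice: bit-identical results.
run_a = sum_in_order(values, fixed_order)
run_b = sum_in_order(values, fixed_order)
print(run_a == run_b)  # True

# A different order will usually (not always) differ in the low bits.
run_c = sum_in_order(values, shuffled)
print(run_a, run_c)

# Integer accumulation is exact, so the order does not matter at all.
q = [rng.randint(-128, 127) for _ in range(1000)]  # int8-like toy values
int_fixed = sum(q[i] for i in fixed_order)
int_shuffled = sum(q[i] for i in shuffled)
print(int_fixed == int_shuffled)  # True
```

Real quantized kernels still have floating-point scales and some float accumulation in places, so this only illustrates why less of the computation is order-sensitive, not that none of it is.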