imagainstit | 2 years ago
This leads to different results when sums are accumulated in different orders, and reordering accumulation is common in parallel math operations.
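Floating-point addition is not associative, so even a three-term sum depends on how the terms are grouped. A minimal, self-contained illustration:

```python
# Floating-point addition is not associative: regrouping the same
# three terms produces two different double-precision results.
left = (0.1 + 0.2) + 0.3   # left-to-right, as a sequential loop would sum
right = 0.1 + (0.2 + 0.3)  # a different grouping, as a parallel reduction might use
print(left == right)  # False
print(left, right)
```

A parallel reduction effectively picks one such grouping per run, and the grouping can change with thread count and scheduling, so the final bits can change too.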
scarmig|2 years ago
imagainstit|2 years ago
This is sort of a deep topic, so it's hard to give a concise answer, but as an example: cuBLAS guarantees determinism, but only for the same architecture and the same library version (because the best-performing ordering of operations depends on the architecture and on implementation details), and it does not guarantee determinism when using multiple streams (because thread scheduling is non-deterministic and can change the ordering).
Determinism is something you have to build in from the ground up if you want it. It can cost performance, it won't give you the same results between different architectures, and it's frequently tricky to maintain in the face of common parallel programming patterns.
Consider this explanation from the PyTorch docs (particularly the bit on CUDA convolutions):
https://pytorch.org/docs/stable/notes/randomness.html
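A configuration sketch distilled from that page (the API names are from the linked PyTorch randomness notes; exact requirements vary by PyTorch and CUDA version, so treat this as illustrative rather than definitive):

```python
import os
import torch

# Required by cuBLAS for deterministic behavior on CUDA >= 10.2;
# must be set before any CUDA work happens.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# Opt in to deterministic kernels; PyTorch raises an error for any
# op that has no deterministic implementation.
torch.use_deterministic_algorithms(True)

# cuDNN: disable the autotuner (which can pick different convolution
# algorithms run to run) and request deterministic algorithms.
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

# Seed the RNGs so random initialization matches across runs.
torch.manual_seed(0)
```

Even with all of this, the docs note you only get reproducibility on the same hardware and software versions, which matches the point above about determinism not transferring across architectures.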
SomewhatLikely|2 years ago
ossopite|2 years ago
edit: at 13:42 in https://www.youtube.com/watch?v=TB07_mUMt0U&t=13m42s there is an explanation of the phenomenon in the context of training but I suspect the same kind of operation is happening during inference
charcircuit|2 years ago