kakugawa | 3 months ago
https://news.ycombinator.com/item?id=45200925
https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
> As it turns out, our request’s output does depend on the parallel user requests. Not because we’re somehow leaking information across batches — instead, it’s because our forward pass lacks “batch invariance”, causing our request’s output to depend on the batch size of our forward pass.
tl;dr: the way inference requests are batched introduces non-determinism: a request's output can depend on the batch size of the forward pass.
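A minimal sketch of the underlying mechanism (not code from the linked article): floating-point addition is not associative, so a reduction grouped differently — which is what happens when kernels split work differently across batch sizes — can produce a different result for the same inputs.

```python
# Floating-point addition is not associative: summing the same values in a
# different grouping can give a different result. Batching changes how
# reductions inside GPU kernels are grouped, so without batch invariance
# the same request's outputs can shift with batch size.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # one reduction order
right = a + (b + c)  # another reduction order

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

The fix the article describes is making each kernel batch-invariant, i.e. using the same reduction order regardless of batch size.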