(no title)
nemo1618 | 1 day ago
This is emphatically not fundamental to LLMs! Yes, the next token is selected randomly; but "randomly" could mean "chosen using an RNG with a fixed seed." Indeed, many APIs used to support a "temperature" parameter that, when set to 0, would result in fully deterministic output. These parameters were slowly removed or made non-functional, though, and the reason has never been entirely clear to me. My current guess is that it is some combination of A) 99% of users don't care, B) perfect determinism would require not just a seeded RNG, but also fixing a bunch of data races that are currently benign, and C) deterministic output might be exploitable in undesirable ways, or lead to bad PR somehow.
pavpanchekha|1 day ago
yorwba|21 hours ago
Batching leads to cross-contamination in practice because of things like MoE load-balancing within the batch, or supporting different batch sizes with different kernels that have different numerical behavior. But a careful implementation could avoid such issues while still benefiting from the higher efficiency of batching.
unknown|1 day ago
[deleted]
valenterry|1 day ago
This. Thanks for saying that, because now I don't need to read the article, since if the author doesn't even get that, I'm not interested in the rest.
jrmg|1 day ago
willj|1 day ago