(no title)
df2dd | 5 days ago
Have any of you tried reproducing an identical output, given an identical set of inputs? It simply doesn't happen. It's like a lottery.
This lack of reproducibility is a huge problem and limits how far the thing can go.
tvbusy | 4 days ago
df2dd | 4 days ago
tibbar | 4 days ago
However, we can start by claiming that non-determinism is not necessarily a bad thing: non-greedy token sampling helps prevent certain degenerate/repetitive states and tends to produce higher-quality responses overall [0]. I would also observe that part of the yin-yang of working with agents is letting go of the idea that one is working with a "compiler" and instead thinking of it as a promising but fallible collaborator.
With that out of the way, what leads to non-determinism? The classic explanation is the sampling strategy used to select the next token from the LLM. As mentioned above, there are incentives to use a non-zero temperature, which means that most LLM APIs are intentionally non-deterministic by default. And even at temperature zero, LLMs are not 100% deterministic [1], though it's usually pretty close; I am running a local LLM as we speak with greedy sampling, and the result is predictably the same each time.
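To make the distinction concrete, here's a toy sketch of greedy decoding versus temperature sampling over a made-up four-token vocabulary (the logits are invented for illustration, not from any real model):

```python
import math
import random

# Hypothetical next-token logits over a tiny vocabulary.
logits = {"the": 2.0, "a": 1.5, "cat": 0.5, "dog": 0.4}

def greedy(logits):
    # Temperature 0 / greedy decoding: always pick the highest-logit token,
    # so the same logits yield the same token every time.
    return max(logits, key=logits.get)

def sample(logits, temperature, rng):
    # Softmax with temperature, then draw a token: intentionally random,
    # which is the classic source of non-determinism in LLM APIs.
    z = sum(math.exp(l / temperature) for l in logits.values())
    probs = [math.exp(l / temperature) / z for l in logits.values()]
    return rng.choices(list(logits), weights=probs, k=1)[0]

rng = random.Random()
print(greedy(logits))            # always "the"
print(sample(logits, 1.0, rng))  # varies from call to call
```

Higher temperature flattens the distribution (more randomness); as temperature approaches 0 the softmax concentrates on the argmax and sampling converges to greedy decoding.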
Proprietary reasoning models are another layer of abstraction that may no longer offer temperature as a knob at all [2]. I think Claude still offers it, but it doesn't guarantee 100% determinism at temperature 0 either [3].
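One reason temperature 0 still isn't bit-exact: floating-point addition isn't associative, so a change in reduction order (different batch size sharing the server, different GPU kernel schedule) can flip low-order bits of the logits and, for near-tied candidates, occasionally flip the argmax itself. A minimal demonstration of the underlying arithmetic:

```python
# Floating-point addition is not associative: grouping changes the result.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False

# Same numbers, different summation order, different answer.
vals = [1e16, -1e16, 1.0]
print(sum(vals), sum(reversed(vals)))  # 1.0 0.0
```

In the second example, summing left to right cancels the big values first and keeps the 1.0; in reverse order the 1.0 is absorbed into -1e16 (it's below that value's precision) and is lost entirely.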
Finally, an agentic tool loop may encounter different results from run to run via tool calls -- it's pretty hard to force a truly reproducible environment from run to run.
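Even with a perfectly deterministic model, the tool side leaks non-determinism into the context. A sketch with a hypothetical "read-only" tool (invented for illustration):

```python
import time

def current_time_tool():
    # Hypothetical agent tool. Even a "read-only" tool can return different
    # data on every run (clock, network responses, file mtimes), so the
    # context the model conditions on differs despite identical prompts.
    return f"current time: {time.time()}"

# Two "identical" agent runs observe different tool output:
run_a = current_time_tool()
time.sleep(0.01)
run_b = current_time_tool()
print(run_a == run_b)  # False
```

Once the transcripts diverge, every subsequent token is conditioned on different context, so the runs drift apart even if decoding itself were deterministic.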
So, yeah, at best you could get something "mostly" deterministic if you coded up your own coding agent that used only models exposing temperature, always forced it to zero, and carefully ensured the environment did not change from run to run. And this would, unfortunately, probably produce worse output than a non-deterministic model.
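On the "carefully ensuring your environment has not changed" part, a cheap sketch (a hypothetical helper, not from any library) is to hash everything a run depends on, so you can at least detect drift between two supposedly identical runs:

```python
import hashlib
import sys

def env_fingerprint(paths):
    # Hash the interpreter version plus the contents of the given files.
    # If two runs produce different fingerprints, they were never
    # comparable to begin with.
    h = hashlib.sha256()
    h.update(sys.version.encode())
    for p in sorted(paths):
        h.update(p.encode())
        with open(p, "rb") as f:
            h.update(f.read())
    return h.hexdigest()
```

This doesn't make anything deterministic; it only tells you when the "identical inputs" assumption has already been violated.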
[0] https://arxiv.org/abs/2007.14966
[1] https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
[2] https://learn.microsoft.com/en-us/azure/ai-foundry/openai/ho...
[3] https://platform.claude.com/docs/en/about-claude/glossary
df2dd | 4 days ago
This world of extremes is frustrating for people able to think more broadly and see a world where deterministic and non-deterministic systems work together, where each makes sense.