(no title)
df2dd | 5 days ago
Have any of you tried reproducing an identical output, given an identical set of inputs? It simply doesn't happen. It's like a lottery.
This lack of reproducibility is a huge problem and limits how far the thing can go.
tvbusy | 4 days ago
df2dd | 4 days ago
tibbar | 4 days ago
However, we can start by claiming that non-determinism is not necessarily a bad thing: non-greedy token sampling helps prevent certain degenerate/repetitive states and tends to produce higher-quality responses overall [0]. I would also observe that part of the yin-yang of working with agents is letting go of the idea that one is working with a "compiler" and instead thinking of it as a promising but fallible collaborator.
With that out of the way, what leads to non-determinism? The classic explanation is the sampling strategy used to select the next token from the LLM. As mentioned above, there are incentives to use a non-zero temperature, which means that most LLM APIs are intentionally non-deterministic by default. And even at temperature zero, LLMs are not 100% deterministic [1], though it's usually pretty close; I am running a local LLM as we speak with greedy sampling, and the result is predictably the same each time.
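To make the distinction concrete, here's a toy sketch of greedy decoding versus temperature sampling over a made-up four-token vocabulary (the logits are invented for illustration, not from any real model):

```python
import math
import random

# Hypothetical next-token logits over a tiny vocabulary.
logits = {"the": 2.0, "a": 1.5, "cat": 0.5, "dog": 0.4}

def greedy(logits):
    # Temperature 0 / greedy decoding: always pick the highest-logit token,
    # so the same logits yield the same token every time.
    return max(logits, key=logits.get)

def sample(logits, temperature, rng):
    # Softmax with temperature, then draw a token: intentionally random,
    # which is the classic source of non-determinism in LLM APIs.
    z = sum(math.exp(l / temperature) for l in logits.values())
    probs = [math.exp(l / temperature) / z for l in logits.values()]
    return rng.choices(list(logits), weights=probs, k=1)[0]

rng = random.Random()
print(greedy(logits))            # always "the"
print(sample(logits, 1.0, rng))  # varies from call to call
```

Higher temperature flattens the distribution (more randomness); as temperature approaches 0 the softmax concentrates on the argmax and sampling converges to greedy decoding.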
Proprietary reasoning models are another layer of abstraction that may no longer offer temperature as a knob at all [2]. I think Claude still offers it, but it doesn't guarantee 100% determinism at temperature 0 either [3].
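One reason temperature 0 still isn't bit-exact: floating-point addition isn't associative, so a change in reduction order (different batch size sharing the server, different GPU kernel schedule) can flip low-order bits of the logits and, for near-tied candidates, occasionally flip the argmax itself. A minimal demonstration of the underlying arithmetic:

```python
# Floating-point addition is not associative: grouping changes the result.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False

# Same numbers, different summation order, different answer.
vals = [1e16, -1e16, 1.0]
print(sum(vals), sum(reversed(vals)))  # 1.0 0.0
```

In the second example, summing left to right cancels the big values first and keeps the 1.0; in reverse order the 1.0 is absorbed into -1e16 (it's below that value's precision) and is lost entirely.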
Finally, an agentic tool loop may encounter different results from run to run via tool calls -- it's pretty hard to force a truly reproducible environment from run to run.
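Even with a perfectly deterministic model, the tool side leaks non-determinism into the context. A sketch with a hypothetical "read-only" tool (invented for illustration):

```python
import time

def current_time_tool():
    # Hypothetical agent tool. Even a "read-only" tool can return different
    # data on every run (clock, network responses, file mtimes), so the
    # context the model conditions on differs despite identical prompts.
    return f"current time: {time.time()}"

# Two "identical" agent runs observe different tool output:
run_a = current_time_tool()
time.sleep(0.01)
run_b = current_time_tool()
print(run_a == run_b)  # False
```

Once the transcripts diverge, every subsequent token is conditioned on different context, so the runs drift apart even if decoding itself were deterministic.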
So, yeah, at best you could get something "mostly" deterministic if you coded up your own coding agent that used only models exposing temperature, always forced it to zero, and carefully ensured the environment did not change from run to run. And this would, unfortunately, probably produce worse output than a non-deterministic model.
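On the "carefully ensuring your environment has not changed" part, a cheap sketch (a hypothetical helper, not from any library) is to hash everything a run depends on, so you can at least detect drift between two supposedly identical runs:

```python
import hashlib
import sys

def env_fingerprint(paths):
    # Hash the interpreter version plus the contents of the given files.
    # If two runs produce different fingerprints, they were never
    # comparable to begin with.
    h = hashlib.sha256()
    h.update(sys.version.encode())
    for p in sorted(paths):
        h.update(p.encode())
        with open(p, "rb") as f:
            h.update(f.read())
    return h.hexdigest()
```

This doesn't make anything deterministic; it only tells you when the "identical inputs" assumption has already been violated.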
[0] https://arxiv.org/abs/2007.14966
[1] https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
[2] https://learn.microsoft.com/en-us/azure/ai-foundry/openai/ho...
[3] https://platform.claude.com/docs/en/about-claude/glossary
df2dd | 4 days ago
This world of extremes is frustrating for people able to think more broadly and see a world where deterministic and non-deterministic systems work together, where each makes sense.