It really doesn't matter how "good" these tools feel, or whatever vague metric you prefer - they hemorrhage cash at a rate perhaps not seen in human history. In other words, the usage you like is costing them enormous amounts of money. The bet is either that energy/compute becomes vastly cheaper within a couple of years (extremely unlikely), or that they find other ways to monetize that don't destroy the utility of the product (ads, an area where we have seen Google flop spectacularly). And even if the latter strategy works - ads are driven by consumption. If you believe OpenAI's vision of these tools replacing huge swaths of the workforce reasonably quickly, who will be left to consume? It's all nonsense, and the numbers are nonsense if you spend any real time considering them. The fact that SoftBank is a major investor should be a dead giveaway.
df2dd|5 days ago
Have any of you tried reproducing an identical output given an identical set of inputs? It simply doesn't happen. It's like a lottery.
This lack of reproducibility is a huge problem and limits how far the thing can go.
tvbusy|4 days ago
tibbar|4 days ago
However, we can start by claiming that non-determinism is not necessarily a bad thing - non-greedy token sampling helps prevent certain degenerate/repetitive states and tends to produce overall higher quality responses [0]. I would also observe that part of the yin-yang of working with the agents is letting go of the idea that one is working with a "compiler" and thinking of it more as a promising but fallible collaborator.
With that out of the way, what leads to non-determinism? The classic explanation is the sampling strategy used to select the next token from the LLM. As mentioned above, there are incentives to use a non-zero temperature for this, which means that most LLM APIs are intentionally non-deterministic by default. And even at temperature zero, LLMs are not 100% deterministic [1]. But it's usually pretty close; I am running a local LLM as we speak with greedy sampling and the result is predictably the same each time.
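The temperature knob is easy to see in a toy sampler. Here's a minimal Python sketch (the function name is mine, not any particular library's API): at temperature zero it reduces to an argmax over the logits, which is why greedy decoding repeats itself, while any positive temperature draws from the softmax and can differ run to run.

```python
import math
import random

def sample_token(logits, temperature, rng=random):
    """Pick the next token index from raw logits.

    temperature == 0 means greedy decoding: always take the argmax,
    which is deterministic for identical logits. temperature > 0
    samples from a softmax distribution, so repeated calls with the
    same logits can return different tokens.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    m = max(logits)  # shift by the max for numerical stability
    weights = [math.exp((x - m) / temperature) for x in logits]
    return rng.choices(range(len(logits)), weights=weights)[0]

logits = [2.0, 1.0, 0.1]

# Greedy decoding: identical result on every call.
assert all(sample_token(logits, 0) == 0 for _ in range(100))

# Non-zero temperature: any token can come out, with probability
# proportional to exp(logit / T).
samples = {sample_token(logits, 1.0) for _ in range(1000)}
assert samples <= {0, 1, 2}
```

Of course this only models the sampling step; even at temperature 0, identical logits are not guaranteed across runs on real serving stacks [1].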
Proprietary reasoning models are another layer of abstraction that may not even offer temperature as a knob anymore [2]. I think Claude still offers it, but it doesn't guarantee 100% determinism at temperature 0 either [3].
Finally, an agentic tool loop may encounter different results from run to run via tool calls -- it's pretty hard to force a truly reproducible environment from run to run.
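One cheap way to see the environment half of the problem: fingerprint the agent's working tree before each run. A stdlib-only sketch (the helper name is my own invention):

```python
import hashlib
import os

def environment_fingerprint(root):
    """SHA-256 over every file path and its bytes under `root`,
    visited in sorted order so the digest is stable. If two agent
    runs start from different fingerprints, identical outputs were
    never on the table, regardless of the sampling settings."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            h.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as f:
                h.update(f.read())
    return h.hexdigest()
```

Timestamps baked into generated files, lockfile drift, or anything fetched over the network will all change the digest between runs - which is exactly the drift that makes tool calls non-reproducible.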
So, yeah, at best you could get something that is "mostly" deterministic if you wrote your own coding agent that used only models exposing temperature, always forced it to zero, and carefully ensured the environment had not changed from run to run. And this would, unfortunately, probably produce worse output than a non-deterministic model.
[0] https://arxiv.org/abs/2007.14966 [1] https://thinkingmachines.ai/blog/defeating-nondeterminism-in... [2] https://learn.microsoft.com/en-us/azure/ai-foundry/openai/ho... [3] https://platform.claude.com/docs/en/about-claude/glossary
nfg|5 days ago
Evidence? I’m sure someone will argue, but I think it’s generally accepted that inference can be done profitably at this point. The cost for equivalent capability is also plummeting.
JohnMakin|5 days ago
Here you go: https://www.wsj.com/livecoverage/stock-market-today-dow-sp-5...