It really doesn't matter how "good" these tools feel, or whatever vague metric you prefer - they hemorrhage cash at a rate perhaps not seen in human history. In other words, the usage you like is costing them enormous amounts of money. The bet is either that energy/compute becomes vastly cheaper within a couple of years (extremely unlikely), or that they find other ways to monetize that don't destroy the utility of the product (ads, an area where we have seen Google flop spectacularly). And even if the latter strategy works - ads are driven by consumption. If you believe OpenAI's vision of these tools replacing huge swaths of the workforce reasonably quickly, who will be left to consume? It's all nonsense, and the numbers are nonsense if you spend any real time considering them. The fact that SoftBank is a major investor should be a dead giveaway.
df2dd|5 days ago
Have any of you tried reproducing an identical output given an identical set of inputs? It simply doesn't happen. It's like a lottery.
This lack of reproducibility is a huge problem and limits how far the thing can go.
tvbusy|4 days ago
tibbar|4 days ago
However, we can start by claiming that non-determinism is not necessarily a bad thing - non-greedy token sampling helps prevent certain degenerate/repetitive states and tends to produce overall higher quality responses [0]. I would also observe that part of the yin-yang of working with the agents is letting go of the idea that one is working with a "compiler" and thinking of it more as a promising but fallible collaborator.
With that out of the way, what leads to non-determinism? The classic explanation is the sampling strategy used to select the next token from the LLM. As mentioned above, there are incentives to use a non-zero temperature for this, which means that most LLM APIs are intentionally non-deterministic by default. And even at temperature zero, LLMs are not 100% deterministic [1]. But it's usually pretty close; I am running a local LLM as we speak with greedy sampling and the result is predictably the same each time.
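The temperature knob is easy to see in a toy sampler. Here's a minimal Python sketch (the function name is mine, not any particular library's API): at temperature zero it reduces to an argmax over the logits, which is why greedy decoding repeats itself, while any positive temperature draws from the softmax and can differ run to run.

```python
import math
import random

def sample_token(logits, temperature, rng=random):
    """Pick the next token index from raw logits.

    temperature == 0 means greedy decoding: always take the argmax,
    which is deterministic for identical logits. temperature > 0
    samples from a softmax distribution, so repeated calls with the
    same logits can return different tokens.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    m = max(logits)  # shift by the max for numerical stability
    weights = [math.exp((x - m) / temperature) for x in logits]
    return rng.choices(range(len(logits)), weights=weights)[0]

logits = [2.0, 1.0, 0.1]

# Greedy decoding: identical result on every call.
assert all(sample_token(logits, 0) == 0 for _ in range(100))

# Non-zero temperature: any token can come out, with probability
# proportional to exp(logit / T).
samples = {sample_token(logits, 1.0) for _ in range(1000)}
assert samples <= {0, 1, 2}
```

Of course this only models the sampling step; even at temperature 0, identical logits are not guaranteed across runs on real serving stacks [1].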
Proprietary reasoning models are another layer of abstraction that may not even offer temperature as a knob anymore [2]. I think Claude still offers it, but it doesn't guarantee 100% determinism at temperature 0 either [3].
Finally, an agentic tool loop may encounter different results from run to run via tool calls -- it's pretty hard to force a truly reproducible environment from run to run.
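One cheap way to see the environment half of the problem: fingerprint the agent's working tree before each run. A stdlib-only sketch (the helper name is my own invention):

```python
import hashlib
import os

def environment_fingerprint(root):
    """SHA-256 over every file path and its bytes under `root`,
    visited in sorted order so the digest is stable. If two agent
    runs start from different fingerprints, identical outputs were
    never on the table, regardless of the sampling settings."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            h.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as f:
                h.update(f.read())
    return h.hexdigest()
```

Timestamps baked into generated files, lockfile drift, or anything fetched over the network will all change the digest between runs - which is exactly the drift that makes tool calls non-reproducible.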
So, yeah, at best you could get something that is "mostly" deterministic if you wrote your own coding agent that used only models exposing temperature, always forced it to zero, and carefully ensured the environment had not changed from run to run. And this would, unfortunately, probably produce worse output than a non-deterministic model.
[0] https://arxiv.org/abs/2007.14966 [1] https://thinkingmachines.ai/blog/defeating-nondeterminism-in... [2] https://learn.microsoft.com/en-us/azure/ai-foundry/openai/ho... [3] https://platform.claude.com/docs/en/about-claude/glossary
nfg|5 days ago
Evidence? I’m sure someone will argue, but I think it’s generally accepted that inference can be done profitably at this point. The cost for equivalent capability is also plummeting.
JohnMakin|5 days ago
Here you go: https://www.wsj.com/livecoverage/stock-market-today-dow-sp-5...