top | item 47205140

(no title)

geokon | 14 hours ago

> What’s the deal with “hallucinations”? The model generates tokens by sampling from a probability distribution. It has no concept of truth, it only knows what sequences are statistically plausible given the training data.

Extremely naiive question.. but could LLM output be tagged with some kind of confidence score? Like if I'm asking an LLM some question does it have an internal metric for how confident it is in its output? LLM outputs seem inherently rarely of the form "I'm not really sure, but maybe this XXX" - but I always felt this is baked in the model somehow

discuss

order

andy12_|14 hours ago

The model could report the confidence of its output distribution, but it isn't necessarily calibrated (that is, even if it tells you that it's 70% confident, it doesn't mean that it is right 70% of the time). Famously, pre-trained base models are calibrated, but they stop being calibrated when they are post-trained to be instruction-following chatbots [1].

Edit: There is also some other work that points out that chat models might not be calibrated at the token-level, but might be calibrated at the concept-level [2]. Which means that if you sample many answers, and group them by semantic similarity, that is also calibrated. The problem is that generating many answer and grouping them is more costly.

[1] https://arxiv.org/pdf/2303.08774 Figure 8

[2] https://arxiv.org/pdf/2511.04869 Figure 1.

geokon|14 hours ago

In absolute terms sure, but the token stream's confidence changes as it's coming out right? Consumer LLMs typically have a lot window dressing. My sense is this encourages the model to stay on-topic and it's mostly "high confidence" fluff. As it's spewing text/tokens back at you maybe when it starts hallucinating you'd expect a sudden dip in the confidence?

You could color code the output token so you can see some abrupt changes

It seems kind of obvious, so I'm guessing people have tried this

chongli|2 hours ago

Having a confidence score isn't as useful as it seems unless you (the user) know a lot about the contents of the training set.

Think of traditional statistics. Suppose I said "80% of those sampled preferred apples to oranges, and my 95% confidence interval is within +/- 2% of that" but then I didn't tell you anything about how I collected the sample. Maybe I was talking to people at an apple pie festival? Who knows! Without more information on the sampling method, it's hard to make any kind of useful claim about a population.

This is why I remain so pessimistic about LLMs as a source of knowledge. Imagine you had a person who was raised from birth in a completely isolated lab environment and taught only how to read books, including the dictionary. They would know how all the words in those books relate to each other but know nothing of how that relates to the world. They could read the line "the killer drew his gun and aimed it at the victim" but what would they really know of it if they'd never seen a gun?

radarsat1|2 hours ago

I think your last point raises the following question: how would you change your answer if you know they read all about guns and death and how one causes the other? What if they'd seen pictures of guns? And pictures of victims of guns annotated as such? What if they'd seen videos of people being shot by guns?

I mean I sort of understand what you're trying to say but in fact a great deal of knowledge we get about the world we live in, we get second hand.

There are plenty of people who've never held a gun, or had a gun aimed at them, and.. granted, you could argue they probably wouldn't read that line the same way as people who have, but that doesn't mean that the average Joe who's never been around a gun can't enjoy media that features guns.

Same thing about lots of things. For instance it's not hard for me to think of animals I've never seen with my own eyes. A koala for instance. But I've seen pictures. I assume they exist. I can tell you something about their diet. Does that mean I'm no better than an LLM when it comes to koala knowledge? Probably!

danlitt|2 hours ago

Can it generate one? Sure. But it won't mean anything, since you don't know (and nobody knows) the "true" distribution.

DavidSJ|14 hours ago

Yes, the actual LLM returns a probability distribution, which gets sampled to produce output tokens.

[Edit: but to be clear, for a pretrained model this probability means "what's my estimate of the conditional probability of this token occurring in the pretraining dataset?", not "how likely is this statement to be true?" And for a post-trained model, the probability really has no simple interpretation other than "this is the probability that I will output this token in this situation".]

mr_toad|11 hours ago

It’s often very difficult (intractable) to come up with a probability distribution of an estimator, even when the probability distribution of the data is known.

Basically, you’d need a lot more computing power to come up with a distribution of the output of an LLM than to come up with a single answer.

podnami|14 hours ago

What happens before the probability distribution? I’m assuming say alignment or other factors would influence it?

jorvi|5 hours ago

> I'm not really sure, but maybe this XXX

You never see this in the response but you do in the reasoning.

podnami|14 hours ago

I would assume this is from case to case, such as:

- How aligned has it been to “know” that something is true (eg ethical constraints)

- Statistical significance and just being able to corroborate one alternative in Its training data more strongly than another

- If it’s a web search related query, is the statement from original sources vs synthesised from say third party sources

But I’m just a layman and could be totally off here.

Lionga|14 hours ago

The LLM has an internal "confidence score" but that has NOTHING to do with how correct the answer is, only with how often the same words came together in training data.

E.g. getting two r's in strawberry could very well have a very high "confidence score" while a random but rare correct fact might have a very well a very low one.

In short: LLM have no concept, or even desire to produce of truth

sharperguy|13 hours ago

Still, it might be interesting information to have access to, as someone running the model? Normally we are reading the output trying to build an intuition for the kinds of patterns it outputs when it's hallucinating vs creating something that happens to align with reality. Adding in this could just help with that even when it isn't always correlated to reality itself.

alexwebb2|14 hours ago

Huge leap there in your conclusion. Looks like you’re hand-waving away the entire phenomenon of emergent properties.

amelius|12 hours ago

> In short: LLM have no concept, or even desire to produce of truth

They do produce true statements most of the time, though.