top | item 47044610

Sevii | 12 days ago

Are agents actually capable of answering why they did things? An LLM can review the previous context, add your question about why it did something, and then use next token prediction to generate an answer. But is that answer actually why the agent did what it did?

gas9S9zw3P9c|12 days ago

It depends. If you have an LLM that uses reasoning, the explanation for why a decision was made can often be found in the reasoning token output. So if the agent later has access to that context, it could see why the decision was made.

Kubuxu|12 days ago

Reasoning, in the majority of cases, is pruned at each conversation turn.
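Kubuxu's point can be illustrated with a minimal sketch. The message format and the `prune_reasoning` helper here are hypothetical (not any specific vendor's API): the idea is just that once reasoning blocks are dropped at a turn boundary, a later "why did you do that?" question is answered from the visible transcript alone.

```python
# Minimal sketch of reasoning being pruned at a turn boundary.
# The "reasoning" role and message shapes are illustrative only.

def prune_reasoning(history):
    """Drop reasoning blocks, keeping only the visible messages."""
    return [m for m in history if m["role"] != "reasoning"]

history = [
    {"role": "user", "content": "Refactor this function."},
    {"role": "reasoning", "content": "Inlining the helper avoids an allocation."},
    {"role": "assistant", "content": "I inlined the helper."},
]

# Turn boundary: reasoning is pruned before the next request is built.
history = prune_reasoning(history)
history.append({"role": "user", "content": "Why did you inline it?"})

# The model answering "why" now sees only the visible transcript;
# the original rationale is no longer in its context.
assert all(m["role"] != "reasoning" for m in history)
```

Under this (common) setup, the model's answer to the "why" question is a fresh prediction conditioned on the visible transcript, not a readout of the reasoning that actually produced the earlier action.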

kgeist|12 days ago

LLMs often already "know" the answer from the first output token and then emulate "reasoning" so that it appears as if they came to the conclusion through logic. There are a number of papers on this topic. At least this used to be the case a few months ago; I'm not sure about the current SOTA models.

bananapub|12 days ago

of course not, but it can often give a plausible answer, and it's possible that answer will actually happen to be correct - not because it did any introspection, or is capable of any, but because its token outputs in response to the question might semi-coincidentally be token inputs that change the future outputs in the same way.

Onavo|12 days ago

Well, the entire field of explainable AI has mostly thrown in the towel.