avhception | 12 days ago
I then asked if there is anything I could do to prevent misinterpretations from producing wild results like this. So I got the advice to put an instruction in AGENTS.md that would urge agents to ask for clarification before proceeding. But I didn't add it. Out of the 25 lines of my AGENTS.md, many are already variations of that. The first three:
- Do not try to fill gaps in your knowledge with overzealous assumptions.
- When in doubt: Slow down, double-check context, and only touch what was explicitly asked for.
- If a task seems to require extra changes, pause and ask before proceeding.
If these are not enough to prevent stuff like that, I don't know what could.
tomashubelbauer|12 days ago
I have a line there that says Codex should never use Node APIs where Bun APIs exist for the same thing. Routinely, Claude Code and now Codex would ignore this.
I just replaced that rule with a deterministic, AST-based check powered by the TypeScript compiler. Now the agent can attempt to commit code with banned Node API usage and the pre-commit script will fail, so it is forced to get it right.
I've found myself migrating more and more of my AGENTS.md instructions to compiler-based checks like these, where possible. I feel this shouldn't be needed if the models were good, but it seems to be, and I guess the deterministic nature of these checks beats relying on the LLM's questionable respect for the rules.
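For illustration only, here is a minimal sketch of what such a check could look like, assuming the `typescript` package is installed. The ban list, `findBannedImports`, and the `Violation` shape are hypothetical names, not the actual rule described above. The sketch only flags `import` declarations and `require()` calls with a string literal argument, so it will not catch every way a Node API can be reached.

```typescript
import * as ts from "typescript";

// Hypothetical ban list: Node built-in modules the project wants replaced by Bun APIs.
const BANNED_MODULES = new Set(["fs", "node:fs", "child_process", "node:child_process"]);

export interface Violation {
  line: number; // 1-based line number of the offending import or require
  module: string; // the banned module specifier as written in the source
}

// Parse one file into an AST and collect every import/require of a banned module.
export function findBannedImports(fileName: string, sourceText: string): Violation[] {
  const sourceFile = ts.createSourceFile(fileName, sourceText, ts.ScriptTarget.Latest, true);
  const violations: Violation[] = [];

  const report = (node: ts.Node, moduleName: string): void => {
    if (!BANNED_MODULES.has(moduleName)) return;
    const { line } = sourceFile.getLineAndCharacterOfPosition(node.getStart(sourceFile));
    violations.push({ line: line + 1, module: moduleName });
  };

  const visit = (node: ts.Node): void => {
    if (ts.isImportDeclaration(node) && ts.isStringLiteral(node.moduleSpecifier)) {
      // import ... from "node:fs"
      report(node, node.moduleSpecifier.text);
    } else if (
      ts.isCallExpression(node) &&
      ts.isIdentifier(node.expression) &&
      node.expression.text === "require"
    ) {
      // require("child_process")
      const arg = node.arguments[0];
      if (arg !== undefined && ts.isStringLiteralLike(arg)) {
        report(node, arg.text);
      }
    }
    ts.forEachChild(node, visit);
  };

  visit(sourceFile);
  return violations;
}
```

A pre-commit script could run a function like this over the staged files and exit with a non-zero status whenever the returned list is non-empty, which is what makes the rule impossible for the agent to ignore.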
iamflimflam1|12 days ago
We have pre-commit hooks to prevent people doing the wrong thing. We have all sorts of guardrails to help people.
And the “modern” approach when someone does something wrong is not to blame the person, but to ask “how did the system allow this mistake? What guardrails are missing?”
geraneum|12 days ago
You may want to ask the next LLM versions the same question after they feed this paper through training.
sensanaty|12 days ago
Even the "thinking" blocks in newer models are an illusion. There is no functional difference between the text in a thought block and the final answer. To the model, they are just more tokens in a linear sequence. It isn't "thinking" before it speaks, the "thought" is the speech.
Treating those thoughts as internal reflection of some kind is a category error. There is no "privileged" layer of reasoning happening in the silicon that then gets translated into the thought block. It’s a specialized output where the model is forced to show its work because that process of feeding its own generated strings back into its context window statistically increases the probability of a correct result. The chatbot providers just package this in a neat little window to make the model's "thinking" part of the gimmick.
I also wouldn't be surprised if asking it stuff like this was actually counterproductive, but for this I'm going off vibes. The logic being that by asking that, you're poisoning the context, similar to how if you try to generate an image by saying "It should not have a crocodile in the image", it will put a crocodile into the image. If you ask it why it did something wrong, it'll treat that as the ground truth, and all future generation will have that snippet in its context, so the wrong thing itself nudges the output toward doing the wrong thing more and more.
Bolwin|12 days ago
That said, it can still be useful when you have some weird behavior and 199k tokens of context, with no idea where the info is that's nudging it to do the weird thing.
In this case you can think of it less as "why did you do this?" and more as "what references to doing this exist in this pile of files and instructions?"
Majromax|12 days ago
"Thinking meat! You're asking me to believe in thinking meat!"
While next-token prediction based on matrix math is certainly a literal, mechanistic truth, it is not a useful framing in the same sense that "synapses fire causing people to do things" is not a useful framing for human behaviour.
The "theory of mind" for LLMs sounds a bit silly, but taken in moderation it's also a genuine scientific framework in the sense of the scientific method. It allows one to form hypotheses, run experiments that can potentially disprove them, and ultimately make skillful counterfactual predictions.
> By asking it why it did something wrong, it'll treat that as the ground truth and all future generation will have that snippet in it, nudging the output in such a way that the wrong thing itself will influence it to keep doing the wrong thing more and more.
In my limited experience, this is not the right use of introspection. Instead, the idea is to interrogate the model's chain of reasoning to understand the origins of a mistake (the 'theory of mind'), then adjust agents.md / documentation so that the mistake is avoided for future sessions, which start from an otherwise blank slate.
I do agree, however, that the 'theory of mind' is very close to the more blatantly incorrect kind of misapprehension about LLMs, that since they sound humanlike they have long-term memory like humans. This is why LLM apologies are a useless sycophancy trap.
seanmcdirmid|12 days ago
Asking it why it did something isn’t useless; it just isn’t foolproof. If you really think it’s useless, you are way too heavily into binary thinking to be using AI.
Perfect is the enemy of useful in this case.
hnbad|12 days ago
I once had an agent come up with what seemed like a pointlessly convoluted solution as it tried to fit its initial approach (likely sourced from framework documentation overemphasizing the importance of doing it "the <framework> way" when possible) to a problem that, to me, didn't really seem like a good fit for it. It kept reassuring me that this was the way to go and my concerns were invalid.
When I described the solution and the original problem to another agent running the same model, it instantly dismissed the solution and pointed out the same concerns I had raised. It insisted on those being deal breakers just as firmly as the other agent had dismissed them as invalid.
In the past I've often found LLMs to be extremely opinionated while also flipping their positions on a dime once met with any doubt or resistance. It feels like I'm now seeing the opposite: the LLM just running with whatever it picked up first from the initial prompt and then being extremely stubborn and insisting on rationalizing its choice no matter how much time it wastes trying to make it work. It's sometimes better to start a conversation over than to try and steer it in the right direction at that point.
delaminator|12 days ago
"You're absolutely correct. I should have checked my skills before doing that. I'll make sure I do it in the future."