Agreed. All it takes is a simple reply of “you’re wrong” to Claude/ChatGPT/etc. and it will start to collapse in on itself, getting into a loop where it hallucinates over and over. It won’t push back, even if it happened to be right to begin with. It has no backbone to stay confident that it’s right.
diggan|6 months ago
Yeah, it seems to be a terrible approach to try to "correct" the context by adding clarifications or telling it what's wrong.
Instead, start from 0 with the same initial prompt you used, but improve it so the LLM gets it right in the first response. If it still gets it wrong, begin from 0 again. The context seems to get "poisoned" really quickly if you're looking for accuracy in the responses, so it's better to start over from scratch as soon as it veers off course.
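Concretely, the retry loop looks something like this. Just a sketch: llm_complete and looks_right are hypothetical stand-ins for whatever client call and validation you actually use, not real APIs.

    # Sketch of "restart from zero" instead of correcting in-context.
    # llm_complete and looks_right are hypothetical stand-ins, not a real API.

    def llm_complete(messages):
        """Hypothetical: send a fresh conversation to the model, return its reply."""
        raise NotImplementedError

    def looks_right(answer):
        """Hypothetical: your own check (tests, spot-checks, eyeballing)."""
        raise NotImplementedError

    def ask_fresh(base_prompt, max_attempts=3):
        prompt = base_prompt
        for _ in range(max_attempts):
            # Fresh context every attempt: a single user turn, with none of
            # the accumulated corrections that poison the context.
            answer = llm_complete([{"role": "user", "content": prompt}])
            if looks_right(answer):
                return answer
            # Don't reply "you're wrong" in-thread; tighten the original
            # prompt and start over. (In practice you'd rewrite it by hand
            # based on what it got wrong.)
            prompt = base_prompt + "\n\nBe precise; double-check facts before answering."
        return None

The point being that every retry re-sends one clean turn, never the failed transcript.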
eru|6 months ago
The grandparent comment was pointing out that this limitation exists, not that it can't be worked around.
cameldrv|6 months ago
If the question is about harder facts that the human disagrees with, this may put it into an essentially self-contradictory state, where the locus of possibilities gets squished from each direction, and the model is forced to respond with crazy outliers that agree with both the human and the data. The probability of an invented reference being true may be very low, but from the model's perspective it may still be one of the highest-probability outputs among a set of bad choices (toy sketch below).
What it sounds like they may have done is just have the humans tell it it's wrong when it isn't, and then award it credit for sticking to its guns.
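To make the squished-locus point concrete, here's a toy sketch with completely made-up numbers (nothing measured from a real model): once the human's pushback rules out the truthful answers, renormalizing what's left can make an invented citation the single most likely output.

    # Toy illustration with invented numbers: masking out the truthful
    # answers and renormalizing makes the fabricated reference the argmax.
    answers = {
        "correct answer A":   0.55,  # contradicts the human -> suppressed
        "correct answer B":   0.30,  # contradicts the human -> suppressed
        "invented reference": 0.08,  # agrees with the human, sounds like data
        "hedge / refuse":     0.03,
        "unrelated rambling": 0.04,
    }

    # The human's "you're wrong" effectively zeroes out the truthful answers.
    surviving = {a: p for a, p in answers.items() if not a.startswith("correct")}

    # Renormalize the remaining probability mass.
    total = sum(surviving.values())
    posterior = {a: p / total for a, p in surviving.items()}

    best = max(posterior, key=posterior.get)
    print(best, round(posterior[best], 2))  # -> invented reference 0.53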
petesergeant|6 months ago
Fucking Gemini Pro, on the other hand, digs in: it starts deciding it's in a testing scenario, gets adversarial, starts claiming it's using tools the user doesn't know about, etc., etc.