SirMaster | 13 days ago
But then when I get a subpar result, they always tell me I'm "prompting wrong". LLMs may be very capable of great human-level output, but in my experience they leave a LOT to be desired in terms of human-level understanding of the question or prompt.
I think rating an LLM against a human or AGI should include its ability to understand a prompt the way a human, or any system of average general intelligence, should be able to.
Are there any benchmarks on that? Like how well LLMs handle misleading or underspecified prompts compared to one another?
Because if a good prompt is as important as people say, then the model's ability to understand a prompt, especially a poor one, could have a massive impact on its output.
jason_oster|13 days ago
You might be inclined to say, "a human would always interpret the question as having the car near the speaker, 50m away from the carwash." But this is objectively untrue. There are people in this comments section and on the Mastodon thread who found the question somewhat confusing.
In other words, the premise that "understand[ing] a prompt like a human" is all that's needed is wrong, because not every human interprets ambiguities in the same way. The human phenomenon is well researched in psychology. The LLM equivalent is also well researched, and several proposals have been put forth over the years to address it. This is a pretty good research paper on the subject, and it links to other relevant studies: https://arxiv.org/abs/2511.10453v2 (Although I disagree with their method. I think asking clarifying questions is a better approach than trying to one-shot every possible interpretation.)
So yes, there is a ton of research on the problem. Some datasets include ambiguous questions and instructions for this reason. A couple of examples are provided in the linked paper.
SirMaster|12 days ago
So it feels like a major limitation, or a big bottleneck to getting a good answer.
nosuchthing|13 days ago
hyperstitions from TESCREAL https://www.dair-institute.org/tescreal/