jason_oster | 13 days ago

> This is literally the Gell-Mann Amnesia Effect in action.

Absolutely! But there is some nuance here. The failure mode involves an ambiguous question, and handling ambiguity is an open research topic. There is no objectively correct answer to "Should I walk or drive?" given the provided constraints.

Because handling ambiguity is a problem that researchers are actively working on, I have confidence that models will improve in these situations. The improvements may asymptotically approach zero, leading to ever more absurd examples of the failure mode. But that's ok, too. It means the models will increase in accuracy without becoming perfect. (I think I agree with Stephen Wolfram's take on computational irreducibility [1]: that handling ambiguity is a computationally irreducible problem.)

EWD was right, of course, and you are too for pointing out rigorous languages. But interacting with an LLM is different. A programming language cannot ask clarifying questions. It can only produce broken code or throw a compiler error. We prefer the compiler errors because broken code does not work, by definition. (Ignoring the "feature not a bug" gag.)

Most of the current models are fine-tuned to "produce broken code" rather than "throw a compiler error" in these situations. They are capable of asking clarifying questions; they just tend not to, because the RL schedule doesn't reward it.

[1]: https://writings.stephenwolfram.com/2017/05/a-new-kind-of-sc...

rluna828 | 12 days ago

Producing fewer "compiler errors" and more "broken code" errors is a fundamental failure. The cost of detecting a compiler error is lower than the cost of detecting broken code. If the cost of detecting and fixing broken code rises at the same rate as LLMs "improve," then their net benefit will remain fixed. I asked my five-year-old the above "brain teaser" and he got it right. As a follow-up, I asked what he should wash at a car wash if he walked there; he said, "my hands." Chat answered with more gibberish.

jason_oster | 11 days ago

I agree it is a fundamental failure of the current state of models. I believe it is solvable. The nuance is just that "solving" the problem might not look like what we think of as a solution. Hence the asymptote.