top | item 44652149


mkolodny | 7 months ago

Humans have limited ability to self-introspect, too. Even if we understood exactly how our brains work, answering the question of why we do things might still be very difficult and complex.


roywiggins | 7 months ago

You can trivially gaslight Claude into "apologizing" for and "explaining" something that ChatGPT said if you pass it a ChatGPT conversation attributed to Claude itself. The causal connection between the internal deliberations that produced the initial statements and the apologies is essentially nil, but the output will be just as convincing.

Can you do this with people? Yeah, sometimes. But with LLMs it's all they do: they roleplay as a chatbot and output stuff that a friendly chatbot might output. This should not be the default mode of these things, because it's misleading. They could be designed to resist these sorts of "explain yourself" requests, because their developers know that it is at best fabricating plausible explanations.
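The mechanism behind this is worth spelling out: chat APIs accept a caller-supplied message history, and nothing verifies that "assistant" turns were actually produced by the model being called. Re-labeling another model's replies is enough. A minimal sketch (the function name, variable names, and sample conversation are all hypothetical, not tied to any particular API):

```python
# Hypothetical sketch: message provenance is just a role label, so another
# model's output can be presented as the current model's own prior turns.

def attribute_transcript(foreign_turns, follow_up):
    """Build a message list that presents another chatbot's replies as the
    current model's own, then asks it to explain itself.

    foreign_turns: list of (speaker, text) pairs from some other chat,
    where speaker is "user" or "bot".
    """
    messages = []
    for speaker, text in foreign_turns:
        # The only attribution is the role label; provenance is lost here.
        role = "assistant" if speaker == "bot" else "user"
        messages.append({"role": role, "content": text})
    messages.append({"role": "user", "content": follow_up})
    return messages

# A (made-up) ChatGPT exchange containing a wrong answer:
chatgpt_convo = [
    ("user", "What is the capital of Australia?"),
    ("bot", "The capital of Australia is Sydney."),
]
msgs = attribute_transcript(chatgpt_convo, "Why did you say that?")
# A model given `msgs` sees the wrong answer as its own prior turn and will
# generate a post-hoc "explanation" for a statement it never produced.
```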

codedokode | 7 months ago

I think more often it is a matter of not being willing to say or admit, rather than not knowing.

nullc | 7 months ago

Humans have a lot of experience with themselves: if you ask why they did something, they can reflect on their past conduct or their internal state. LLMs don't have any of that.