I love seeing when an LLM encounters a failure mode that feels akin to "cognitive dissonance". You can almost see it sweat as it tries to explain why it just directly contradicted itself, spiraling into a state of deeper confusion. I wonder if this response is modeled after human behavior when encountering cognitive dissonance. I'm curious how LLMs would behave if they had no model of human defensiveness in their training set. Anyway, I also don't enjoy anthropomorphizing language models, but hey, you went there first :)