The conclusions reached in the paper differ significantly from the headline. I'm not sure why you took a line from the abstract when, further down, the paper notes that it is some elements of "truthfulness" that are encoded and that "truth" as a concept is multifaceted. It also notes that LLMs can encode the correct answer and still consistently output the incorrect one. The text mentions strategies that could potentially reconcile the two, but as yet there is no concrete solution.