There has been a spree of recent experiments with LLMs solving logic puzzles (specifically Cheryl's Birthday). I wanted to replicate the tests from [0] with more LLMs. For reference, that article tested whether the models could still solve the puzzle when its text was obfuscated, so that verbatim discussions of the solution were less likely to have appeared in the training corpus.

Then I wanted to go further and test whether LLMs were prone to distraction by extraneous, irrelevant data. In a world where RAG may pull in "compromised" data, I wanted to see whether LLMs could ignore the cruft or whether it would alter their answers. TL;DR - it altered the answers.
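For anyone who wants to try something similar, here is a rough sketch of how a distraction test prompt could be built. This is not the harness I used; the function name `add_distractors`, the sample sentences, and the seed are all made up for illustration. It just scatters irrelevant sentences through the puzzle text while keeping the puzzle's own sentences in their original order.

```python
import random


def add_distractors(sentences, distractors, k, seed=0):
    """Insert k randomly chosen distractor sentences at random positions.

    Hypothetical helper for illustration only. The original sentences
    keep their relative order; only extra sentences are interleaved.
    """
    rng = random.Random(seed)           # seeded so a run can be repeated
    chosen = rng.sample(distractors, k)
    result = list(sentences)
    for d in chosen:
        pos = rng.randint(0, len(result))  # randint is inclusive at both ends
        result.insert(pos, d)
    return " ".join(result)


# Placeholder puzzle text (abbreviated, not the full puzzle statement).
puzzle = [
    "Cheryl gives Albert and Bernard a list of 10 possible dates.",
    "She tells Albert the month and Bernard the day.",
    "When is Cheryl's birthday?",
]

# Made-up irrelevant facts standing in for "compromised" retrieved data.
noise = [
    "Cheryl's favourite colour is green.",
    "Bernard owns three bicycles.",
    "Albert was born on a Tuesday.",
]

prompt = add_distractors(puzzle, noise, k=2, seed=42)
print(prompt)
```

The resulting prompt can then be sent to each model alongside the clean version, and the two answers compared.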

o1 dropped as I was making the graphs, so I included the results from testing it as an additional section. It was still distractible but was more capable in the obfuscated case.

Forgive the bait headline; I'm still trying to find the best balance of information and marketing for posts like this. Suggestions welcome on that front.

[0] https://timharford.com/2024/08/ai-has-all-the-answers-even-t...
