no_op | 8 months ago
The authors speculate that this pattern is a consequence of reasoning models actually solving these puzzles by way of pattern-matching to training data, which covers some puzzles at greater depth than others.
Great. That's one possible explanation. How might you support it?
- You could systematically examine the training data, to see if less representation of a puzzle type there reliably correlates with worse LLM performance.
- You could test how successfully LLMs can play novel games that have no representation in the training data, given instructions.
- Ultimately, using mechanistic interpretability techniques, you could look at what's actually going on inside a reasoning model.
This paper, however, doesn't attempt any of these. People are getting way out ahead of the evidence in accepting its speculation as fact.
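The first check in the list above amounts to a simple correlation test: rank puzzle types by how often they appear in the training corpus, rank them by model accuracy, and see whether the rankings agree. A minimal sketch follows; every puzzle name and number in it is a made-up placeholder, not a measurement from the paper or from any real corpus:

```python
# Hypothetical sketch: does a puzzle type's frequency in the training
# corpus correlate with model accuracy on it? All names and numbers
# below are invented placeholders for illustration only.

def ranks(xs):
    """Return the 1-based rank of each value in xs (ties broken by order)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman rank correlation (no tie correction) of two equal-length lists."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# freq: occurrences of each puzzle type in the corpus (hypothetical)
# acc:  model accuracy on held-out instances of that type (hypothetical)
data = {
    "tower_of_hanoi":  (12000, 0.92),
    "river_crossing":  (300,   0.41),
    "blocks_world":    (5000,  0.77),
    "checker_jumping": (150,   0.35),
}
freq = [f for f, _ in data.values()]
acc = [a for _, a in data.values()]
print(f"Spearman rho = {spearman(freq, acc):.2f}")  # -> Spearman rho = 1.00
```

A rho near 1 on real data would support the speculation; a rho near 0 would undercut it. The hard part, of course, is getting `freq` at all, since frontier training corpora aren't public.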
somethingsome | 8 months ago
You transform your training data into a very strange, high-dimensional space. Then, given an input, you compute the distance between that input and the closest point in that space.
So, in some sense, you pattern-match your input against the training data. Of course, in a way that is very unintuitive for humans.
Now, this doesn't necessarily imply things like 'models cannot solve problems they haven't seen before': a new problem might get matched to something that looks completely unrelated to us, but is a close neighbor in that space.
So with your proposed experiments, if the model solves a puzzle it has never seen, you still won't know why, and it doesn't rule out that the new puzzle was matched, in some sense, to some previous data in the training set.
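The picture above can be sketched concretely: represent items as points in a vector space and match a new input to its nearest neighbor. The toy vectors and names below are hand-made stand-ins; a real model's learned representation is vastly higher-dimensional and not constructed like this:

```python
import math

# Toy illustration of "pattern matching in a high-dimensional space":
# each training item is a point, and a new input is matched to its
# nearest neighbor by Euclidean distance. All vectors are invented
# for illustration only.

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

training = {
    "hanoi_3_disks":  [0.9, 0.1, 0.0],
    "hanoi_5_disks":  [0.8, 0.2, 0.1],
    "river_crossing": [0.1, 0.9, 0.3],
}

def nearest(query):
    return min(training, key=lambda name: euclidean(training[name], query))

# A "new" puzzle may land closest to something that looks unrelated to us,
# yet be a near neighbor in the learned space.
print(nearest([0.7, 0.3, 0.2]))  # -> hanoi_5_disks
```

This is the sense in which "the new puzzle was matched to previous data" can hold even for inputs the model has literally never seen: the match happens between points in the learned space, not between surface strings.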