This is an interesting paper and I like this kind of mechanistic interpretability work - but I cannot figure out how the paper title "Transformers know more than they can tell" relates to the actual content. In this case what is it that they know and can't tell?
I believe it's a reference to the paper "Language Models (Mostly) Know What They Know": https://arxiv.org/abs/2207.05221
There's definitely some link, but I'd need to give this paper a good read and refresh on the other to see how strong it is. But I think your final sentence strengthens my suspicion.
Ok, I've read the paper and now I wonder, why did they stop at the most interesting part?
They did all that work to figure out that learning "base conversion" is the difficult thing for transformers. Great! But then why not take that last remaining step to investigate why that specifically is hard for transformers? And how to modify the transformer architecture so that this becomes less hard / more natural / "intuitive" for the network to learn?
Why release one paper when you can release two? Easier to get citations if you spread your efforts, and if you're lucky, someone needs to reference both of them.
A more serious answer might be that it was simply out of scope of what they set out to do, and they didn't want to fall for scope-creep, which is easier said than done.
Author here. The paper is about the Collatz sequence: how experiments with a transformer can point at interesting facts about a complex mathematical phenomenon, and how, in supervised math transformers, model predictions and errors can be explained (this part is a follow-up to a similar paper about GCD). From an ML research perspective, the interesting (and surprising) takeaway is the particular way the long Collatz function is learned: "one loop at a time".
To me, the base conversion is a side quest: we just wanted to rule out this explanation for the model behavior. It may be worth further investigation, but it won't be by us. Another (less important) reason is paper length: if you want to submit to peer-reviewed outlets, you need to keep the page count under a certain limit.
cuz you don't sell nonsense in one piece.
it used to be "repeat a lie often enough" ...
now lies are split into pieces ...
you'll see more of all that in the next few years.
but if you wanna stay in awe, at your age and further down the road, don't ask questions like you just asked.
be patient and lean into the split.
brains/minds have been FUBARed. all that remains is buying into the fake, all the way down to faking it when your own children get swooped into it all.
"transformers" "know" and "tell" ... and people's favorite cartoon characters will soon run hedge funds but the rest of the world won't get their piece ... this has all gone too far and to shit for no reason.
I don't know that computers can model arbitrary-length sine waves either. At least not in the sense of me being able to input any `x` and get `sin(x)` back out. All computers have finite memory, meaning they can only represent a finite set of numbers, so there is some magnitude of `x` above which they can't represent the number at all.
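A rough illustration of the same point in Python (my own sketch, not from the paper or this thread): long before `x` overflows a double, the spacing between adjacent representable values already exceeds the sine's period, so `sin(x)` for a huge `x` can't pin down a phase.

```python
import math

x = 1e20
gap = math.ulp(x)          # spacing between adjacent doubles near 1e20 (~16384)
print(gap > 2 * math.pi)   # True: neighbouring representable inputs are more
                           # than a full period apart
print(math.sin(x), math.sin(x + gap))  # wildly different outputs for "adjacent" inputs
```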
Neural networks are more limited of course, because there's no way to expand their equivalent of memory, while it's easy to expand a computer's memory.
I'll take a shot at it. Using Collatz as the specific target for investigating the underlying concepts here seems like a big red herring that's going to generate lots of confused takes. (I guess it was done partly to have access to tons of precomputed training data and partly to generate buzz. The title also seems poorly chosen and/or misleading.)
Really the paper is about mechanistic interpretability and a few results that are maybe surprising. First, the details of the input representation (the base) matter a lot. This is perhaps very disappointing if you liked the idea of "let the models work out the details, they see through the surface features to the very core of things". Second, learning was bursty, with discrete steps rather than smooth improvement. This may or may not be surprising or disappointing: it depends on how well you think you can predict the stepping.
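To make the first point concrete (a toy example of my own, assuming "base" here means the radix used to tokenize the integers, which may not match the paper's exact setup): the same number becomes a completely different token sequence depending on the base, so the surface form the model sees changes drastically.

```python
def to_base(n, b):
    """Digits of n in base b, most significant first."""
    digits = []
    while n:
        digits.append(n % b)
        n //= b
    return digits[::-1] or [0]

n = 2**31 + 27
print(to_base(n, 10))    # [2, 1, 4, 7, 4, 8, 3, 6, 7, 5] -- ten tokens
print(to_base(n, 1024))  # [2, 0, 0, 27]                  -- four tokens, same number
```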
The model partially solves the problem but fails to learn the correct loop length:
> An investigation of model errors (Section 5) reveals that, whereas large language models commonly “hallucinate” random solutions, our models fail in principled ways. In almost all cases, the models perform the correct calculations for the long Collatz step, but use the wrong loop lengths, by setting them to the longest loop lengths they have learned so far.
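For anyone unsure what "loop length" means here, my reading (an assumption on my part, not code from the paper) is that a "long Collatz step" takes an odd n to 3n+1 and then halves until the result is odd again, and the loop length is the number of halvings:

```python
def long_collatz_step(n):
    """One 'long' Collatz step: 3n+1, then halve until odd again.
    Returns the next odd term and the number of halvings (the 'loop length')."""
    assert n % 2 == 1 and n > 0
    m = 3 * n + 1
    k = 0
    while m % 2 == 0:
        m //= 2
        k += 1
    return m, k

print(long_collatz_step(7))    # (11, 1): 7 -> 22 -> 11
print(long_collatz_step(113))  # (85, 2): 113 -> 340 -> 170 -> 85
```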
The article is saying the model struggles to learn a particular integer function. https://en.wikipedia.org/wiki/Collatz_conjecture