This is an interesting paper and I like this kind of mechanistic interpretability work - but I cannot figure out how the paper title "Transformers know more than they can tell" relates to the actual content. In this case what is it that they know and can't tell?
I believe it's a reference to the paper "Language Models (Mostly) Know What They Know": https://arxiv.org/abs/2207.05221
There's definitely some link, but I'd need to give this paper a good read and refresh on the other to see how strong it is. But I think your final sentence strengthens my suspicion.
Ok, I've read the paper and now I wonder, why did they stop at the most interesting part?
They did all that work to figure out that learning "base conversion" is the difficult thing for transformers. Great! But then why not take that last remaining step to investigate why that specifically is hard for transformers? And how to modify the transformer architecture so that this becomes less hard / more natural / "intuitive" for the network to learn?
Why release one paper when you can release two? Easier to get citations if you spread your efforts, and if you're lucky, someone needs to reference both of them.
A more serious answer might be that it was simply out of scope of what they set out to do, and they didn't want to fall for scope-creep, which is easier said than done.
Author here. The paper is about the Collatz sequence: how experiments with a transformer can point at interesting facts about a complex mathematical phenomenon, and how, in supervised math transformers, model predictions and errors can be explained (this part is a follow-up to a similar paper about GCD). From an ML research perspective, the interesting (and surprising) takeaway is the particular way the long Collatz function is learned: "one loop at a time".
To me, the base conversion is a side quest: we just wanted to rule out this explanation for the model behavior. It may be worth further investigation, but it won't be by us. Another (less important) reason is paper length: if you want to submit to peer-reviewed outlets, you need to keep the page count under a certain limit.
cuz you don't sell nonsense in one piece.
it used to be "repeat a lie often enough" ...
now lies are split into pieces ...
you'll see more of all that in the next few years.
but if you wanna stay in awe, at your age and further down the road, don't ask questions like you just asked.
be patient and lean into the split.
brains/minds have been FUBARed. all that remains is buying into the fake, all the way down to faking it when your own children get swooped into it all.
"transformers" "know" and "tell" ... and people's favorite cartoon characters will soon run hedge funds but the rest of the world won't get their piece ... this has all gone too far and to shit for no reason.
I don't know that computers can model arbitrary-length sine waves either. At least not in the sense of me being able to input any `x` and get `sin(x)` back out. All computers have finite memory, meaning they can only represent a finite set of numbers, so there is some magnitude of `x` above which they can't represent the number at all.
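A rough illustration of the same point in Python (my own sketch, not from the paper or this thread): long before `x` overflows a double, the spacing between adjacent representable values already exceeds the sine's period, so `sin(x)` for a huge `x` can't pin down a phase.

```python
import math

x = 1e20
gap = math.ulp(x)          # spacing between adjacent doubles near 1e20 (~16384)
print(gap > 2 * math.pi)   # True: neighbouring representable inputs are more
                           # than a full period apart
print(math.sin(x), math.sin(x + gap))  # wildly different outputs for "adjacent" inputs
```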
Neural networks are more limited of course, because there's no way to expand their equivalent of memory, while it's easy to expand a computer's memory.
I'll take a shot at it. Using Collatz as the specific target for investigating the underlying concepts here seems like a big red herring that's going to generate lots of confused takes. (I guess it was done partly to have access to tons of precomputed training data and partly to generate buzz. The title also seems poorly chosen and/or misleading.)
Really the paper is about mechanistic interpretability and a few results that are maybe surprising. First, the details of the input representation (the base) matter a lot. This is perhaps very disappointing if you liked the idea of "let the models work out the details, they see through the surface features to the very core of things". Second, learning was bursty, with discrete steps rather than smooth improvement. This may or may not be surprising or disappointing: it depends on how well you think you can predict the stepping.
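To make the first point concrete (a toy example of my own, assuming "base" here means the radix used to tokenize the integers, which may not match the paper's exact setup): the same number becomes a completely different token sequence depending on the base, so the surface form the model sees changes drastically.

```python
def to_base(n, b):
    """Digits of n in base b, most significant first."""
    digits = []
    while n:
        digits.append(n % b)
        n //= b
    return digits[::-1] or [0]

n = 2**31 + 27
print(to_base(n, 10))    # [2, 1, 4, 7, 4, 8, 3, 6, 7, 5] -- ten tokens
print(to_base(n, 1024))  # [2, 0, 0, 27]                  -- four tokens, same number
```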
The model partially solves the problem but fails to learn the correct loop length:
> An investigation of model errors (Section 5) reveals that, whereas large language models commonly “hallucinate” random solutions, our models fail in principled ways. In almost all cases, the models perform the correct calculations for the long Collatz step, but use the wrong loop lengths, by setting them to the longest loop lengths they have learned so far.
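For anyone unsure what "loop length" means here, my reading (an assumption on my part, not code from the paper) is that a "long Collatz step" takes an odd n to 3n+1 and then halves until the result is odd again, and the loop length is the number of halvings:

```python
def long_collatz_step(n):
    """One 'long' Collatz step: 3n+1, then halve until odd again.
    Returns the next odd term and the number of halvings (the 'loop length')."""
    assert n % 2 == 1 and n > 0
    m = 3 * n + 1
    k = 0
    while m % 2 == 0:
        m //= 2
        k += 1
    return m, k

print(long_collatz_step(7))    # (11, 1): 7 -> 22 -> 11
print(long_collatz_step(113))  # (85, 2): 113 -> 340 -> 170 -> 85
```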
The article is saying the model struggles to learn a particular integer function. https://en.wikipedia.org/wiki/Collatz_conjecture