top | item 30301941


vbuterin | 4 years ago

When I do mental arithmetic, my brain frequently tokenizes numbers into digit pairs or triples when it recognizes pairs and triples that have specific properties.

"224" is actually a really nice object to recognize because it's 7 * 32, and if you can recognize other multiples of 32 it frequently gives you shortcuts. It's less useful for addition because you would need to get lucky and get a multiple of 32 (or 7) on both sides, but for multiplication and division it helps a lot.
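The multiplication shortcut can be sketched numerically. A minimal illustration (the helper name is invented): rewriting 224 as 7 * 32 turns one hard multiplication into a small multiply followed by five doublings.

```python
# Mental-math shortcut: instead of multiplying by 224 directly,
# use 224 == 7 * 32 and do the easy factors one at a time.
def times_224(n):
    x = 7 * n          # the "* 7" step
    for _ in range(5): # doubling five times is the "* 32" step
        x *= 2
    return x

print(times_224(13))  # 7*13 = 91, doubled five times -> 2912
```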


mordymoop|4 years ago

Sure - I think we all learn tricks like that. But you learned that pattern of tokenization, it wasn't arbitrarily foisted on you.

What GPTs have to deal with is more like, you are fed an arithmetic problem via colored slips of paper, and you just have to remember that this particular shade of chartreuse means "224", which you happen to have memorized equals 7 * 32, etc., but then the next slip of paper is off-white which means "1", and now you have to mentally shift everything ...
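The "slips of paper" point can be made concrete with a toy greedy longest-match tokenizer (the vocabulary here is invented for illustration, not any real model's): the same digits land in different chunks depending on their neighbors, so the model can't rely on a stable grouping.

```python
# Toy greedy longest-match tokenizer over an invented digit vocabulary,
# illustrating how identical digit strings get chunked inconsistently.
VOCAB = {"224", "22", "1", "2", "4"}

def tokenize(s):
    out, i = [], 0
    while i < len(s):
        for j in range(len(s), i, -1):  # try the longest match first
            if s[i:j] in VOCAB:
                out.append(s[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token for {s[i]!r}")
    return out

print(tokenize("224"))   # ['224']        -- one tidy token
print(tokenize("1224"))  # ['1', '224']   -- same digits, shifted context
print(tokenize("2224"))  # ['22', '2', '4'] -- the "224" is split apart
```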

chaxor|4 years ago

The tokens in most GPT models are small like this, but they still 'learn tokenization' in a way very similar to what you just described. It's part of the multi-headed attention.

It learns what level of detail in the tokenization is needed for a given task. For example, if you're not parsing the problem to actually do the computation, you don't pay attention to the finer tokenization; if you do need that level of detail, you use the finer groupings. Some of the difficulty a few years ago was extending these models to handle longer contexts (or variable-length contexts that can grow very long), but that also seems close to solved now. So you're not exactly giving much insight with this observation.

dr_zoidberg|4 years ago

I think that part of why the tokenization is a problem for math here is that it doesn't seem to carry the overflow into the token to the left. That said, I haven't worked with GPT in enough detail to do a deeper analysis than that hunch, so take my comment with a couple of grains of salt.
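The hypothesized failure mode can be sketched as follows (a toy model of the hunch, not how GPT actually computes): add two numbers chunk-by-chunk in three-digit tokens, and compare dropping the carry with propagating it to the chunk on the left.

```python
# Toy model of the carry hypothesis: add in 3-digit chunks, with or
# without propagating the overflow into the chunk to the left.
def chunked_add(a, b, carry=False):
    # Split into 3-digit chunks from the left, e.g. 999 -> [000, 999].
    sa, sb = str(a).zfill(6), str(b).zfill(6)
    chunks = [(int(sa[i:i + 3]), int(sb[i:i + 3])) for i in (0, 3)]
    c, out = 0, ""
    for x, y in reversed(chunks):  # rightmost chunk first
        s = x + y + (c if carry else 0)
        c = s // 1000               # overflow out of this chunk
        out = f"{s % 1000:03d}" + out
    return int(out)

print(chunked_add(999, 1))              # carry dropped     -> 0 (wrong)
print(chunked_add(999, 1, carry=True))  # carry propagated  -> 1000
```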

mannykannot|4 years ago

Maybe this is a clue to which problems it succeeds on, and how it goes wrong when it does not.