vbuterin | 4 years ago
"224" is actually a really nice number to recognize because it's 7 * 32, and if you can recognize other multiples of 32 it frequently gives you shortcuts. It's less useful for addition, because you would need to get lucky and have a multiple of 32 (or 7) on both sides, but for multiplication and division it helps a lot.
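A minimal sketch of the kind of shortcut meant here (the helper names are my own, invented for illustration): once you see 224 as 7 * 32, multiplying or dividing by it reduces to a shift by 5 plus a small factor of 7.

```python
def times_224(n):
    # 224 * n == (7 * 32) * n == (7 * n) << 5
    return (7 * n) << 5

def divide_by_224(n):
    # n / 224 == (n / 32) / 7; exact when n is a multiple of 224
    return (n >> 5) // 7

print(times_224(6))        # 1344
print(divide_by_224(2912)) # 13
```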
mordymoop|4 years ago
What GPTs have to deal with is more like, you are fed an arithmetic problem via colored slips of paper, and you just have to remember that this particular shade of chartreuse means "224", which you happen to have memorized equals 7 * 32, etc., but then the next slip of paper is off-white which means "1", and now you have to mentally shift everything ...
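The "colored slips of paper" point can be made concrete with a toy vocabulary (the ids and entries below are invented for illustration, not a real tokenizer): the model only ever sees opaque token ids, and the same digit string can arrive as different id sequences.

```python
# Hypothetical vocabulary: token ids are arbitrary labels, like shades of
# paper -- the model must memorize which "shade" stands for which string.
vocab = {17: "224", 3: "1", 52: "2", 9: "24"}

def decode(token_ids):
    """Concatenate the strings the token ids stand for."""
    return "".join(vocab[t] for t in token_ids)

# The same surface string can arrive as different id sequences:
print(decode([17]))     # "224" as a single token
print(decode([52, 9]))  # "2" + "24" -- same digits, different slips of paper
```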
chaxor|4 years ago
It learns what level of detail in the tokenization is needed for a given task. For example, if you're not parsing the problem to actually do the computation, you don't pay attention to the finer tokenization; if you do need that level of detail, you use those finer groupings. Some of the difficulty a few years ago was extending these models to handle longer contexts (or just variable contexts that can grow very long), but that also seems close to solved now. So you're not exactly giving much insight with this observation.
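A toy sketch of the "pay attention to the right level of detail" claim (the scores below are invented, not learned weights): if a query's dot products with the fine-grained tokens' keys are low, softmax concentrates almost all the attention mass on the coarse token.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Query-key scores: one coarse token followed by three fine-grained tokens.
scores = [4.0, 0.5, 0.3, 0.2]
weights = softmax(scores)
print(weights)  # the bulk of the attention mass lands on the coarse token
```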