top | item 35369323

rikimaru0345 | 2 years ago

> The difference between them is only one digit (i.e., the last number). Therefore, it's not possible to tell if either value is greater or lesser by just looking at their values without knowing more information about what those numbers represent and how they were obtained in the first place.

That one is especially hilarious. But the part at the end, "how they were obtained," is really strange. Where in its dataset would it possibly have learned such nonsense? It doesn't matter where numbers come from when you're comparing them.

It implies that it doesn't understand what numbers even are in general, and that giving it a calculator (that it can use perfectly) only masks a much deeper problem.

I mean, I'm reading a ton of people saying "it's not just pattern recognition and token prediction, it has emergent properties!!!!" and from experimentation I believe it.

But if the models can pick up language and its intricacies, and even do simple logic tasks, shouldn't they also be able to pick up on what numbers are and how they work? At least knowing that where a number came from doesn't matter when it's just about comparing values in a pure mathematical sense?

What does that mean for concepts other than numbers? Do those models fake a LOT more than we already believe they do?

sbierwagen | 2 years ago

I'm inclined to think that the model is just too small, or that integer quantization damaged it badly. It has the GPT-2 trait of getting "stuck" (mode collapse?): if it gives a bad response once, it will tend to repeat itself over and over. (Repetition was also a bad GPT-2 trait.)
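To make the quantization point concrete, here is a minimal sketch of symmetric int8 weight quantization. This is an illustration of the general technique, not the actual scheme GPT4ALL or llama.cpp uses; the function names and example weights are made up. The rounding step is where precision is lost, and that loss is bounded by half the quantization step size:

```python
# Minimal sketch of symmetric per-tensor int8 quantization.
# Hypothetical helper names; not the real GPT4ALL/LLaMA code.

def quantize_int8(weights):
    """Map floats onto int8 range [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; rounding error stays <= scale / 2."""
    return [x * scale for x in q]

weights = [0.013, -0.402, 0.551, -0.007]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round trip is close but not exact: each weight is off by at
# most half a quantization step, and small weights lose the most
# relative precision.
errors = [abs(a - b) for a, b in zip(weights, restored)]
print(scale, max(errors))
```

Summed over billions of parameters, that per-weight error is the kind of accumulated damage the comment is speculating about, which is why schemes like 4-bit quantization are noticeably lossier than 8-bit.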

It's important to note that 7 billion parameters really is very small: roughly 25 times smaller than GPT-3's 175 billion, and smaller still than ChatGPT or GPT-4. I find it plausible that in the future there will be distilled models substantially smaller than GPT-3 but with all its power, but GPT4ALL-LLaMA-7B isn't it.