Did you read the article or just the title? It mentions the specific models the researchers were testing and notes that increasing model size did not seem to offer much improvement on this metric. It also ends with a discussion of research into methods for improving performance on queries involving negation.
ilaksh|2 years ago
It actually would have seemed like a valid conclusion (although still too general) if the article came out some months ago. But GPT-4 and the very latest model versions from other companies show they were over-generalizing.
Also, model size isn't necessarily the determining factor.
hammyhavoc|2 years ago
[deleted]