
Last5Digits | 1 year ago

Exactly, we need a much more granular approach to evaluating intelligence and generality. Our current conception of intelligence largely works because humans share evolutionary history and partake in the same 10+ years of standardized training. As such, many dimensions of our intelligence correlate quite a bit, and you can likely infer a person's "general" proficiency or education by checking only a subset of those dimensions. If someone can't do arithmetic, then it's very unlikely that they'll be able to compute integrals.

LLMs don't share that property, though. Their distribution of proficiency over various dimensions and subfields is highly variable and only slightly correlated. Therefore, it makes no sense to infer the ability or inability to perform some magically global type of reasoning or generalization from just a subset of tasks, the way we do for humans.
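
To make the claim concrete: it is an empirical question of how strongly per-dimension scores correlate across models. Here's a minimal sketch of how one could check, using invented scores purely for illustration (the numbers, model names, and dimension names are all hypothetical, not measured data):

  import numpy as np

  # Rows = models, columns = skill dimensions (scores in [0, 1]).
  # All values below are invented for illustration only.
  scores = np.array([
      [0.92, 0.31, 0.75, 0.40],  # model A
      [0.88, 0.85, 0.35, 0.70],  # model B
      [0.55, 0.60, 0.80, 0.30],  # model C
      [0.70, 0.45, 0.50, 0.90],  # model D
  ])
  dims = ["arithmetic", "integrals", "coding", "summarization"]

  # Pairwise Pearson correlation between dimensions across models
  # (rowvar=False treats each column as a variable).
  corr = np.corrcoef(scores, rowvar=False)
  for i in range(len(dims)):
      for j in range(i + 1, len(dims)):
          print(f"{dims[i]} vs {dims[j]}: r = {corr[i, j]:+.2f}")

If the off-diagonal correlations came out near zero (or negative) on real benchmark data, that would support the point: passing a subset of tasks would tell you little about the rest.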


pmayrgundter | 1 year ago

Agreed on the first part, but as for LLMs not having correlated capabilities, I think we've seen that they do. As the GPTs progress, mainly by model size, their scores across a battery of tests go up; see, e.g., OpenAI's GPT-4 paper, which shows a leap in performance across a couple dozen tests.

Also found this, a Mensa test run across the top dozen frontier models.

https://www.maximumtruth.org/p/ais-ranked-by-iq-ai-passes-10...

That does seem to me to be demonstrating a global type of reasoning or generalization.

Also see the author's note that, at least with Claude, new models seem to be released about every 20 IQ points apart.