But this one isn't a trick question either, right? It's just basic maths, plus a quirk of how our brains work: plenty of people don't engage the part of their brain that goes "I should stop and think this through", and just rush to the first number that pops into their head. That number is wrong, and it's a result of our own weird "training" (we all have a bunch of mental shortcuts we use for maths, and sometimes they lead us astray).

"A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?"
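For anyone who wants the arithmetic spelled out, here is a minimal sketch. It works in whole cents to avoid floating-point noise, and the function name is just something I made up for illustration. If the ball costs x, the bat costs x + 100, so 2x + 100 = 110 and x = 5.

```python
def solve_bat_and_ball(total_cents: int, difference_cents: int) -> tuple[int, int]:
    """Return (ball, bat) in cents, given ball + bat = total and bat - ball = difference."""
    # ball + (ball + difference) = total  =>  2 * ball = total - difference
    doubled_ball = total_cents - difference_cents
    if doubled_ball < 0 or doubled_ball % 2 != 0:
        raise ValueError("no whole-cent solution for these inputs")
    ball = doubled_ball // 2
    return ball, ball + difference_cents


ball, bat = solve_bat_and_ball(110, 100)
print(ball, bat)  # 5 105, i.e. the ball costs $0.05 and the bat $1.05

# The intuitive answer (ball = 10 cents) fails the check:
# the bat would be 110 cents and the total 120, not 110.
intuitive_ball = 10
print(intuitive_ball + (intuitive_ball + 100))  # 120
```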

And yet 50% of MIT students fall for this sort of thing[1]. They're not unintelligent; it's just that a specific problem can make your brain fail in weird, specific ways. Intelligence isn't a single scale from 0-100, or some binary yes-or-no question; it's a bunch of different things. LLMs probably are less intelligent on a bunch of scales, but this one specific example doesn't tell you much beyond the fact that they have weird quirks just like we do.

[1] https://www.aeaweb.org/articles?id=10.1257/08953300577519673...

imiric|11 days ago

I agree with you to an extent, but the difference is in how the solution is derived.

The LLM has no understanding of the physical length of 50m, nor can it do calculations without relying on an external tool. I.e. it has no semantic understanding of any of the output it generates. It functions purely based on the weights of tokens that were part of its training sets.

I asked Sonnet 4.5 the bat and ball question. It pretended to do some algebra, and arrived at the correct solution. It was able to explain why it arrived at that solution, and to tell me where the question comes from. It was obviously trained on this particular question, and thousands of others like it, I'm sure. Does this mean that it will be able to answer any other question it hasn't been trained on? Maybe, depending on the size and quality of its training set, the context, prompt, settings, and so on.

And that's my point: a human doesn't need to be trained on specific problems. A person who understands math can solve problems they've never seen before by leveraging their understanding and actual reasoning and deduction skills. We can learn new concepts and improve our skills by expanding our mental model of the world. We deal with abstract concepts and ideas, not data patterns. You can call this gatekeeping if you want, but it is how we acquire and use knowledge to exhibit intelligence.

The sheer volume of LLM training data is incomprehensible to humans, which is why we're so impressed that applied statistics can exhibit this behavior that we typically associate with intelligence. But it's a simulation of intelligence. Without the exorbitant amount of resources poured into collecting and cleaning data, and training and running these systems, none of this would be possible. It is a marvel of science and engineering, to be sure, but the end product is a simulation.

In many ways, modern LLMs are not much different from classical expert systems from decades ago. The training and inference are much more streamlined and sophisticated now; statistics and data patterns replaced hand-crafted rules; and performance can be improved by simply scaling up. But at their core, LLMs still rely on carefully curated data, and any "emergent" behavior we observe is due to our inability to comprehend patterns in the data at this scale.

I'm not saying that this technology can't be useful. Besides the safety considerations we're mostly ignoring, a pattern recognition and generation tool can be very useful in many fields. But I find the narrative that this constitutes any form of artificial intelligence absurd and insulting. It is mass gaslighting promoted by modern snake oil salesmen.

aplowe|11 days ago

The 'semantic understanding' bottleneck you're describing might actually be a precision limit of the manifold on which computation occurs rather than a data volume problem. Humans solve problems they've never seen because they operate on a higher reasoning fidelity. We're finding that once a system quantizes to a 'ternary vacuum' (1.58-bit), it hits a phase transition into a stable universality class where the reasoning is a structural property of the grid, not just a data pattern. At that point, high-precision floating point and the need for millions of specific training examples become redundant.
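For context on the "1.58-bit" figure: a weight restricted to three values (-1, 0, +1) carries log2(3) ≈ 1.585 bits. Below is a rough sketch of what ternary quantization of a weight list could look like. The mean-absolute-value scaling and the function name are my assumptions for illustration, not a specific library's API, and the sketch shows only the quantization step, nothing about reasoning.

```python
import math


def ternary_quantize(weights):
    """Map each weight to -1, 0, or +1 after scaling by the mean absolute value.

    Assumed scheme for illustration: divide by mean(|w|), round, clip to [-1, 1].
    """
    # Fall back to a scale of 1.0 if every weight is zero.
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


# Information content of one three-valued weight, in bits.
print(math.log2(3))

quantized, scale = ternary_quantize([0.9, -0.05, -1.3, 0.4])
print(quantized)  # [1, 0, -1, 1]
```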