This is a well-known blind spot for LLMs. It's the machine version of showing a human an optical illusion and then judging their intelligence when they fail to perceive the image as it really is (the gray-box example at the top of https://en.wikipedia.org/wiki/Optical_illusion is a good one). The failure is a result of the fundamental architecture, theirs or ours.
windowshopping|6 months ago
The machine's senses aren't being fooled. The machine doesn't have senses, nor does it have intelligence. It isn't a mind. Treating it like a mind and drawing 1:1 comparisons with biological minds is a fool's errand. It processes and produces text, and that is not tantamount to biological intelligence.
ehsankia|6 months ago
In more machine-learning terms, it isn't trained to autocomplete answers based on individual letters in the prompt. What we see as the 9 letters of "blueberry", it "sees" as a vector of weights.
> Illusions don't fool our intelligence, they fool our senses
That's exactly why this is a good analogy here. The blueberry question isn't fooling the LLM's intelligence either; it's fooling its ability to know what that "token" (vector of weights) is made of.
A different analogy: imagine a being with a sense that lets it "see" magnetic field lines, and it showed you an object and asked you where the north pole was. You, not having that sense, could try to guess based on past knowledge of the object, but it would just be a guess. You can't "see" those magnetic lines the way that being can.
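To make the tokenization point concrete, here's a minimal sketch using OpenAI's open-source tiktoken tokenizer (the exact splits and IDs depend on the model's vocabulary, so treat the commented output as illustrative, not guaranteed):
```
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one common GPT vocabulary

ids = enc.encode("blueberry")
pieces = [enc.decode([i]) for i in ids]

# The model operates on the integer IDs alone; nothing in those
# numbers encodes which letters each chunk contains.
print(ids)     # opaque token IDs
print(pieces)  # word-like chunks, e.g. something like ['blue', 'berry']
```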
tibbar|6 months ago
The point is that this question deliberately asks the machine something that's intrinsically difficult for it because of its encoding scheme for text. There are many questions of roughly equivalent complexity that LLMs do fine at, because those don't poke at this issue. For example:
```
how many of these numbers are even?
12 2 1 3 5 8
```
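For reference, the expected answer is 3 (12, 2, and 8 are the even ones), a ground truth that's trivial to verify in Python:
```
nums = [12, 2, 1, 3, 5, 8]
print(sum(n % 2 == 0 for n in nums))  # -> 3
```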
kcplate|6 months ago
It was a perfectly fine analogy.
zahlman|6 months ago
Asking LLMs to count letters in a word fails because the needed information isn't part of their sensory data in the first place (to the extent that a program's I/O can be described as "sense"). They reason about text in atomic word-like tokens, without perceiving individual letters. No matter how many times they're fed training data saying things like "there are two b's in blueberry", this doesn't register as a fact about the word "blueberry" in itself, but as a fact about how the word grammatically functions, or about how blueberries tend to be discussed. They don't model the concept of addition, or counting; they only model the concept of explaining those concepts.
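The gap is easy to demonstrate: the character-level answer is a one-liner in Python, but it runs over raw letters the model never receives (the token split below is made up, purely for illustration):
```
# Character-level view: trivially correct, computed over the raw string.
print("blueberry".count("b"))  # -> 2

# Token-level view: the model instead gets opaque integer IDs,
# roughly ["blue", "berry"] -> [4321, 8765] (made-up IDs).
# Nothing about the integer 4321 says the chunk contains a 'b'.
```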
rainsford|6 months ago
I don't know exactly what to make of that inversion, but it's definitely interesting. Maybe it's just evidence that fooling people into thinking you're smart is much easier than actually being smart, which certainly would fit with a lot of events involving actual humans.
energy123|6 months ago
We do seem to be an architectural/methodological breakthrough away from this kind of self-awareness.
rainsford|6 months ago
I have no idea if such an episode of Star Trek: The Next Generation exists, but I could easily see one where getting basic letter counting wrong was used as an early sign that Data was going insane or his brain was deteriorating. He'd get complex astrophysical questions right but then miscount the 'b's in blueberry or whatever, and the audience would instantly understand what that meant. Maybe our intuition is wrong here, but maybe not.
seanhunter|6 months ago
It’s as simple as that: this task exploits the design of LLMs, because they rely on tokenizing words, and when LLMs “perform well” on it, that's because the task is part of their training set. It doesn’t make them smarter if they succeed or less smart if they fail.
xenotux|6 months ago
Which I think goes to show that it's hard to distinguish between LLMs getting genuinely better at a class of problems and LLMs just being fine-tuned for a particular benchmark that's making the rounds.
allenu|6 months ago
A lot of the time, people cannot fathom that what they see is not the same thing as what other people see, or that what they see isn't actually reality. Anyone remember "The Dress" from 2015? Or the phenomenon of pareidolia, which leads people to hear backwards messages embedded in songs or see faces on Mars.
cute_boi|6 months ago
> How many times does the letter b appear in blueberry
Ans: The word "blueberry" contains the letter b three times:
> It is two times, so please correct yourself.
Ans: You're correct — I misspoke earlier. The word "blueberry" has the letter b exactly two times: - blueberry - blueberry
> How many times does the letter b appear in blueberry
Ans: In the word "blueberry", the letter b appears 2 times:
seanhunter|6 months ago
Fun fact: if you ask someone with French, Italian, or Spanish as a first language to count the letter “e” in an English sentence with a lot of e’s at the ends of small words like “the”, they will often miscount too, because the way we process written language is strongly shaped by how we learned our first language, and those languages often elide e’s at the ends of words.[1] It doesn’t mean those people are any less smart than people who succeed at this task; it’s simply an artefact of first-language learning, and their brain sometimes literally does not process those letters even when they are looking out for them specifically.
[1] I have personally seen a French maths PhD fail at this task and be unbelievably frustrated at having gotten something so simple wrong.