egnehots | 1 year ago
- How many 'r's are in Strawberry?
- Finding the fourth word of the response
These tests are at odds with how the tokenizer and next-token prediction work: the model sees subword tokens, not individual letters or words. They do not accurately represent an LLM's capabilities. It's akin to asking a blind person to identify colors.
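To make the tokenizer point concrete, here is a minimal sketch of a greedy longest-match subword split. The vocabulary `TOY_VOCAB` and the function `toy_tokenize` are made up for illustration; real tokenizers (BPE and friends) use learned vocabularies and may split "strawberry" differently.

```python
# Hypothetical toy vocabulary, NOT taken from any real tokenizer.
TOY_VOCAB = ["straw", "berry", "st", "raw", "s", "t", "r", "a", "w", "b", "e", "y"]

def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text into subword pieces by greedy longest match."""
    pieces = []
    i = 0
    while i < len(text):
        # Longest vocabulary entry matching at position i;
        # fall back to the single character if nothing matches.
        match = max(
            (v for v in vocab if text.startswith(v, i)),
            key=len,
            default=text[i],
        )
        pieces.append(match)
        i += len(match)
    return pieces

tokens = toy_tokenize("strawberry")
print(tokens)  # ['straw', 'berry']
```

Under this toy split the model receives two opaque pieces, and nothing in the input says that "berry" contains two 'r's. That is the sense in which letter counting cuts against the input representation.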
firebaze|1 year ago
> Here's "strawberry" spelled out one character per line: s t r a w b e r r y
Most LLMs can handle that perfectly, which means they can map tokens back to individual characters. Yet most fail to chain that with a second step and count the individual 'r's.
From this perspective, I think it's the opposite. Something like the strawberry test is a good indicator of how well an LLM can connect steps that are individually easy but not readily interconnected.
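As a rough illustration of the two steps being chained, here is a sketch in plain Python. The helper names `spell_out` and `count_letter` are made up for this example, and this is ordinary code, not a claim about what happens inside a model.

```python
def spell_out(word):
    """Step 1: break the word into individual characters."""
    return list(word)

def count_letter(chars, letter):
    """Step 2: count case-insensitive occurrences of one letter."""
    return sum(1 for c in chars if c.lower() == letter.lower())

chars = spell_out("Strawberry")
r_count = count_letter(chars, "r")

print(" ".join(chars))  # S t r a w b e r r y
print(r_count)          # 3
```

Each step is trivial on its own. The test asks whether the model feeds the output of the first into the second without being told to.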
darksaints|1 year ago
maccam912|1 year ago