Treat LLMs as dyslexic when it comes to spelling. Assess their strengths and weaknesses accordingly.

They're literally text generators so that's... troubling

They're text generators, but you can think of them as basically operating with a different alphabet than ours. The text input they're given isn't in our alphabet, and the text output they produce isn't either. So when you ask them which letters are in a given word, they're effectively guessing, because they never see the individual letters.

Instead of letters, they work with tokens, which are usually chunks of 2-8 characters. You can play around with how text gets tokenized here: https://platform.openai.com/tokenizer

_____

For example, the text I wrote above is 504 characters but only 103 tokens, or roughly 4.9 characters per token.
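
To make the "different alphabet" point concrete, here's a rough toy sketch. The vocabulary, the IDs, and the `toy_tokenize` helper are all made up for illustration, and the greedy longest-match splitting is a stand-in: real tokenizers learn their vocabulary from data and split text differently. The only point is that the model receives a list of IDs, not letters.

```python
# Toy illustration only: a hypothetical vocabulary with made-up IDs.
# Real tokenizers (e.g. BPE-based ones) learn their pieces from data.
TOY_VOCAB = {
    "straw": 101, "berry": 102, "st": 103, "raw": 104,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text into the longest matching vocabulary pieces, left to right."""
    pieces = []
    i = 0
    max_len = max(len(p) for p in vocab)
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                pieces.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return pieces


word = "strawberry"
pieces = toy_tokenize(word)
ids = [TOY_VOCAB[p] for p in pieces]

print(pieces)           # ['straw', 'berry']
print(ids)              # [101, 102]  <- this is all the model "sees"
print(word.count("r"))  # 3, trivial with letters, invisible in the IDs
```

Nothing in `[101, 102]` says there are three r's in the word. A model can only answer that if it has picked up the spelling of those pieces some other way during training.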

soleveloper|3 days ago

There are incredible authors who happen to be dyslexic, and brilliant mathematicians who struggle with basic arithmetic. We don't dismiss their core work just because a minor lemma was miscalculated or a word was misspelled. The same logic applies here: if we dismiss the semantic capabilities of these models based entirely on their token-level spelling flaws, we miss out on their actual utility.