

azulster | 1 year ago

Yes, you are missing that the tokens aren't words; they are groups of roughly 2-3 letters, or chunks of arbitrary size depending on the model's tokenizer.
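
For illustration, a minimal sketch using OpenAI's tiktoken library (the library, encoding name, and example word are assumptions, not from the thread), showing how a single word splits into subword tokens:

    # A minimal sketch, assuming tiktoken is installed.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")   # tokenizer used by GPT-4-era models
    tokens = enc.encode("strawberry")            # word -> list of integer token IDs
    pieces = [enc.decode([t]) for t in tokens]   # decode each ID back to its text piece
    print(tokens, pieces)                        # the word comes out as subword chunks,
                                                 # not individual letters

The exact split depends on the encoding, which is why a model never "sees" the individual characters of most words.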


smokel | 1 year ago

Nope, I'm not missing that particular fact. I'm aware that sentences (and words) are split into tokens, which are then mapped to embedding vectors.
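
That mapping is just a table lookup; a minimal sketch in PyTorch (the vocabulary size, dimension, and token IDs below are made up for illustration):

    # A minimal sketch of the token-ID -> vector lookup.
    import torch
    import torch.nn as nn

    embed = nn.Embedding(num_embeddings=50_000, embedding_dim=768)
    token_ids = torch.tensor([496, 675, 15717])  # hypothetical subword token IDs
    vectors = embed(token_ids)                   # shape (3, 768): one vector per token
    print(vectors.shape)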

I still don't understand how most LLMs can spell out words, though, nor what causes their failure to count characters in words. I was not convinced by the comment I was responding to.