Mindless2112|6 months ago

It seems like this could be easily solved in models that support tool calling by providing them with a tool that takes a token and returns the individual graphemes.

It doesn't seem valuable for the model to memorize the graphemes in each of its tokens.
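
A minimal sketch of such a tool, assuming an OpenAI-style function-calling API (the tool name and JSON schema here are hypothetical):

    # Hypothetical tool definition for an OpenAI-style function-calling API.
    spell_tool = {
        "type": "function",
        "function": {
            "name": "spell_word",
            "description": "Return the individual characters of a word, in order.",
            "parameters": {
                "type": "object",
                "properties": {
                    "word": {"type": "string", "description": "The word to spell out."},
                },
                "required": ["word"],
            },
        },
    }

    def spell_word(word: str) -> list[str]:
        # Deterministic: the model calls this instead of trying to recover
        # letters from its tokenized input.
        return list(word)

    # spell_word("strawberry") -> ['s', 't', 'r', 'a', 'w', 'b', 'e', 'r', 'r', 'y']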

jandrese|6 months ago

Yes, but are you going to special-case all of these pain points? The whole point of these LLMs is that they learn from training data, not from people coding logic directly. If you do this, people will come up with a dozen new ways in which the models fail; they are really not hard to find. The interesting bit is that LLMs tend to work best at "medium difficulty" problems: homework questions, implementing documented APIs, and the like. Asking them to do anything completely novel is at risk of total failure, as is asking them to do something so trivial that normal humans wouldn't even bother writing it down.

BobbyJo|6 months ago

It makes sense when users ask for information that isn't available in the tokenized values, though. In the abstract, a tool that re-tokenizes certain context contents when a prompt references them is probably necessary to solve this issue (if you consider it worth solving).

Mindless2112|6 months ago

Tokenization is an inherent weakness of current LLM design, so it makes sense to compensate for it. Hopefully some day tokenization will no longer be necessary.

poemxo|6 months ago

That takes away from the notion that LLMs have emergent intelligent abilities. Right now it doesn't seem valuable for a model to count letters, even though it is a very basic measure of understanding. Will this continue in other domains? Will we be doing tool-calling for every task that's not just summarizing text?

mjr00|6 months ago

> Will we be doing tool-calling for every task that's not just summarizing text?

spoiler: Yes. This has already become standard for production use cases where the LLM is an external-facing interface: you use an LLM to translate the user's human-language request into a machine-ready, well-defined schema (e.g. a protobuf RPC), do the bulk of the work with ordinary, deterministic code, then (optionally) use an LLM to generate a text result to display to the user. The LLM acts only as a user interface layer.
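
A rough sketch of that pattern (all names here are hypothetical; a real system would use protobuf or JSON Schema and an actual model client for the two LLM steps):

    from dataclasses import dataclass

    @dataclass
    class RefundRequest:
        """Machine-ready schema the LLM fills in from the user's text."""
        order_id: str
        amount_cents: int

    def extract_with_llm(user_text: str) -> RefundRequest:
        # Stand-in for a structured-output model call.
        return RefundRequest(order_id="A123", amount_cents=499)

    def process_refund(req: RefundRequest) -> str:
        # Deterministic business logic: the actual work happens here, in code.
        return f"Refunded {req.amount_cents} cents on order {req.order_id}."

    def render_with_llm(result: str) -> str:
        # Stand-in for an optional LLM call that turns the result into prose.
        return result

    def handle(user_text: str) -> str:
        req = extract_with_llm(user_text)   # LLM: text -> schema
        result = process_refund(req)        # code: schema -> result
        return render_with_llm(result)      # LLM (optional): result -> prose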

strbean|6 months ago

How is counting letters a measure of understanding, rather than a rote process?

The reason LLMs struggle with this is that they literally aren't thinking in English: their input is tokenized before it reaches them. It's like asking a Chinese speaker "How many Rs are there in the word 草莓?" (草莓 being the Chinese word for "strawberry").
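
You can see the tokenized view directly with the tiktoken library (the exact split varies by encoder, so treat the output as illustrative):

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("strawberry")
    pieces = [enc.decode([i]) for i in ids]
    print(pieces)  # a few multi-letter chunks, not ten individual letters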

strbean|6 months ago

We're up to a gazillion parameters already; maybe the next step is to just ditch the tokenization step and let the LLMs encode the tokenization process internally?