top | item 41530841


karterk|1 year ago

Solving the strawberry problem will probably require a model that just works with bytes of text. There have been a few attempts at building this [1] but it just does not work as well as models that consume pre-tokenized strings.

[1]: https://arxiv.org/abs/2106.12672
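A quick sketch of the gap being described, assuming a purely illustrative subword split (not the output of any real tokenizer): at the byte level, counting letters is trivial; through a tokenizer's lens, the model only sees opaque token IDs and the letters inside each chunk are invisible to it.

```python
# Why letter counting is easy at the byte level but awkward through
# a tokenizer's lens. The subword split below is illustrative only,
# not any real tokenizer's output.

word = "strawberry"

# Byte-level view: every character is visible, so counting is trivial.
byte_count = word.encode("utf-8").count(ord("r"))

# Hypothetical subword view: the model consumes token IDs for chunks
# like ["str", "aw", "berry"] and never sees the letters inside them.
tokens = ["str", "aw", "berry"]
vocab = {t: i for i, t in enumerate(tokens)}
token_ids = [vocab[t] for t in tokens]  # what the model actually consumes

print(byte_count)  # 3
print(token_ids)   # letter identity is gone from this representation
```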


randomdata|1 year ago

Or just a way to compel the model to do more work without needing to ask (isn't that what o1 is all about?). If you do ask for the extra effort, it works fine.

    + How many "r"s are found in the word strawberry? Enumerate each character.

    - The word "strawberry" contains 3 "r"s. Here's the enumeration of each character in the word:
    - 
    - [omitted characters for brevity]
    -
    - The "r"s are in positions 3, 8, and 9.

jeroenhd|1 year ago

I tried that with another model not that long ago and it didn't help. It listed the right letters, then turned "strawberry" into "strawbbery", and then listed two r's.

Even if these models did have a concept of the letters that make up their tokens, the problem would still exist. We catch these mistakes and work around them by rephrasing the question until the answer comes out right, because here the wrongness of the output is easy to see. But if we fix this particular problem, we still don't know whether the models are correct in more complex use cases.

In scenarios where people use these models for actual useful work, we don't alter our queries to make sure we get the correct answer. If they can't answer the question when asked normally, the models can't be trusted.

mistercow|1 year ago

I think o1 is a pretty big step in this direction, but the really tricky part is going to be getting models to figure out what they're bad at and what they're good at. They already know how to break problems into smaller steps, but they need to know which problems need to be broken up, and into what kinds of steps.

One of the things that makes that problem interesting is that during training, “what the model is good at” is a moving target.

eithed|1 year ago

Are you saying that I have to know how LLMs work to know what I should ask an LLM about?

viraptor|1 year ago

Or "when trying to answer questions that involve spelling or calculation, use python". No need for extra training really.
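A minimal sketch of what that looks like when the generated code is actually executed by a tool, rather than the model guessing from token IDs (the snippet itself is illustrative, not tied to any particular tool-use API):

```python
# The kind of one-liner an LLM can emit and hand to an interpreter
# instead of answering spelling questions from its token view.
word = "strawberry"
count = word.count("r")
positions = [i + 1 for i, c in enumerate(word) if c == "r"]  # 1-indexed
print(count, positions)  # 3 [3, 8, 9]
```

Note the positions match the enumeration in the transcript upthread: 3, 8, and 9.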

karterk|1 year ago

There are many different classes of problems that are affected by tokenization. Some of them can be tackled by code.