(no title)
ahgamut | 2 years ago
I'd like to see this evidence, and by that I don't mean someone just writing a blog post or tweeting "hey I asked an LLM to do this, and wow". Is there a numerical measurement, like training loss or perplexity, that quantifies "outside the training set"? Otherwise, I find it difficult to take statements like the above seriously.
LLMs can do some interesting things with text, no doubt. But these models are trained on terabytes of data. Can you really guarantee "there is no part of my query that is in the training set, not even reworded"? Perhaps we could grep through the training set every time one of these claims is made.
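The grep idea can be made slightly more robust than a literal string match. A minimal sketch of one common approach, n-gram overlap against the corpus (the helper names and thresholds here are hypothetical; a real check over terabytes would use an inverted index or suffix array, not an in-memory scan):

```python
def ngrams(text, n=5):
    """Return the set of lowercase word n-grams in `text`."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_score(query, document, n=5):
    """Fraction of the query's word n-grams that appear verbatim in `document`.

    1.0 means every n-gram of the query occurs in the document; 0.0 means none do.
    This catches verbatim and near-verbatim overlap, but not paraphrases.
    """
    q = ngrams(query, n)
    if not q:
        return 0.0
    d = ngrams(document, n)
    return len(q & d) / len(q)

# Toy corpus standing in for the training set (illustrative only).
corpus = [
    "draw a unicorn in tikz using basic shapes and a horn",
    "an unrelated document about training loss and perplexity",
]
query = "please draw a unicorn in tikz using basic shapes"
scores = [contamination_score(query, doc) for doc in corpus]
```

Even this simple score makes the point: a query that "feels" novel can still overlap heavily with one document in a large corpus, while sharing nothing with the rest.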
skepticATX | 2 years ago
The perfect example of that is the TikZ unicorn in the Sparks paper. It seemed like a unique task, until someone found a TikZ unicorn on an obscure website.
There is plenty of evidence that LLMs struggle as you move out of distribution. Which makes perfect sense as long as you stop trying to attribute what they’re doing to magic.
This doesn’t mean they’re not useful, of course. But it means that we should be skeptical about wild capability claims until we have better evidence than a tweet, as you put it.
Legend2440 | 2 years ago
This was the package: https://ctan.org/pkg/tikzlings?lang=en
famouswaffles | 2 years ago
I mean... yes?
Multi-digit arithmetic, translation, summarization. There are many tasks where this is trivial.