(no title)
ahgamut | 2 years ago
I'd like to see this evidence, and by that I don't mean someone just writing a blog post or tweeting "hey I asked an LLM to do this, and wow". Is there a numerical measurement, like training loss or perplexity, that quantifies "outside the training set"? Otherwise, I find it difficult to take statements like the above seriously.
LLMs can do some interesting things with text, no doubt. But these models are trained on terabytes of data. Can you really guarantee "there is no part of my query that is in the training set, not even reworded"? Perhaps we could grep through the training set every time one of these claims is made.
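The grep idea can be made slightly more robust than a literal string match. A minimal sketch of one common approach, n-gram overlap against the corpus (the helper names and thresholds here are hypothetical; a real check over terabytes would use an inverted index or suffix array, not an in-memory scan):

```python
def ngrams(text, n=5):
    """Return the set of lowercase word n-grams in `text`."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_score(query, document, n=5):
    """Fraction of the query's word n-grams that appear verbatim in `document`.

    1.0 means every n-gram of the query occurs in the document; 0.0 means none do.
    This catches verbatim and near-verbatim overlap, but not paraphrases.
    """
    q = ngrams(query, n)
    if not q:
        return 0.0
    d = ngrams(document, n)
    return len(q & d) / len(q)

# Toy corpus standing in for the training set (illustrative only).
corpus = [
    "draw a unicorn in tikz using basic shapes and a horn",
    "an unrelated document about training loss and perplexity",
]
query = "please draw a unicorn in tikz using basic shapes"
scores = [contamination_score(query, doc) for doc in corpus]
```

Even this simple score makes the point: a query that "feels" novel can still overlap heavily with one document in a large corpus, while sharing nothing with the rest.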
skepticATX | 2 years ago
The perfect example of that is the TikZ unicorn in the Sparks paper. It seemed like a unique task, until someone found a TikZ unicorn on an obscure website.
There is plenty of evidence that LLMs struggle as you move out of distribution. Which makes perfect sense as long as you stop trying to attribute what they’re doing to magic.
This doesn’t mean they’re not useful, of course. But it means that we should be skeptical about wild capability claims until we have better evidence than a tweet, as you put it.
Legend2440 | 2 years ago
This was the package: https://ctan.org/pkg/tikzlings?lang=en
famouswaffles | 2 years ago
I mean... yes?
Multi-digit arithmetic, translation, summarization. There are many tasks where this is trivial.