item 35099766

mota7 | 3 years ago

The paper says "... optimized on next-word prediction only". Which is absolutely correct in 2023.

ChatGPT (and indeed all recent LLMs) uses much more complex training methods than simple next-word prediction.
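For concreteness, "next-word prediction" usually means minimizing the cross-entropy of each next token given the preceding ones. A minimal sketch of that objective (with a toy, hypothetical probability function standing in for a real model):

```python
import math

def next_token_loss(tokens, prob):
    """Average cross-entropy of predicting each next token.

    `prob(context, token)` is assumed to return the model's
    probability for `token` given the preceding `context`.
    """
    losses = []
    for i in range(1, len(tokens)):
        p = prob(tuple(tokens[:i]), tokens[i])
        losses.append(-math.log(p))  # surprise at the true next token
    return sum(losses) / len(losses)

# Toy "model": uniform over a 4-token vocabulary, so every
# prediction costs log(4) nats.
uniform = lambda ctx, tok: 0.25
loss = next_token_loss(["the", "keys", "are", "here"], uniform)
```

Methods like RLHF add further training stages on top of this base objective, which is the commenter's point.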


nextaccountic | 3 years ago

This passage makes two claims:

* One, applicable to current language models (of which ChatGPT is one), is that "they fail to capture several syntactic constructs and semantics properties" and that "their linguistic understanding is superficial". It gives an example: "they tend to incorrectly assign the verb to the subject in nested phrases like ‘the keys that the man holds ARE here’", which is not the kind of mistake that ChatGPT makes.

* The other claim is that "when text generation is optimized on next-word prediction only", "deep language models generate bland, incoherent sequences or get stuck in repetitive loops". Only this second claim concerns next-word prediction.

abecedarius | 3 years ago

Yeah, that struck me too. I followed one of the refs at random and it was to a 2020 paper about RNNs.