zzbzq | 1 year ago

It's the other way around. The model is impeccable at "understanding text." It's a gigantic mathematical spreadsheet that quantifies meaning. The model probably "understands" better than any human ever could. Running that backwards into producing new text is where it gets hand-wavy, and it becomes unclear whether the generative algorithms are really progressing on the same track that humans are on, or just some parallel track that diverges or even terminates early.


nottorp | 1 year ago

I thought it quantifies the probability that a certain word (its output) follows a given word sequence (its training corpus and the prompt)?

ben_w | 1 year ago

Only if you wildly oversimplify, to the point of being misleading.

The precise mechanism LLMs use for reaching their probability distributions is why they are able to pass most undergraduate-level exams, whereas the Markov chain projects I made 15-20 years ago could not.

Even as an intermediary, word2vec had to build a space in which the concept of "gender" exists such that "man" -> "woman" ~= "king" -> "queen".
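
A minimal sketch of that analogy arithmetic, assuming gensim and a pretrained word2vec binary (the file name is a placeholder, not something from this thread):

  # Sketch: the classic word2vec analogy. Assumes gensim is
  # installed and a pretrained binary is available locally;
  # the file name below is a placeholder.
  from gensim.models import KeyedVectors

  vectors = KeyedVectors.load_word2vec_format(
      "GoogleNews-vectors-negative300.bin", binary=True)

  # "man" -> "woman" ~= "king" -> ?, computed as
  # vector("king") - vector("man") + vector("woman").
  print(vectors.most_similar(positive=["king", "woman"],
                             negative=["man"], topn=1))
  # Typically prints something like [('queen', 0.71)]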

gs17|1 year ago

Simplifying to that point describes a Markov chain more than an LLM. LLMs are able to generalize a lot more than that, and that generalization is sufficient to "understand text" on a decent level. Even a relatively small model can take a poorly prompted request like this:

  "The user has requested 'remind me to pay my bills 8 PM tomorrow'. The current date is 2025-02-24. Your available commands are 'set_reminder' (time, description), 'set_alarm' (time), 'send_email' (to, subject, content). Respond with the command and its inputs."
And the most likely response will be the command the user wanted, as sketched below.
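
One plausible way to wire that up, assuming the openai Python client; the model name and the JSON convention are illustrative guesses, not anything specified in this thread:

  # Sketch: send the request above to a hosted LLM and parse the
  # reply. The openai client, model name, and JSON shape are
  # assumptions for illustration only.
  import json
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment
  prompt = (
      "The user has requested 'remind me to pay my bills 8 PM tomorrow'. "
      "The current date is 2025-02-24. Your available commands are "
      "'set_reminder' (time, description), 'set_alarm' (time), "
      "'send_email' (to, subject, content). Respond with JSON: "
      '{"command": ..., "inputs": {...}}.'
  )
  reply = client.chat.completions.create(
      model="gpt-4o-mini",
      messages=[{"role": "user", "content": prompt}],
  )
  call = json.loads(reply.choices[0].message.content)
  # Expected along the lines of:
  # {"command": "set_reminder",
  #  "inputs": {"time": "2025-02-25T20:00", "description": "pay my bills"}}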

A Markov chain (using only the probabilities of word orders from sentences in its training set) could never output a command that wasn't stitched together from existing ones: it would always output a valid command name, but if no one had requested a reminder for a date in 2026 before it was trained, it would never output that year. No amount of documents saying "2026 is the year after 2025" would make a Markov chain understand that fact, but LLMs are able to "understand" it.
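
For contrast, a toy word-level Markov chain; it can only re-emit tokens it has literally seen, in orders it has literally seen:

  # Toy word-level Markov chain: the next word is drawn only from
  # the observed successors of the previous word, so it can never
  # emit a token (like "2026") absent from its training text.
  import random
  from collections import defaultdict

  def train(text):
      chain = defaultdict(list)
      words = text.split()
      for prev, nxt in zip(words, words[1:]):
          chain[prev].append(nxt)
      return chain

  def generate(chain, start, length=10):
      out = [start]
      while len(out) < length:
          successors = chain.get(out[-1])
          if not successors:
              break
          out.append(random.choice(successors))
      return " ".join(out)

  chain = train("set a reminder for 8 PM tomorrow to pay my bills")
  print(generate(chain, "set"))  # can only re-walk the training text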