top | item 45996316

inciampati | 3 months ago

Markov chains have exponentially decaying correlations between tokens over time. That's dramatically different from real text, which contains extremely long-range correlations. They simply can't model long-range correlations, so they can't be guided. They can memorize, but not generalize.
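The exponential falloff is easy to see in a toy case. For a two-state Markov chain, the lag-k correlation decays exactly as lambda2**k, where lambda2 is the second eigenvalue of the transition matrix. A minimal sketch (the transition probabilities below are arbitrary, chosen only to illustrate the decay):

```python
import numpy as np

# Two-state Markov chain; correlation at lag k decays as lambda2**k,
# where lambda2 is the second eigenvalue of the transition matrix.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
lam2 = np.trace(P) - 1.0          # = 0.7 for this chain

pi = np.array([2/3, 1/3])         # stationary distribution (pi @ P == pi)
mean = pi[1]                      # E[X] with the states coded as 0/1
var = mean * (1 - mean)

def corr(k):
    """Corr(X_t, X_{t+k}) computed exactly from the k-step matrix P**k."""
    Pk = np.linalg.matrix_power(P, k)
    joint11 = pi[1] * Pk[1, 1]    # P(X_t = 1, X_{t+k} = 1)
    return (joint11 - mean**2) / var

# Geometric decay: already negligible by lag 20 (0.7**20 is about 8e-4).
for k in (1, 5, 20):
    assert abs(corr(k) - lam2**k) < 1e-9
```

Since |lambda2| < 1 for any ergodic chain, the correlation is a pure geometric decay, which is the "exponential falloff" the comment describes: no choice of transition matrix can make it heavy-tailed.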

kittikitti|3 months ago

As someone who has developed chatbots with both HMMs and Transformers, this is a great and succinct answer. The paper "Attention Is All You Need" addressed exactly this drawback.
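The mechanism from that paper can be sketched in a few lines. Scaled dot-product attention connects every position to every other position in a single step, so the path length between distant tokens is 1 rather than growing with distance as in a chain. A minimal sketch with random matrices standing in for projected token embeddings (sizes and values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 6, 4   # toy sizes, chosen arbitrarily

# Random queries/keys/values standing in for projected token embeddings.
Q = rng.normal(size=(seq_len, d))
K = rng.normal(size=(seq_len, d))
V = rng.normal(size=(seq_len, d))

# Scaled dot-product attention, the core operation of the Transformer.
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)   # softmax over keys
out = weights @ V

# Every position places nonzero weight on every other position:
# a direct long-range interaction, unlike a Markov chain's one-step memory.
assert (weights > 0).all()
assert np.allclose(weights.sum(axis=1), 1.0)
```

Because the interaction is content-based (the dot products) rather than distance-based, the correlation between positions 1 and 1000 costs no more to model than between adjacent positions.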

vjerancrnjak|3 months ago

Markov Random Fields also do that.

The difference is obviously there, but nothing prevents you from conditioning on long-range dependencies in an undirected model. There's no need to chain anything.

The problem, from a math standpoint, is that exact inference is intractable. The moment you start relaxing the joint optimization problem, you end up in a similar place.
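Both halves of that point can be made concrete with a toy binary MRF: an undirected potential can tie together arbitrarily distant sites, but computing the normalizing constant requires summing over all 2^n configurations. A minimal sketch (the coupling strengths and the energy function are arbitrary choices for illustration):

```python
import itertools
import math

# Toy binary MRF over n sites: nearest-neighbour couplings plus one
# explicit long-range potential tying site 0 to site n-1.
n = 10
J = 1.0        # adjacent-pair coupling (arbitrary)
J_LONG = 2.0   # long-range coupling between the two endpoints (arbitrary)

def energy(x):
    # Lower energy when coupled sites agree.
    e = -J * sum(1 if x[i] == x[i + 1] else -1 for i in range(n - 1))
    e += -J_LONG * (1 if x[0] == x[-1] else -1)
    return e

# Exact partition function: a sum over all 2**n configurations.
# This exhaustive enumeration is precisely the intractability the
# comment points at; it doubles in cost with every added site.
states = list(itertools.product([0, 1], repeat=n))
Z = sum(math.exp(-energy(x)) for x in states)
probs = {x: math.exp(-energy(x)) / Z for x in states}

assert len(states) == 2 ** n
assert abs(sum(probs.values()) - 1.0) < 1e-9
```

The long-range potential is declared in one line, with no chaining through intermediate states, but exact normalization already costs 2^10 terms here and 2^1000 for a text-length sequence, which is why one reaches for relaxations or approximate inference.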

zwaps|3 months ago

This is the correct answer