
Arrows of Time for Large Language Models

6 points | tianlong | 2 years ago | arxiv.org

3 comments


nyoncore | 2 years ago

Isn't it obvious that, since LLMs are trained to predict the next word, they would do better at that than at predicting the previous one?

frotaur | 2 years ago

The paper mentions that the LLMs predicting the previous token are themselves pre-trained on that reversed task, so the difference is not obvious.
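To make concrete what that setup amounts to (a rough sketch of my own, not the paper's code): "predicting the previous token" is just ordinary next-token training on reversed sequences, so both directions get the same objective and the same kind of pre-training. The toy corpus, the add-alpha smoothing, and the bigram model below are all made up for illustration.

    from collections import Counter, defaultdict
    import math

    corpus = ["the cat sat on the mat", "the dog sat on the rug"]

    def bigram_counts(sequences):
        # Count token -> next-token occurrences for a toy bigram "LM".
        counts = defaultdict(Counter)
        for seq in sequences:
            toks = seq.split()
            for a, b in zip(toks, toks[1:]):
                counts[a][b] += 1
        return counts

    def avg_nll(counts, sequences, alpha=1.0):
        # Average per-token negative log-likelihood, add-alpha smoothed.
        vocab = {t for s in sequences for t in s.split()}
        nll, n = 0.0, 0
        for seq in sequences:
            toks = seq.split()
            for a, b in zip(toks, toks[1:]):
                total = sum(counts[a].values()) + alpha * len(vocab)
                nll += -math.log((counts[a][b] + alpha) / total)
                n += 1
        return nll / n

    forward  = corpus                                             # predict the next token
    backward = [" ".join(reversed(s.split())) for s in corpus]    # predict the previous token

    print("forward  NLL:", avg_nll(bigram_counts(forward),  forward))
    print("backward NLL:", avg_nll(bigram_counts(backward), backward))

The point of the paper, as I understand it, is that even with this symmetric setup the two directions end up with different losses.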

tianlong | 2 years ago

Is there a link with entropy creation?