Arrows of Time for Large Language Models (arxiv.org)
6 points by tianlong | 2 years ago | 3 comments

nyoncore | 2 years ago
Isn't it obvious that, since LLMs are trained to predict the next word, they do better at that than at predicting the previous one?

  frotaur | 2 years ago
  The paper mentions that the LLMs predicting the previous token are indeed pre-trained that way, so the difference is not obvious.
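For concreteness, the setup frotaur describes can be sketched as a small experiment: pre-train one causal model on token sequences in their natural order (next-token prediction) and an otherwise identical model on the same sequences reversed (previous-token prediction), then compare held-out per-token loss. The sketch below is an illustration under assumptions, not the paper's code: it uses a tiny transformer and toy random-walk data, which is statistically time-symmetric, so here the two losses come out roughly equal; real text would be needed to see the forward/backward gap the paper reports.

```python
# Minimal sketch: compare forward (next-token) vs. backward (previous-token)
# language modeling, where the backward model is genuinely pre-trained on
# reversed sequences, not just evaluated in reverse.
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB, SEQ_LEN, D_MODEL = 64, 32, 64

def toy_batch(batch_size: int, reverse: bool) -> torch.Tensor:
    """Toy 'text': clamped random walks over the vocabulary (stand-in for real tokens).
    Note: this toy distribution is time-symmetric, unlike natural language."""
    steps = torch.randint(-2, 3, (batch_size, SEQ_LEN))
    tokens = steps.cumsum(dim=1).clamp(0, VOCAB - 1)
    return tokens.flip(dims=[1]) if reverse else tokens

class TinyCausalLM(nn.Module):
    """A very small decoder-only transformer, purely illustrative."""
    def __init__(self) -> None:
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.pos = nn.Embedding(SEQ_LEN, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4,
                                           dim_feedforward=128, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        n = tokens.size(1)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(n)
        x = self.embed(tokens) + self.pos(torch.arange(n))
        x = self.blocks(x, mask=causal_mask)
        return self.head(x)

def train_and_eval(reverse: bool) -> float:
    """Pre-train on (possibly reversed) sequences, return held-out token loss."""
    model = TinyCausalLM()
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(200):
        tokens = toy_batch(32, reverse)
        logits = model(tokens[:, :-1])                       # predict token t+1 from <= t
        loss = loss_fn(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        tokens = toy_batch(256, reverse)
        logits = model(tokens[:, :-1])
        return loss_fn(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1)).item()

print("forward  (next-token) loss:", train_and_eval(reverse=False))
print("backward (prev-token) loss:", train_and_eval(reverse=True))
```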
  unknown | 2 years ago
  [deleted]

tianlong | 2 years ago
Is there a link with entropy creation?