(no title)
a1k0n | 2 years ago
So the <|beginning of text|> token, with no context before it, learns to predict the first-token-in-a-document distribution. That's not quite the same as predicting nothing at all.
a1k0n | 2 years ago
So the <|beginning of text|> token, with no context before it, learns to predict the first-token-in-a-document distribution. That's not quite the same as predicting nothing at all.
No comments yet.