top | item 37789352

(no title)

ma2rten | 2 years ago

Yes, I think that is a reasonable way to think about it, in my opinion. However, with the language modeling objective it predicts the next token and because of the residual connections each intermediate layer is in the same space. So, maybe it would be more accurate to say that it is an increasingly accurate representation of the next token.

discuss

order

No comments yet.