ajtejankar | 2 years ago

The base model has 32 layers, plus a single linear language-modeling layer (mapping the final embeddings to vocabulary logits) that is applied once at the very end.
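To make that structure concrete, here is a minimal sketch in plain Python. Only the count of 32 layers and the single final linear layer come from the comment. The toy sizes (`HIDDEN`, `VOCAB`), the helper names, and the residual linear maps standing in for real transformer blocks are all assumptions for illustration, not the actual model.

```python
import random

N_LAYERS = 32  # from the comment: the base model has 32 layers
HIDDEN = 8     # hypothetical toy embedding width (real models are far larger)
VOCAB = 16     # hypothetical toy vocabulary size

rng = random.Random(0)


def make_matrix(rows, cols):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


# Stand-ins for the 32 layers: each is just a residual linear map here,
# NOT a real attention + MLP block. Every layer keeps the width at HIDDEN.
layers = [make_matrix(HIDDEN, HIDDEN) for _ in range(N_LAYERS)]

# The single language-modeling layer: one linear map from HIDDEN to VOCAB.
lm_head = make_matrix(VOCAB, HIDDEN)


def forward(embedding):
    """Run one token embedding through all layers, then the LM head once."""
    hidden = embedding
    for weights in layers:
        update = matvec(weights, hidden)
        hidden = [h + u for h, u in zip(hidden, update)]  # residual connection
    return matvec(lm_head, hidden)  # applied only at the very end


logits = forward([1.0] * HIDDEN)
print(len(layers), len(logits))
```

The point of the sketch is the shape of the computation: the 32 layers all preserve the embedding width, and the projection to vocabulary size happens exactly once, after the last layer.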
