top | item 38737939 (no title) ajtejankar | 2 years ago The base model has 32 layers and there is a single linear layer for language modeling (going from embeddings to the vocabulary) that gets applied at the very end. discuss order hn newest No comments yet.
No comments yet.