item 28767752 (no title) | cbutner | 4 years ago

It is using a full-sized transformer decoder, trained on about 1 million data samples, but with far fewer neural network parameters and training samples than GPT-2 or GPT-3.

No comments yet.
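The scale contrast can be made concrete with a rough back-of-the-envelope parameter count. The sketch below (an illustration, not the poster's actual model; the small-model config is hypothetical) uses the standard decoder-block breakdown of roughly 4·d² weights for attention projections plus 8·d² for the MLP:

```python
def approx_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Rough parameter count for a decoder-only transformer."""
    # Per block: ~4*d^2 for the Q, K, V, and output projections,
    # plus ~8*d^2 for a 2-layer MLP with 4*d hidden width.
    per_block = 12 * d_model ** 2
    # Token embedding table (often weight-tied with the output head).
    embeddings = vocab_size * d_model
    return n_layers * per_block + embeddings

# GPT-2 small config (12 layers, d_model=768, 50257-token vocab):
# lands near the published ~124M figure.
gpt2_small = approx_params(12, 768, 50257)

# A hypothetical far smaller decoder with the same architecture,
# the kind of model the post describes (config assumed for illustration).
tiny = approx_params(4, 256, 8000)
```

With these assumed configs, the small model comes out around 5M parameters versus roughly 124M for GPT-2 small, a ~25x gap from architecture hyperparameters alone.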