item 28767752 (no title) | cbutner | 4 years ago

It is using a full-sized transformer decoder, trained on about 1 million data samples, but with far fewer neural network parameters and training samples than GPT-2 or GPT-3.

No comments yet.
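The scale contrast can be made concrete with a rough back-of-the-envelope parameter count. The sketch below (an illustration, not the poster's actual model; the small-model config is hypothetical) uses the standard decoder-block breakdown of roughly 4·d² weights for attention projections plus 8·d² for the MLP:

```python
def approx_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Rough parameter count for a decoder-only transformer."""
    # Per block: ~4*d^2 for the Q, K, V, and output projections,
    # plus ~8*d^2 for a 2-layer MLP with 4*d hidden width.
    per_block = 12 * d_model ** 2
    # Token embedding table (often weight-tied with the output head).
    embeddings = vocab_size * d_model
    return n_layers * per_block + embeddings

# GPT-2 small config (12 layers, d_model=768, 50257-token vocab):
# lands near the published ~124M figure.
gpt2_small = approx_params(12, 768, 50257)

# A hypothetical far smaller decoder with the same architecture,
# the kind of model the post describes (config assumed for illustration).
tiny = approx_params(4, 256, 8000)
```

With these assumed configs, the small model comes out around 5M parameters versus roughly 124M for GPT-2 small, a ~25x gap from architecture hyperparameters alone.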