top | item 37780939

adamsvystun | 2 years ago

Isn't the point of transformer training for the model to learn to imitate the distribution of the training data? While "imitating the distribution" and "copying verbatim" are different concepts, they are not too far from each other either.
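The connection can be sketched numerically: minimizing the standard next-token cross-entropy loss drives the model's predicted distribution toward the empirical training distribution, and that converges to verbatim copying exactly when the training data puts all its mass on one continuation. A minimal toy sketch (hypothetical example, a bare softmax rather than an actual transformer; the vocabulary and counts are made up):

```python
import math

vocab = ["a", "b", "c"]
# Hypothetical empirical next-token counts after some fixed context.
counts = {"a": 6, "b": 3, "c": 1}
total = sum(counts.values())
empirical = [counts[t] / total for t in vocab]  # [0.6, 0.3, 0.1]

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

# Gradient descent on cross-entropy H(empirical, p); the gradient of the
# loss w.r.t. the logits is simply p - empirical.
logits = [0.0, 0.0, 0.0]
lr = 0.5
for _ in range(2000):
    p = softmax(logits)
    logits = [l - lr * (pi - qi) for l, pi, qi in zip(logits, p, empirical)]

p = softmax(logits)
print([round(x, 3) for x in p])  # converges toward the empirical [0.6, 0.3, 0.1]
```

If the counts were instead a point mass (say `{"a": 10, "b": 0, "c": 0}`), the same objective would push the model toward emitting that single continuation every time, i.e. verbatim reproduction.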
