item 37780939 | adamsvystun | 2 years ago

Isn't the point of transformer training for the model to learn to imitate the distribution of the training data? While "imitating the distribution" and "copying verbatim" are different concepts, they are not far from each other either.
No comments yet.
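The comment's point can be illustrated with a toy sketch (a maximum-likelihood bigram model, not a transformer; the corpus and function names are made up for illustration). Fitting the empirical distribution exactly means that for any context seen only once in training, the "most likely" continuation is the training text itself, so sampling from the fitted distribution reproduces training data verbatim:

```python
from collections import defaultdict, Counter

# Toy maximum-likelihood bigram model fit to a tiny corpus.
# Minimizing cross-entropy drives the model toward the empirical
# (training) distribution; for contexts that occur only once,
# "imitating the distribution" collapses into copying the text.
corpus = "to be or not to be".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(start, steps):
    """Greedy decoding: always emit the most likely next token."""
    out = [start]
    for _ in range(steps):
        candidates = counts[out[-1]].most_common(1)
        if not candidates:
            break
        out.append(candidates[0][0])
    return " ".join(out)

print(generate("be", 4))  # → "be or not to be", a verbatim span of the corpus
```

Every context except "to" appears exactly once, so generation is deterministic and retraces the training sequence word for word. A transformer trained on far more data faces the same pressure on rare or unique sequences, which is why a distribution-matching objective and verbatim copying are not far apart.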