leodriesch | 1 year ago

The model is always wrong, since it predicts a probability distribution over all possible tokens, but the target assigns 100% probability to one token and 0% to all others.
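Here is a minimal sketch of the point. It assumes the usual setup of a softmax output layer trained with cross-entropy against a one-hot target; the comment doesn't name the loss, so that part is an assumption. The toy vocabulary and logits are hypothetical. As the model grows more confident in the correct token, the loss shrinks, but softmax never outputs exactly 1.0 for finite logits, so the loss never reaches zero.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result is mathematically unchanged.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    # With a one-hot target, cross-entropy reduces to -log p(target).
    return -math.log(probs[target_index])

vocab = ["the", "cat", "sat", "mat"]  # hypothetical toy vocabulary
target = 1  # index of the "correct" next token ("cat")

for scale in (1.0, 5.0, 20.0):
    # Raising the target's logit makes the model increasingly confident in "cat".
    logits = [0.0, scale, 0.0, 0.0]
    probs = softmax(logits)
    loss = cross_entropy(probs, target)
    print(f"scale={scale:>5}: p({vocab[target]})={probs[target]:.10f} loss={loss:.3e}")
```

In exact arithmetic the loss stays strictly positive for any finite logits. With floating-point numbers, extremely large logits can eventually round p(target) to exactly 1.0, which is a precision artifact rather than a perfect prediction.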