jaidhyani | 2 years ago

This is true in general, but not in the use case they presented. If they had explained why a normalized distribution is useful, it would have made sense. Instead they describe this only as a pick-the-top-answer next-word predictor, which makes the softmax superfluous: softmax preserves the ordering of the scores, so the top answer is the same with or without it.
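
To make that concrete, here is a minimal sketch in plain Python. The logits are made-up numbers and `softmax` / `argmax` are hand-written helpers for illustration, not anything from the article. It only shows that picking the top entry gives the same index before and after normalization.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; this does not change the result.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    # Index of the largest value.
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical raw scores for four candidate next words.
logits = [2.0, -1.0, 0.5, 3.2]
probs = softmax(logits)

top_from_logits = argmax(logits)
top_from_probs = argmax(probs)

# exp is strictly increasing and the divisor is shared,
# so the ordering (and therefore the argmax) is unchanged.
print(top_from_logits, top_from_probs)
```

The normalized values only start to matter if you sample from the distribution, compare confidences, or compute a loss, none of which a pure pick-the-top predictor does.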
