WingNews
item 39356095


Translationaut | 2 years ago

This seems to work only because large GPTs have redundant, under-complex attention heads. See this BertViz issue about attention in Llama: https://github.com/jessevig/bertviz/issues/128
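
The redundancy claim can be probed directly: if two heads in a layer produce near-identical attention maps, they are redundant in the sense above. A minimal sketch of that check, using synthetic attention maps rather than a real model (the `head_redundancy` helper and the toy tensor shapes are illustrative assumptions, not part of BertViz or the linked issue):

```python
import numpy as np

def head_redundancy(attn):
    # attn: (num_heads, seq_len, seq_len) attention maps for one layer.
    # Returns the max pairwise cosine similarity between flattened heads,
    # a rough proxy for how redundant the most-duplicated pair is.
    flat = attn.reshape(attn.shape[0], -1)
    unit = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -1.0)  # ignore each head's self-similarity
    return sim.max()

# Toy example: 4 heads over a length-5 sequence; head 3 copies head 0,
# so the redundancy score should be ~1.0.
rng = np.random.default_rng(0)
attn = rng.random((4, 5, 5))
attn /= attn.sum(-1, keepdims=True)  # row-normalize, like softmax output
attn[3] = attn[0]
print(round(head_redundancy(attn), 3))
```

With real models, the same per-layer tensors are available from Hugging Face `transformers` via `output_attentions=True`, which is also what BertViz visualizes.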


No comments yet.

powered by hn/api // news.ycombinator.com