Translationaut | 2 years ago
This seems to work only because large GPTs have redundant, under-complex attention patterns. See this issue in BertViz about attention in Llama: https://github.com/jessevig/bertviz/issues/128
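(For anyone who wants to eyeball those attention patterns themselves, here is a minimal sketch of the kind of BertViz inspection the linked issue is about. It uses bert-base-uncased as a stand-in model, since that is just an assumption for illustration, and head_view renders its interactive view inside a Jupyter notebook.)

    # Minimal sketch: visualize per-layer/per-head attention with BertViz.
    # bert-base-uncased is a stand-in; any HF model that can return
    # attentions (output_attentions=True) should work similarly.
    from transformers import AutoModel, AutoTokenizer
    from bertviz import head_view

    model_name = "bert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_attentions=True)

    inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
    outputs = model(**inputs)

    # outputs.attentions is a tuple with one (batch, heads, seq, seq)
    # tensor per layer; head_view lets you compare heads interactively.
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    head_view(outputs.attentions, tokens)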