mathis | 7 months ago
This might be more pure, but there is nothing to be gained. On the contrary, this would lead to very long sequences for which self-attention scales poorly.
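To put a rough number on "scales poorly": in standard dense self-attention, every position attends to every other position, so the score matrix for one head has n × n entries for a sequence of length n. A minimal sketch, assuming plain dense attention with no sparse or linear variants; the function name and the example lengths are made up for illustration and do not come from the comment above:

```python
def attention_score_entries(seq_len: int) -> int:
    """Entries in the seq_len x seq_len score matrix of one dense attention head."""
    return seq_len * seq_len


# Hypothetical sequence lengths, chosen only to show the growth rate.
for n in (1_000, 4_000, 16_000):
    print(f"n={n:>6}  score entries={attention_score_entries(n):,}")
```

So making the sequence 4x longer costs about 16x more score entries per head per layer, which is why a representation that inflates sequence length gets expensive quickly.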