(no title)
craigacp | 1 year ago
Fundamentally it's basically a difference between bidirectional attention in the encoder and a triangular (or "causal") attention mask in the decoder.
craigacp | 1 year ago
Fundamentally it's basically a difference between bidirectional attention in the encoder and a triangular (or "causal") attention mask in the decoder.
No comments yet.