opprobium | 1 year ago
The main difference between encoder/decoder and decoder-only is the attention masking:
In an encoder, no tokens are masked at any step: every token is visible in both directions, so each output of the encoder can attend to any input of the encoder.
In the decoder, the tokens are masked causally: the token at position N can attend only to itself and the tokens before it, so the prediction for token N+1 depends only on the first N tokens.
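To make the masking concrete, here is a rough sketch of the two mask shapes in plain Python. The function name `attention_mask` and the True/False convention (True means "may attend") are my own choices for illustration, not any library's API; real implementations usually build this as a tensor and add negative infinity to the masked attention scores instead.

```python
def attention_mask(seq_len, causal):
    """Return a seq_len x seq_len matrix of booleans.

    mask[i][j] is True if position i may attend to position j.
    (Hypothetical helper for illustration only.)
    """
    return [
        [(j <= i) if causal else True for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Encoder-style: nothing is masked, every position sees every other position.
encoder_mask = attention_mask(4, causal=False)

# Decoder-style: position i sees only positions 0..i (lower-triangular).
decoder_mask = attention_mask(4, causal=True)

for row in decoder_mask:
    print(" ".join("1" if allowed else "0" for allowed in row))
# prints:
# 1 0 0 0
# 1 1 0 0
# 1 1 1 0
# 1 1 1 1
```

The encoder mask is all ones, and the decoder mask is the lower triangle of it; everything else about the attention computation can stay the same.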