
An encoder in the T5 sense doesn't produce a single fixed vector. It produces one encoded vector for every input token, and the decoder is given all of them (it reads them through cross-attention).
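To make the "one vector per token" point concrete, here is a minimal pure-Python sketch of single-head cross-attention. It assumes no learned query/key/value projections and a single head, which is a simplification (real T5 layers have both projections and multiple heads). The function and variable names are hypothetical. A decoder query scores every encoder output and takes a weighted sum, so nothing is squeezed into one summary vector beforehand.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(decoder_query, encoder_outputs):
    """One decoder query attends over every encoder output vector.

    Simplification: keys and values are the raw encoder outputs
    (no learned projections, single head)."""
    d = len(decoder_query)
    # One score per encoder position, scaled by sqrt(d) as in dot-product attention.
    scores = [dot(decoder_query, k) / math.sqrt(d) for k in encoder_outputs]
    weights = softmax(scores)
    # Weighted sum over ALL encoder vectors: the decoder sees every position.
    context = [sum(w * v[i] for w, v in zip(weights, encoder_outputs)) for i in range(d)]
    return context, weights

# Hypothetical toy encoder output: one 2-d vector per input token (3 tokens).
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = cross_attention([1.0, 0.0], encoder_outputs)
print(len(encoder_outputs))  # 3: one vector per input token
print(len(weights))          # 3: one attention weight per encoder position
```

The encoder's output length grows with the input, which is exactly why it is passed along as a sequence rather than as one fixed vector.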

The only difference between encoder/decoder and decoder-only is masking:

In the encoder, no tokens are masked at any step: every token is visible in both directions, so each encoder output can attend to every encoder input.

In the decoder, tokens are masked causally: the token at position N+1 can attend only to itself and the N tokens before it, never to tokens that come later.
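A minimal pure-Python sketch of the two masks described above, assuming plain Boolean masks where `True` means "may attend". The names `encoder_mask`, `causal_mask` and `masked_softmax` are hypothetical helpers, not T5 or library APIs. Zeroing out masked positions in the softmax is equivalent to the common practice of adding -inf to masked scores before a normal softmax.

```python
import math

def encoder_mask(n):
    """Encoder (bidirectional): every position may attend to every position."""
    return [[True] * n for _ in range(n)]

def causal_mask(n):
    """Decoder (causal): position i may attend to positions 0..i, i.e. itself and earlier."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, allowed):
    """Softmax over one row of attention scores; disallowed positions get weight 0."""
    m = max(s for s, ok in zip(scores, allowed) if ok)  # stabilize using allowed scores only
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

def show(mask):
    """Print a mask with 1 = may attend, . = masked out."""
    for row in mask:
        print(" ".join("1" if ok else "." for ok in row))

show(encoder_mask(4))
print()
show(causal_mask(4))

# Same raw scores for the token at position 1 (the second token), under each mask.
scores = [0.5, 1.0, 2.0, 0.1]
print(masked_softmax(scores, encoder_mask(4)[1]))  # all four positions get weight
print(masked_softmax(scores, causal_mask(4)[1]))   # positions 2 and 3 get exactly 0.0
```

The encoder mask prints as all ones, while the causal mask is lower-triangular: row i has i+1 ones, so each token sees itself and everything before it.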
