top | item 41011224

(no title)

ambrozk | 1 year ago

Encoder: Text tokens -> Fixed representation vector

Decoder: Fixed representation vector + N decoded text tokens -> N+1th text token

Encoder/Decoder architecture: You take some tokenized text, run an encoder on it to get a fixed representation vector, and then recursively apply the decoder to your fixed representation vector and the 0...N tokens you've already produced to produce the N+1th token.

Decoder-only architecture: You take some tokenized text, and recursively apply a decoder to the 0...N tokens you've already produced to produce the N+1th token (without ever using an encoded representation vector).

Basically, an encoder produces this intermediate output which a decoder knows how to combine with some existing output to create more output (imagine, e.g., encoding a sentence in French, and then feeding a decoder the vector representation of that sentence plus the three words you've translated so far, so that it can figure out the next word in the translation). A decoder can be made to require an intermediate context vector, or (this is how it's done in decoder-only architectures) it can be made to require only the text produced so far.

discuss

opprobium|1 year ago

Encoder in the T5 sense doesn't produce a fixed vector, it produces one encoded vector for every step of input and all of that is given to the decoder.

The only difference between encoder/decoder and decoder-only is masking:

In an encoder, none of the tokens are masked at any step, and are all visible in both directions to the encoder. Each output of the encoder can attend to any input of the encoder.

In the decoder, the tokens are masked causally - each N+1 token can only attend to the previous N tokens.