ambrozk | 1 year ago
Decoder: fixed representation vector + N decoded text tokens -> (N+1)th text token
Encoder/decoder architecture: You take some tokenized text, run an encoder on it once to get a fixed representation vector, and then repeatedly apply the decoder to that vector and the N tokens you've already produced to produce the (N+1)th token.
Decoder-only architecture: You take some tokenized text and repeatedly apply a decoder to the N tokens you have so far (the input text plus anything already generated) to produce the (N+1)th token, without ever using an encoded representation vector.
Basically, an encoder produces an intermediate output that a decoder knows how to combine with some existing output to create more output. Imagine, e.g., encoding a sentence in French, and then feeding a decoder the vector representation of that sentence plus the three words you've translated so far, so that it can figure out the next word in the translation. A decoder can be built to require an intermediate context vector, or (as in decoder-only architectures) to require only the text produced so far.
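To make the two loops concrete, here is a minimal sketch in plain Python. The names toy_encoder, toy_decoder_step, and generate are hypothetical stand-ins made up for illustration, not real model code, and the arithmetic inside them is arbitrary. The only point is the control flow: the encoder runs once, the decoder step runs once per new token.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def toy_encoder(source_tokens: Sequence[int]) -> Vector:
    """Hypothetical encoder: squash the whole input into one fixed vector."""
    return [float(sum(source_tokens)), float(len(source_tokens))]


def toy_decoder_step(produced: Sequence[int],
                     context: Optional[Vector] = None) -> int:
    """Hypothetical decoder step: tokens so far (+ optional context) -> next token."""
    base = sum(produced) + len(produced)
    if context is not None:
        base += int(context[0])  # the encoder output influences the prediction
    return base % 10  # pretend vocabulary of 10 tokens


def generate(start: Sequence[int], steps: int,
             context: Optional[Vector] = None) -> List[int]:
    """Apply the decoder step repeatedly, feeding each new token back in."""
    tokens = list(start)
    for _ in range(steps):
        tokens.append(toy_decoder_step(tokens, context))
    return tokens


def encoder_decoder_generate(source: Sequence[int], steps: int,
                             bos: int = 0) -> List[int]:
    """Encode the source once, then decode from a start token plus that vector."""
    context = toy_encoder(source)
    return generate([bos], steps, context)[1:]  # drop the start token


def decoder_only_generate(prompt: Sequence[int], steps: int) -> List[int]:
    """No encoder: the prompt itself is the beginning of the decoded sequence."""
    return generate(prompt, steps)[len(prompt):]  # return only the new tokens
```

In the encoder/decoder case the source text reaches the decoder only through the context vector; in the decoder-only case it reaches the decoder as the first tokens of the sequence being extended.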
opprobium | 1 year ago
The main difference between an encoder and a decoder is the attention mask:
In an encoder, no tokens are masked at any step, so every token is visible in both directions: each output of the encoder can attend to any input of the encoder.
In a decoder, the tokens are masked causally: the prediction of token N+1 can attend only to the previous N tokens.
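The two masks can be sketched in a few lines of plain Python. The helper names attention_mask and masked_softmax are made up for this illustration. It assumes the usual convention that position i may attend to itself and to earlier positions, so the output at position N (which has seen tokens 1...N) is what predicts token N+1.

```python
import math
from typing import List


def attention_mask(n: int, causal: bool) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j."""
    return [[(j <= i) if causal else True for j in range(n)]
            for i in range(n)]


def masked_softmax(scores: List[float], allowed: List[bool]) -> List[float]:
    """Softmax over one row of scores; masked positions get zero weight."""
    exps = [math.exp(s) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


encoder_mask = attention_mask(3, causal=False)  # every row is all True
decoder_mask = attention_mask(3, causal=True)   # lower-triangular

# With equal scores, the second decoder position splits its attention
# between positions 0 and 1 and gives nothing to position 2.
row = masked_softmax([0.0, 0.0, 0.0], decoder_mask[1])
```

Real implementations usually apply the mask by setting disallowed scores to negative infinity before the softmax, which has the same effect as the zeroing done here.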