howrar | 2 years ago

Every token is already being generated with all previously generated tokens as inputs. There's nothing about the architecture that makes this hard. It just hasn't been trained on this kind of task.

peyton | 2 years ago

Really? I don't know of a positional encoding scheme that'll handle this.
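howrar's first sentence describes ordinary autoregressive decoding. Below is a minimal Python sketch of that loop. It assumes greedy selection and uses a hypothetical `toy_model` in place of a trained network; neither comes from the thread. On every step the model receives the prompt plus every token generated so far, not only the most recent one.

```python
from typing import Callable, List

# Hypothetical stand-in for a trained model: takes the full prefix of token ids
# and returns one score per vocabulary entry. A real transformer would compute
# these scores with causal self-attention over every position in the prefix.
ToyModel = Callable[[List[int]], List[float]]


def greedy_decode(model: ToyModel, prompt: List[int],
                  max_new_tokens: int, eos_id: int) -> List[int]:
    """Greedy autoregressive decoding: each step sees prompt + all generated tokens."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = model(tokens)  # conditioned on ALL previous tokens
        next_id = max(range(len(scores)), key=scores.__getitem__)  # argmax
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


def toy_model(prefix: List[int]) -> List[float]:
    # Hypothetical rule over a 5-token vocabulary (ids 0..4, where 4 = EOS):
    # predict the sum of the whole prefix modulo 4, and emit EOS once the
    # sequence reaches length 6. The prediction depends on every earlier token.
    vocab_size = 5
    target = 4 if len(prefix) >= 6 else sum(prefix) % 4
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


if __name__ == "__main__":
    print(greedy_decode(toy_model, [1, 2], max_new_tokens=10, eos_id=4))
```

Running it prints `[1, 2, 3, 2, 0, 0, 4]`. The sketch does not model peyton's objection. How positions are encoded happens inside the model, and the toy function here ignores it.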
peyton|2 years ago