yshui | 1 year ago
Any autoregressive model can do what you are describing. Transformers generate one token at a time too, not all at once.
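As a rough illustration of the point, here is a minimal autoregressive decoding loop in Python; `next_token` is a hypothetical stand-in for a real model's forward pass, and the loop is the same regardless of architecture:

    def next_token(context: list[int]) -> int:
        # Dummy "model": return (last token + 1) mod 50 so the loop
        # runs end to end. A real model would do a forward pass over
        # `context` and sample from the predicted distribution.
        return (context[-1] + 1) % 50

    def generate(prompt: list[int], n_new: int) -> list[int]:
        tokens = list(prompt)
        for _ in range(n_new):
            # One token per step, conditioned on everything so far --
            # the same loop whether the model is a transformer or an RNN.
            tokens.append(next_token(tokens))
        return tokens

    print(generate([3, 7], 5))  # [3, 7, 8, 9, 10, 11, 12]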
intalentive | 1 year ago
True, but memory requirements grow with sequence length. For recurrent models the memory requirement is constant. This is why I qualified with "low memory".
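A back-of-the-envelope sketch of the memory argument, assuming a transformer caches keys and values for every past token while a recurrent or state-space model keeps one fixed-size state; `d` and the sizes printed are illustrative only:

    d = 1024  # hypothetical per-layer state / cache width

    def transformer_cache(seq_len: int) -> int:
        # Keys and values are kept for every past token, so the
        # KV cache grows linearly with sequence length: O(seq_len * d).
        return seq_len * 2 * d

    def recurrent_state(seq_len: int) -> int:
        # The whole history is folded into one fixed-size state,
        # so memory is constant in sequence length: O(d).
        return d

    for L in (1_000, 10_000, 100_000):
        print(f"{L:>7} tokens: cache={transformer_cache(L):>12}, state={recurrent_state(L)}")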
whimsicalism | 1 year ago
Yes, but transformers are much slower than state space models.
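The speed claim can be made concrete with per-step decode cost, assuming attention over the full cache (O(t) work at step t) versus a constant-time state update for the state-space model; the numbers are operation counts, not benchmarks:

    d = 1024  # hypothetical model width

    def attention_step_ops(t: int) -> int:
        # Token t attends over all t cached positions: O(t * d).
        return t * d

    def ssm_step_ops(t: int) -> int:
        # A state-space model updates a fixed-size state: O(d).
        return d

    steps = 10_000
    print("attention total:", sum(attention_step_ops(t) for t in range(1, steps + 1)))
    print("ssm total:      ", sum(ssm_step_ops(t) for t in range(1, steps + 1)))
    # Total decode cost is quadratic in length for attention, linear for the SSM.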