top | item 42574424

yshui | 1 year ago

Any autoregressive model can do what you are describing. Transformers generate one token at a time too, not all at once.
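To make the point concrete, here is a minimal sketch of autoregressive decoding. Any model exposing this interface (RNN, transformer, or state space model) generates one token at a time; `next_token_logits` is a hypothetical stand-in for a real model, not any particular library's API.

```python
import random

VOCAB_SIZE = 4  # toy vocabulary for illustration

def next_token_logits(context):
    # Toy stand-in: random scores over the vocabulary.
    # A real model would condition on the full prefix `context`.
    return [random.random() for _ in range(VOCAB_SIZE)]

def generate(prompt, steps):
    tokens = list(prompt)
    for _ in range(steps):
        logits = next_token_logits(tokens)  # model sees the whole prefix
        # Greedy decoding: pick the highest-scoring token.
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens

print(generate([0, 1], 5))
```

The loop is identical regardless of architecture; what differs is how much state the model must carry between steps.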

intalentive | 1 year ago

True, but memory requirements grow with sequence length. For recurrent models the memory requirement is constant. This is why I qualified with "low memory".
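A rough back-of-the-envelope sketch of the difference, using assumed (hypothetical) model dimensions: a transformer must cache keys and values for every past token, so decode-time memory is O(T), while a recurrent or state space model carries a fixed-size state, O(1) in sequence length.

```python
def kv_cache_floats(seq_len, layers=32, heads=32, head_dim=128):
    # Transformer KV cache: keys + values, per layer, per head, per position.
    # Grows linearly with the number of tokens processed so far.
    return 2 * layers * heads * head_dim * seq_len

def recurrent_state_floats(layers=32, state_dim=4096):
    # Recurrent/SSM hidden state: size is independent of sequence length.
    return layers * state_dim

for T in (1_000, 10_000, 100_000):
    print(T, kv_cache_floats(T), recurrent_state_floats())
```

With these assumed sizes the KV cache scales by 100x across that sweep while the recurrent state stays flat, which is the "low memory" property being referred to.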

whimsicalism | 1 year ago

Yes, but transformers are much slower than state space models at inference.