There are many papers that combine recurrence across sub-sequences with attention within sub-sequences. Google did this with Infini-Attention, and again with one of the variants in the Titans paper. However, I think the earliest example of this idea is Transformer-XL.
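A minimal sketch of the pattern, in plain Python with toy dot-product attention: the sequence is processed one segment at a time, tokens attend within their segment plus a cached memory of states from earlier segments, and the memory window slides forward after each segment. This is only an illustration of the segment-level recurrence idea, not any paper's actual implementation (real models use causal masking, learned projections, and stop-gradient on the memory; all names here are made up).

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(queries, keys, values):
    # Scaled dot-product attention: each query attends over all keys.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

def segment_recurrent_forward(sequence, seg_len, mem_len):
    # Process the sequence one segment at a time. Queries come from the
    # current segment; keys/values span the cached memory plus the segment
    # (Transformer-XL-style recurrence -- in the real model the memory is
    # detached from the gradient; here we simply cache the vectors).
    memory = []
    outputs = []
    for start in range(0, len(sequence), seg_len):
        segment = sequence[start:start + seg_len]
        context = memory + segment
        outputs.extend(attend(segment, context, context))
        memory = (memory + segment)[-mem_len:]  # slide the memory window
    return outputs
```

So attention cost stays bounded by `seg_len + mem_len` per token, while information still flows across segment boundaries through the cached memory.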
tripplyons|1 year ago