Retentive Network: A Successor to Transformer for Large Language Models (arxiv.org) 3 pts | 2 years ago | discuss