korbip | 25 days ago
This was done already here as well: https://arxiv.org/abs/2507.04239
cubefox|25 days ago
Sounds interesting, but...
> these models dominate both exponential attention and linear attention at long-context training
There is no exponential attention; standard attention is quadratic. Strange mistake.