korbip | 1 year ago
Thank you! I can say that it is not really a diminishing factor at the scales reported in the paper: xLSTM[7:1] is essentially on par with xLSTM[1:0] in speed.
We show that it is helpful on toy tasks, and it yields even better sequence extrapolation performance, so yes.