item 40295734


korbip | 1 year ago

Thank you! At the scales reported in the paper, it is not really a limiting factor: xLSTM[7:1] is pretty much on par with xLSTM[1:0] in speed. We show that it helps on toy tasks and gives even better sequence extrapolation performance, so yes.
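For readers unfamiliar with the notation: in the xLSTM paper, xLSTM[a:b] gives the ratio of mLSTM to sLSTM blocks, so xLSTM[1:0] is a pure mLSTM stack and xLSTM[7:1] mixes in one sLSTM block per eight. The sketch below only illustrates what such a ratio means for a block stack. It assumes that sLSTM blocks sit at the end of each group of a+b blocks. That placement is an assumption, not the paper's exact layout, and `block_layout` is a hypothetical helper, not part of any xLSTM codebase.

```python
def block_layout(num_blocks: int, m_ratio: int, s_ratio: int) -> list[str]:
    """Return a hypothetical block order for an xLSTM[m_ratio:s_ratio] stack.

    Assumption (illustrative only): blocks are grouped in chunks of
    m_ratio + s_ratio, with the sLSTM blocks placed at the end of each chunk.
    """
    if s_ratio == 0:
        # xLSTM[1:0] style: every block is an mLSTM block.
        return ["mLSTM"] * num_blocks
    group = m_ratio + s_ratio
    if num_blocks % group != 0:
        raise ValueError("num_blocks must be a multiple of m_ratio + s_ratio")
    layout: list[str] = []
    for _ in range(num_blocks // group):
        layout += ["mLSTM"] * m_ratio + ["sLSTM"] * s_ratio
    return layout


print(block_layout(16, 7, 1))
```

With 16 blocks at 7:1, only 2 of the 16 blocks are sLSTM. That small share is one plausible reason the mixed model can stay close to the pure mLSTM model in speed.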
