trextrex | 1 year ago

I'm not clear on what advantage this architecture has over Mamba/Griffin. Those also offer linear scaling and better sequence parallelism, and they are competitive in performance with transformers.

lalaland1125|1 year ago

The whole field seems to be having issues with comparisons right now.

We really don't even know how Mamba and Griffin compare with each other.

wave_1|1 year ago

state tracking...