dxtrous | 4 months ago

From the authors: great question. If you take an "easy" long-range-dependency task where a Mamba-like architecture flies (and the transformer doesn't, or gets messy), you should be able to make the hatchling fly too. For more ambitious benchmarks, try it on a problem you care about. The paper is really vanilla and focused on explaining what's happening inside the model, but it should be good enough as a starting point for architecture tweaks and experiments.
