This is a dumb question but how hard is it to train the mamba models that are on huggingface? It looks like the largest one is 2.8b - how many GPUs for how long do you need to train that up using a dataset like The Pile?
That's a great question and I'd like to know too. From what I can tell, the answer is that it trains substantially faster than an equally sized Transformer, and the end result scores better than a Transformer on basically every benchmark. It also does inference 3-5x faster in about half the RAM.
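For anyone who wants to poke at this, here's a rough sketch of what a run looks like with the Mamba integration in `transformers` (needs a fairly recent version, roughly 4.39+, plus `datasets` and `accelerate`). To be clear, this is my guess at a starting point, not the recipe the state-spaces checkpoints were trained with: the model id, the Pile mirror, the column names, and all the hyperparameters below are placeholders you'd want to adjust.

```python
# Minimal from-scratch training sketch, NOT the recipe behind the released
# checkpoints. Assumes a recent transformers (Mamba support landed ~v4.39),
# plus datasets and accelerate. Model/dataset ids below are assumptions.
from datasets import load_dataset
from transformers import (
    AutoConfig,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    MambaForCausalLM,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "state-spaces/mamba-2.8b-hf"   # HF-format checkpoint (assumption)
DATA_ID = "monology/pile-uncopyrighted"   # a Pile mirror on the Hub (assumption)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # the NeoX tokenizer has no pad token

# Random init at the 2.8b shape for from-scratch training; swap to
# MambaForCausalLM.from_pretrained(MODEL_ID) if you only want to fine-tune.
config = AutoConfig.from_pretrained(MODEL_ID)
model = MambaForCausalLM(config)

# Stream the data so you don't have to download the whole thing up front.
raw = load_dataset(DATA_ID, split="train", streaming=True)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

# "text"/"meta" are the column names I'd expect in a Pile-format dump (assumption).
tokenized = raw.map(tokenize, batched=True, remove_columns=["text", "meta"])

args = TrainingArguments(
    output_dir="mamba-2.8b-pile",
    max_steps=1_000,                    # a smoke test, nowhere near a full run
    per_device_train_batch_size=1,      # 2.8b params wants A100/H100-class memory
    gradient_accumulation_steps=32,
    learning_rate=3e-4,                 # placeholder, not the paper's schedule
    lr_scheduler_type="cosine",
    warmup_steps=100,
    bf16=True,                          # needs Ampere or newer; use fp16 otherwise
    logging_steps=10,
    save_steps=500,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

That gets you a single-GPU smoke test; an actual reproduction of the released 2.8b checkpoint is a multi-node job over hundreds of billions of Pile tokens (around 300B if I'm reading the paper right), so the interesting question of wall-clock time and GPU count really depends on how far up that curve you want to go.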
MacsHeadroom|2 years ago