lpasselin | 2 years ago
The Mamba paper shows significant improvements at all model sizes tested, up to 1B, the largest one. Is there any reason why it wouldn't scale to 7B or more? Have they tried it?

samus | 2 years ago
That's the issue - I keep hearing that it is beyond a small research group's budget to meaningfully train such a large model. You don't just need GPU time, you also need data. And just using the dregs of the internet doesn't cut it.