top | item 44576361

lukebechtel | 7 months ago

It's meant to replace the BPE tokenizer piece, so it isn't a full language model by itself.

In fact, Gu's blog post (linked in a comment below) mentions that they built a Mamba model that used this in place of the tokenizer.
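To make the contrast concrete, here's an illustrative sketch (not the actual model code) of what "replacing the BPE tokenizer" means at the input boundary: instead of learned subword merges, the frontend just maps raw UTF-8 bytes to integer ids over a fixed 256-symbol vocabulary, and the model learns any grouping itself.

```python
# Illustrative sketch, not the real implementation: a byte-level
# frontend that stands in for a BPE tokenizer. The "vocabulary" is
# simply the 256 possible byte values, so there is nothing to train.
def bytes_to_ids(text: str) -> list[int]:
    # Each UTF-8 byte becomes one token id in range(256).
    return list(text.encode("utf-8"))

def ids_to_text(ids: list[int]) -> str:
    # Lossless inverse: reassemble the bytes and decode.
    return bytes(ids).decode("utf-8")

ids = bytes_to_ids("Mamba")
print(ids)                # [77, 97, 109, 98, 97]
print(ids_to_text(ids))   # Mamba
```

A BPE tokenizer would instead emit ids from a vocabulary of tens of thousands of learned merges; the byte-level approach trades longer sequences for a tokenizer-free pipeline, which is the piece the model above is meant to handle internally.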
