float-trip | 2 years ago
The resulting model was so much worse than just formatting everything plaintext. This was with MPT-30B, 15 special tokens, 300M training tokens, and a full finetune.
I may have made a mistake, but I haven't seen any open source finetunes successfully add a large number of tokens yet either.
Tostino | 2 years ago
Adding new tokens needs a ton of data to train what the token means. Reusing existing tokens will let you easily teach the model that a sequence of tokens now has a new meaning after fine-tuning.
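A toy sketch of the contrast (my own illustration with a made-up four-word vocabulary, not anyone's actual code): a brand-new special token gets a freshly initialized embedding row that fine-tuning must train from scratch, while a reused sequence of existing tokens already carries pretrained meaning, so fine-tuning only has to teach the new usage of the sequence.

```python
import random

random.seed(0)
DIM = 4

# Pretrained vocabulary with stand-in "trained" embedding rows.
vocab = {"<s>": 0, "user": 1, ":": 2, "hello": 3}
embeddings = {tid: [1.0] * DIM for tid in vocab.values()}

# Option A: add a genuinely new special token. Its embedding row starts
# as noise, so every bit of its meaning must be learned from the
# fine-tuning data alone.
vocab["<|im_start|>"] = len(vocab)
embeddings[vocab["<|im_start|>"]] = [random.gauss(0, 0.02) for _ in range(DIM)]

# Option B: mark chat turns with a sequence of existing tokens
# ("user", ":"). No new embedding rows are added; the model only needs
# to learn that this sequence now plays a new role.
reused_marker = [vocab[t] for t in ("user", ":")]
```

In frameworks like Hugging Face Transformers, option A corresponds to adding tokens and resizing the embedding matrix, which is where the extra data demand comes from.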
float-trip | 2 years ago
> Adding new tokens needs a ton of data to train what the token means.
But how much? 300M tokens is fine for a simple version of ChatML with ~4 tokens. Not for 15, at least in my case. How does this relationship scale?
Just trying to offer one data point for what doesn't work, with the hedge that I might have just had a bug.
tayo42 | 2 years ago
A simple input might be `<cards you hold> 1 14 56</end><cards to pick> 5 64 2</end>` -> the predicted token is the draft pick.
Then train a transformer-based network from scratch.
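A rough sketch of how that plaintext format could be encoded for next-token prediction (my own illustration, not tayo42's actual setup): markers like `<cards you hold>` become single tokens, card numbers become individual tokens, and the training label is the drafted card (here `64` is a hypothetical pick).

```python
import re

def tokenize(text):
    # Match a whole <...> marker, or any run of non-space, non-'<' characters.
    return re.findall(r"<[^>]+>|[^<\s]+", text)

context = "<cards you hold> 1 14 56</end><cards to pick> 5 64 2</end>"
pick = "64"  # hypothetical label: the card the player actually drafted

tokens = tokenize(context)
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens) | {pick}))}
input_ids = [vocab[t] for t in tokens]      # what the transformer sees
label_id = vocab[pick]                      # next-token prediction target
```

Since the whole vocabulary (markers plus card ids) is defined up front and the network is trained from scratch, the new-token problem from the parent comments never comes up.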