Why is that an issue? Training the tokenizer seems much more straightforward than training the model, since it is based on the statistics of the input data. I suppose it may take a while on massive datasets, but is calculating the frequencies really infeasible at a larger scale?
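As a rough illustration of what I mean, here is a minimal sketch of the pair-frequency counting step that BPE-style tokenizer training boils down to (the corpus and names here are made up, just to show the kind of bookkeeping involved):

```python
from collections import Counter

def pair_frequencies(corpus):
    """Count how often each adjacent symbol pair occurs across the corpus."""
    pairs = Counter()
    for word, count in corpus.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += count
    return pairs

# Toy corpus: words pre-split into characters, with their occurrence counts.
corpus = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6}
print(pair_frequencies(corpus).most_common(3))
```

Counting like this is also embarrassingly parallel in principle (per-shard counts just get summed), which is why scaling it to a bigger corpus looks to me like an engineering problem rather than a fundamental one.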