top | item 40335224

(no title)

65a | 1 year ago

I find it hard to believe that a Canadian company's model contained an undertrained token related to hockey (albeit in German). In all seriousness, this is pretty cool and am excited to see understanding of tokenization impacts on models improve. One notable finding is that a lot of the earlier open source models have issues with carriage returns, which are not that uncommonly introduced depending on where the data is coming from etc.

discuss

order

No comments yet.