top | item 32007086

(no title)

onurcel | 3 years ago

This is one of the examples we keep in mind and that's also why we can't 100% trust public dataset labels. This motivated us to train a Language IDentification system for all the languages we wanted to handle in order to build the monolingual dataset. More details in the paper ;) Or here, if you have questions

discuss

No comments yet.