(no title)
anabis
|
4 months ago
Yeah, mapping chinese characters to linear UTF-8 space is throwing a lot of information away. Each language brings some ideas for text processing. sentencepiece inventor is Japanese, which doesn't have explicit word delimiters, for example.
ComputerGuru|4 months ago