statusfailed | 2 years ago
A couple more bits of feedback:
(1) The "suggestion" / "I'm unsure" etc. feedback is fantastic
(2) Word segmentation doesn't seem to be working properly, and so the context lookup doesn't work right. Example:
中国 should be parsed as a single word ("China"), but it's parsed as individual characters ("middle", "kingdom").
This means I have to tab out to a dictionary to look up words, and it's a bit annoying to select the right text.
Hadjimina|2 years ago
Not sure if you saw it, but we already have pinyin in there. If you open up the settings and tick "show pronunciations" they will appear above the chat messages.
yorwba|2 years ago
Or find all substrings that are listed in a dictionary (≈everyone uses cc-cedict https://www.mdbg.net/chinese/dictionary?page=cc-cedict ) and give translations for all of them. That way, the user won't be limited to any particular chunking granularity, which is always a finicky aspect of word segmenters to fine-tune.
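A rough sketch of that "look up every substring" idea, in case it helps. The tiny `TOY_DICT` below is a made-up stand-in for CC-CEDICT (its glosses are illustrative, not copied from the real file), and `find_dictionary_substrings` and `max_len` are hypothetical names chosen for this example, not anything from the app:

```python
# Toy stand-in for CC-CEDICT; these entries and glosses are illustrative only.
TOY_DICT = {
    "中": "middle",
    "国": "country; kingdom",
    "中国": "China",
    "人": "person",
    "中国人": "Chinese person",
}


def find_dictionary_substrings(text, dictionary, max_len=4):
    """Return (start, end, word, gloss) for every substring of `text`
    that appears in `dictionary`, considering substrings of up to
    `max_len` characters."""
    matches = []
    for start in range(len(text)):
        # Try every candidate length from 1 up to max_len at this position.
        longest_end = min(start + max_len, len(text))
        for end in range(start + 1, longest_end + 1):
            word = text[start:end]
            if word in dictionary:
                matches.append((start, end, word, dictionary[word]))
    return matches


if __name__ == "__main__":
    for start, end, word, gloss in find_dictionary_substrings("中国人", TOY_DICT):
        print(f"[{start}:{end}] {word}: {gloss}")
```

With this approach 中国 shows up as "China" alongside the single-character entries, so the user can pick whichever granularity they need and no segmenter has to make that choice for them. A real version would probably want a trie or a larger `max_len`, since dictionary entries can be longer than four characters.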
statusfailed|2 years ago
The "show pronunciations" setting just turns on pinyin above the characters. What I want is to type pinyin and get Chinese characters as input. Actually, showing the pinyin above the characters is quite distracting!
[0]: https://pypi.org/project/jieba/