The OPUS OpenSubtitles corpus was very useful when I was building this Chinese-English dictionary app: https://github.com/ReubenBond/HanBaoBao. The tool that builds the dictionary database aggregates several sources. One step processes Chinese subtitles to count word frequencies, which then inform the most likely cuts when performing word segmentation.
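For anyone curious how frequency counts can drive segmentation, here is a minimal Python sketch of one common approach: pick the split that maximises the sum of log unigram probabilities, using dynamic programming. This is not the code from HanBaoBao. The names `segment`, `toy_freq`, and `max_len`, the toy counts, and the penalty for unknown characters are all assumptions made for illustration.

```python
import math


def segment(text, freq, max_len=4):
    """Split text into words by maximising the sum of log unigram probabilities.

    freq maps a word to its corpus count. Single characters missing from
    freq are allowed with a fixed penalty (an arbitrary choice for this sketch).
    """
    total = sum(freq.values()) or 1
    unknown = math.log(1 / (total * 10))  # hypothetical penalty for unseen characters

    # best[i] holds (score, start) for the best segmentation of text[:i],
    # where text[start:i] is the last word in that segmentation.
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in freq:
                score = math.log(freq[word] / total)
            elif len(word) == 1:
                score = unknown
            else:
                continue  # unknown multi-character strings are never a word
            candidate = best[start][0] + score
            if candidate > best[end][0]:
                best[end] = (candidate, start)

    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    end = len(text)
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Toy counts, invented for this example (not taken from any corpus).
toy_freq = {"研究": 50, "研究生": 10, "生命": 40, "命": 5, "起源": 20}
print(segment("研究生命起源", toy_freq))
```

With these toy counts the higher-frequency reading 研究 / 生命 / 起源 beats 研究生 / 命 / 起源, which is the kind of ambiguity that frequency data helps resolve.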