top | item 42981203

(no title)

rmccrear | 1 year ago

Great project! It's fascinating how hard segmentation is and how many approaches there are. I thought I'd mention a trick that can let you segment without a backend. When you double click Chinese text in the browser, it will highlight an entire word. For example, try double clicking on the text here: 一步登天:走一步就到天堂美好境地。 It highlights/segments the first 4 characters as a chengyu, and the others as one or two character words. I haven't been able to discover what method Apple and Microsoft use to segment, but it seems to do a good job. You can even use JavaScript's Range.expand() function to do this programmatically. I once even made a little JS library that can run in the background and segment words on a page.

discuss

order

imron|1 year ago

That’s neat!