(no title)
2shortplanks | 1 year ago
Dr Drang's script counts the number of _characters_ (Unicode code points), not the number of _glyphs_. This matters because there's more than one way to represent é: either as the single precomposed character \x{e9} (the NFC form), or as a plain "e" followed by the combining acute accent \x{301} (the NFD form).
For example, given "léon" with a decomposed é, it prints "l3n" for me rather than "l2n", because the accent is counted as a character of its own.
What you need to do is normalize to NFC first:
> /usr/bin/perl -C -MUnicode::Normalize -pe '$_=NFC($_);s/(.)(.+)(.)/$1 . length($2) . $3/e'
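To make the difference concrete, here is a minimal sketch of the same idea in Python rather than Perl, using the standard `unicodedata` module. The `abbreviate` helper is a hypothetical name of mine, not something from Dr Drang's script, and it handles a single word rather than a whole line.

```python
import unicodedata


def abbreviate(word: str) -> str:
    """Hypothetical helper: first char + count of middle chars + last char.

    Normalizes to NFC first so that a decomposed "e" + combining accent
    collapses into the single precomposed character before counting.
    """
    word = unicodedata.normalize("NFC", word)
    if len(word) < 3:
        # Mirrors the regex above, which needs at least three characters.
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


# Build the decomposed (NFD) spelling explicitly: l, e, U+0301, o, n.
nfd = unicodedata.normalize("NFD", "l\u00e9on")

print(len(nfd))         # 5 code points before normalization
print(abbreviate(nfd))  # l2n
```

Without the `normalize("NFC", ...)` call the same input would give "l3n". Note that NFC only helps where a precomposed character exists; a letter with an unusual stack of combining marks can still count as more than one code point.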
wizzwizz4 | 1 year ago