dystroy | 2 years ago

But not all normalization is done to fight spam, and not all of it should care about visual similarity.

I normalize strings in searches not to counter bad intent, but because for all user-related purposes "Comunicações" and "Comunicações" are the same; their differing encodings are largely an accident.
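
A minimal Python sketch of that kind of search normalization. It assumes the two spellings differ only in precomposed vs. decomposed code points, which is the usual way identical-looking strings end up encoded differently. `search_key` is a hypothetical helper name; the only library call is the standard library's `unicodedata.normalize` with the NFC form.

```python
import unicodedata

def search_key(s: str) -> str:
    """Hypothetical helper: map canonically equivalent strings to one form (NFC)."""
    return unicodedata.normalize("NFC", s)

# Same visible text, two encodings (assumed example):
precomposed = "Comunica\u00e7\u00f5es"   # "ç" and "õ" as single code points
decomposed = "Comunicac\u0327o\u0303es"  # "c" + combining cedilla, "o" + combining tilde

print(precomposed == decomposed)                          # False: raw code points differ
print(len(precomposed), len(decomposed))                  # 12 14
print(search_key(precomposed) == search_key(decomposed))  # True: equal after NFC
```

NFD would work just as well as a comparison key; what matters is that both sides of the comparison use the same form.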

ssokolow | 2 years ago

*nod* ...and stemming takes that to an even greater extreme.

I was just pointing out that Unicode itself has various forms of normalization, plus normalization-adjacent functionality, that far too few people are aware of.
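
For reference, a small Python sketch of what that looks like with the standard library. `unicodedata.normalize` exposes the four normalization forms (NFC, NFD, NFKC, NFKD), and `str.casefold` provides one of the adjacent pieces (caseless matching). The sample strings and the `loose_key` helper are illustrative assumptions. `loose_key` is only a rough approximation of Unicode's compatibility caseless matching, not the exact algorithm from the standard.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def all_forms(s: str) -> dict[str, str]:
    """Return s converted to each of Unicode's four normalization forms."""
    return {form: unicodedata.normalize(form, s) for form in FORMS}

def loose_key(s: str) -> str:
    """Hypothetical search key: compatibility-normalize, casefold, then re-normalize.

    NFKC folds compatibility variants (ligatures, full-width letters, superscripts).
    casefold() is more aggressive than lower() (e.g. "ß" becomes "ss"). Casefolding
    can yield unnormalized text, so the result is normalized again.
    """
    return unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", s).casefold())

if __name__ == "__main__":
    # Canonical forms only change how "ç"/"õ" are encoded: 12 vs. 14 code points.
    for form, out in all_forms("Comunica\u00e7\u00f5es").items():
        print(form, len(out))
    # Compatibility forms go further: the "ﬁ" ligature survives NFC but not NFKC.
    print(all_forms("\ufb01le"))
    # Full-width "ＳＴＲＡＳＳＥ" and "Straße" end up with the same key.
    print(loose_key("\uff33\uff34\uff32\uff21\uff33\uff33\uff25") == loose_key("Stra\u00dfe"))
```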