eMSF | 2 years ago

While writing a fancy word counter, I learnt that glibc's iswspace (or rather the glibc locale data) does not consider non-breaking spaces to be, well, spaces, even in a Unicode locale. This apparently conforms to ISO 30112. (MSVCRT, for example, does treat them as spaces.)

I happened to notice this through a result mismatch: GNU wc does count NBSPs as word separators. Although it uses iswspace, it additionally checks for a hard-coded set of Unicode non-breaking spaces.
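
For illustration, a minimal sketch of that kind of check in C. The helper names (is_nbsp, is_word_separator, count_words) are hypothetical, and the set of code points is my assumption of what such a hard-coded list might contain, not a copy of the wc source:

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: true for an assumed, hard-coded set of
   non-breaking space code points that iswspace may not report. */
bool is_nbsp(wint_t wc)
{
    return wc == 0x00A0   /* NO-BREAK SPACE */
        || wc == 0x2007   /* FIGURE SPACE */
        || wc == 0x202F   /* NARROW NO-BREAK SPACE */
        || wc == 0x2060;  /* WORD JOINER */
}

/* A character separates words if the locale says it is a space,
   or if it is in the hard-coded non-breaking set above. */
bool is_word_separator(wint_t wc)
{
    return iswspace(wc) != 0 || is_nbsp(wc);
}

/* Count maximal runs of non-separator characters in a wide string. */
size_t count_words(const wchar_t *s)
{
    size_t words = 0;
    bool in_word = false;

    for (; *s != L'\0'; ++s) {
        if (is_word_separator((wint_t)*s)) {
            in_word = false;
        } else if (!in_word) {
            in_word = true;   /* first character of a new word */
            ++words;
        }
    }
    return words;
}
```

With this, a string such as L"a\u00A0b" counts as two words whether or not the locale's iswspace classifies U+00A0 as a space; relying on iswspace alone would, on glibc, give one.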

(I have to say I'm a bit surprised at getting voted hidden here. I thought this was mostly related to the topic at hand. I would of course gladly be corrected if I'm mistaken about the details.)
