(no title)

snnn | 1 year ago

Man, if English were the only human language in the world, who would need UTF-8? The other encodings exist because they are more efficient for other languages, especially Chinese, Japanese, and Korean. For those, UTF-8 takes about 50% more space than the alternatives (typically 3 bytes per character instead of 2). Too bad modern Linux systems only support UTF-8 locales.
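To put a number on the size claim, here is a minimal sketch that compares encoded lengths. The sample strings and the `size_ratio` helper are illustrative choices for this sketch, not anything from the thread; the codec names are the ones Python's standard library ships.

```python
# Illustrative CJK samples (an assumption of this sketch), keyed by a legacy codec.
SAMPLES = {
    "gbk": "中文",          # Chinese
    "shift_jis": "日本語",  # Japanese
    "euc_kr": "한국어",     # Korean
}


def size_ratio(text: str, legacy_codec: str) -> float:
    """Return UTF-8 byte length divided by the legacy encoding's byte length."""
    return len(text.encode("utf-8")) / len(text.encode(legacy_codec))


for codec, text in SAMPLES.items():
    legacy_len = len(text.encode(codec))
    utf8_len = len(text.encode("utf-8"))
    print(f"{codec}: {legacy_len} bytes, utf-8: {utf8_len} bytes, "
          f"ratio {size_ratio(text, codec):.2f}")
```

For text that mixes in ASCII (markup, digits, Latin words), the gap shrinks, since ASCII characters take one byte in all of these encodings.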

Karellen|1 year ago

> Too bad modern Linux systems only support UTF-8 locales.

Do they? On my system:

    $ grep _ /etc/locale.gen | grep -v UTF-8 | wc -l
    183

That's 183 non-UTF-8 locales that are available on my system. OK, I don't have any non-UTF-8 locales currently configured for use, but I don't have to install anything extra for them to be available. Just uncomment some configuration lines and re-run `locale-gen`.

https://manpages.debian.org/bookworm/locales/locale-gen.8.en...

snnn|1 year ago

But the reality is that most glibc functions, like `dirname`, cannot handle non-UTF-8 encodings, because some encodings (like GBK) overlap with ASCII: when you search a char array for an ASCII character (like '\'), you may accidentally hit one half of a multibyte non-English character. Therefore, people in Asia usually do not use non-UTF-8 locales.
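The overlap can be shown with a small sketch. It looks for a GBK character whose second byte is 0x5C (the ASCII backslash) and then runs a naive byte search on it. The helper name `find_gbk_char_with_trail_byte` is made up for this sketch, and it only illustrates the byte-level collision, not how any particular glibc function behaves.

```python
from typing import Optional


def find_gbk_char_with_trail_byte(trail: int) -> Optional[str]:
    """Return some two-byte GBK character whose second byte equals `trail`."""
    for lead in range(0x81, 0xFF):  # range of GBK lead bytes
        try:
            return bytes([lead, trail]).decode("gbk")
        except UnicodeDecodeError:
            continue  # this lead/trail pair is not assigned; try the next lead
    return None


ch = find_gbk_char_with_trail_byte(0x5C)  # 0x5C is ASCII '\'
assert ch is not None

gbk_bytes = ch.encode("gbk")
utf8_bytes = ch.encode("utf-8")

# A byte-wise search for '\' lands inside the GBK character...
print("GBK  :", gbk_bytes.hex(), "-> index of backslash:", gbk_bytes.find(b"\\"))
# ...but never inside the UTF-8 form, where every byte of a non-ASCII
# character is >= 0x80.
print("UTF-8:", utf8_bytes.hex(), "-> index of backslash:", utf8_bytes.find(b"\\"))
```

This is the property that makes byte-oriented string code safe on UTF-8 but not on encodings whose trailing bytes fall in the ASCII range.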

loeg|1 year ago

The other encodings mostly exist for historical reasons; efficiency is just not a huge factor in 2024.