item 24843046

br1|5 years ago

And UCS-2 ran out of bits because the Unicode consortium keeps adding garbage like emojis to keep their job...

As well as being rather closed-minded, this is also not true. The assignments in the U+0000 to U+FFFF range (the Basic Multilingual Plane) are public knowledge, and the biggest users of space are:

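  1. the private use area (U+E000 to U+F8FF, 6,400 code points)
  2. the general "CJK" area (the CJK Unified Ideographs block alone, U+4E00 to U+9FFF, spans 20,992 code points)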

The second of which has a truly mind-boggling number of characters, including every possible composite Hangul glyph used in modern Korean, despite them being constructable from the basic Hangul codepoints.
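To show what "constructable from the basic Hangul codepoints" means concretely: the precomposed syllables are laid out so that a syllable's code point can be computed from its leading consonant, vowel, and optional final consonant. This is a minimal Python sketch of the arithmetic described in the Unicode Standard's Hangul syllable composition rules. compose_hangul is an illustrative name, and the sketch only handles the modern jamo ranges.

```python
import unicodedata

# Constants of the Hangul syllable composition arithmetic (modern jamo only).
S_BASE = 0xAC00  # first precomposed syllable
L_BASE = 0x1100  # first leading consonant (choseong)
V_BASE = 0x1161  # first vowel (jungseong)
T_BASE = 0x11A7  # one below the first final consonant; index 0 means "no final"
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def compose_hangul(lead: str, vowel: str, trail: str = "") -> str:
    """Compute a precomposed syllable from conjoining jamo by arithmetic alone."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    # A real final consonant maps to 1..27; 0 is reserved for "no final".
    t_index = ord(trail) - T_BASE if trail else 0
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("lead/vowel is not a modern conjoining jamo")
    if trail and not 1 <= t_index < T_COUNT:
        raise ValueError("trail is not a modern final consonant")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


if __name__ == "__main__":
    word = compose_hangul("\u1112", "\u1161", "\u11AB")  # H + A + N
    # Cross-check against Python's own canonical composition.
    assert word == unicodedata.normalize("NFC", "\u1112\u1161\u11AB")
    print(word, f"U+{ord(word):04X}")  # 한 U+D55C
```

No lookup table is involved. Every syllable block position follows from the three indices, which is exactly why encoding all of them separately looks redundant.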

Emojis and other symbols that aren't used for writing language take up relatively little of that space. Certainly there is no reason to believe that UCS-2 would be sufficient for writing if they were removed. The scripts now included in Unicode would not fit in 16 bits even if the private use area were handed over to them, so UTF-16 would have been invented regardless.
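A rough way to check these space claims is to tally the 16-bit range with Python's standard unicodedata module. This is a minimal sketch. The grouping is an illustrative choice: general category for private use and surrogates, and the CJK UNIFIED IDEOGRAPH / HANGUL SYLLABLE name prefixes for the big blocks. The exact numbers depend on the Unicode version bundled with the interpreter (unicodedata.unidata_version).

```python
import unicodedata
from collections import Counter


def bmp_breakdown() -> Counter:
    """Tally a few large groups of code points in U+0000..U+FFFF."""
    counts = Counter()
    for cp in range(0x10000):
        ch = chr(cp)
        category = unicodedata.category(ch)
        name = unicodedata.name(ch, "")  # "" for unnamed code points
        if category == "Co":
            counts["private use"] += 1
        elif category == "Cs":
            counts["surrogates"] += 1
        elif name.startswith("CJK UNIFIED IDEOGRAPH"):
            counts["CJK unified ideographs"] += 1
        elif name.startswith("HANGUL SYLLABLE"):
            counts["Hangul syllables"] += 1
        elif category == "Cn":
            counts["unassigned"] += 1
        else:
            counts["everything else"] += 1
    return counts


def assigned_outside_bmp() -> int:
    """Count assigned, non-private-use characters in planes 1 to 16."""
    return sum(
        1
        for cp in range(0x10000, 0x110000)
        if unicodedata.category(chr(cp)) not in ("Cn", "Co")
    )


if __name__ == "__main__":
    print("Unicode", unicodedata.unidata_version)
    for group, n in bmp_breakdown().most_common():
        print(f"{group:>24}: {n:,}")
    print(f"{'assigned outside BMP':>24}: {assigned_outside_bmp():,}")
```

With any recent Python, the characters assigned outside the 16-bit range number in the tens of thousands, most of them in plane 2 (CJK extensions). That alone is more than the private use area could ever absorb.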

lifthrasiir|5 years ago

> [...] despite them being constructable from the basic Hangul codepoints.

Unicode strives for round-trip compatibility with source character sets, and in this case KS X 1001 (KS C 5601 at that time) is the main culprit: it had 2,350 of the 11,172 syllables precomposed. But Korea also had supplementary character sets beyond KS X 1001, which were subsequently added to Unicode 1.1 (up to some 6,000 characters), before it was decided that an algorithmically derived section covering all 11,172 syllables was better. This whole situation is now known as the "Hangul mess".
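The "algorithmically derived section" works in both directions. The 11,172 figure is 19 leading consonants × 21 vowels × 28 final-consonant slots (including "none"), and any syllable can be split back into jamo with integer division. This is a minimal Python sketch: decompose_hangul and check_against_nfd are illustrative names, NFD from unicodedata is used only as a cross-check, and the older KS X 1001 subset is not modeled.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11,172 syllables in total


def decompose_hangul(syllable: str) -> str:
    """Split a precomposed syllable into its conjoining jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    # t_index == 0 means the syllable has no final consonant.
    return lead + vowel + (chr(T_BASE + t_index) if t_index else "")


def check_against_nfd() -> bool:
    """Compare every syllable with Python's own canonical decomposition."""
    return all(
        decompose_hangul(chr(S_BASE + i)) == unicodedata.normalize("NFD", chr(S_BASE + i))
        for i in range(S_COUNT)
    )
```

The whole range runs contiguously from U+AC00 to U+D7A3, which is what makes the arithmetic possible.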

throwwwwwaway|5 years ago

>The second of which has a truly mind-boggling number of characters, including every possible composite Hangul glyph used in modern Korean, despite them being constructable from the basic Hangul codepoints.

The same is true of most Chinese characters, but the proposal to encode them by component was a no-go (for adoption in China, IIRC), and encoding each character separately won out in the end. I never managed to dig up the reasons behind it.

mhh__|5 years ago

Although I think we have enough of them, the way people use emojis cements my view that they are a good thing.

Nothing trivial annoys me more than people writing in "I luv u" shorthand, so if an emoji can convey a more emotional message in fewer characters, I'm all for it. Even if it's a thinly veiled sexual euphemism.

Emojis in official corporate communication can grate, though: I got one recently when applying for a relatively serious job. It sends a strange message, and it reminds me that I want to save emojis for my friends and family (sadly not the people most of us spend our time with).

johnisgood|5 years ago

Yeah, but then you should consider the effort it takes to say "I love you" that way. Such low effort. Sending a heart is typically habitual and has no real meaning behind it. The same could be said of "I luv u", but a bit less so, I would say; I think it carries a bit more weight.

jart|5 years ago

Come on, there's tons of good stuff in the astral planes, like Gothic (𐌾𐍈𐍊𐌷𐌹𐌴), runes, full metal alchemy, Egyptian hieroglyphs, cuneiform (which has had a lot of impact in the past helping with those hefty Go hello world binaries), and 𝐁𝚹𝙇𝗗 math that doesn't need a <b> tag. See https://justine.storage.googleapis.com/astralplanes.txt
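The "bold without a <b> tag" trick uses the Mathematical Alphanumeric Symbols block in plane 1, where the bold letters and digits are contiguous runs. This is a minimal Python sketch that assumes ASCII input. to_math_bold is an illustrative helper, and unlike the comment above it maps O to bold O rather than substituting bold capital theta.

```python
import unicodedata

BOLD_CAPITAL_A = 0x1D400   # MATHEMATICAL BOLD CAPITAL A
BOLD_SMALL_A = 0x1D41A     # MATHEMATICAL BOLD SMALL A
BOLD_DIGIT_ZERO = 0x1D7CE  # MATHEMATICAL BOLD DIGIT ZERO


def to_math_bold(text: str) -> str:
    """Shift ASCII letters and digits into the Mathematical Bold ranges; keep everything else."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_CAPITAL_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_SMALL_A + ord(ch) - ord("a")))
        elif "0" <= ch <= "9":
            out.append(chr(BOLD_DIGIT_ZERO + ord(ch) - ord("0")))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    bold = to_math_bold("Hello 42")
    print(bold)
    # NFKC compatibility normalization folds the styled letters back to plain ASCII.
    print(unicodedata.normalize("NFKC", bold))
```

Every one of these letters lives above U+FFFF. Each therefore costs four bytes in UTF-8 and a surrogate pair in UTF-16, and NFKC normalization turns them back into ordinary letters.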

saagarjha|5 years ago

Congrats, you've overloaded CoreText on my machine. Safari refused to load that page for me and running it through less made Terminal hang a lot.

Also, fun fact that I just learned: CoreText synchronously calls through to a font registry in fontd using XPC to draw text on the main thread in your app.

acqq|5 years ago

Fascinating, this is the first time I've seen how many of those are present in iOS. I’ll have to check them all on other platforms.

poizan42|5 years ago

Emojis were mostly added in Unicode 6.0 in 2010. Surrogate pairs were introduced with Unicode 2.0 in 1996. It should be pretty clear from that timeline that emojis had nothing to do with it. Unicode as of today contains 92,856 CJK Unified Ideographs, so those alone make UCS-2 insufficient.
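The surrogate mechanism works as follows. UTF-16 reaches a code point above U+FFFF by subtracting 0x10000 and splitting the remaining 20 bits across a high surrogate (U+D800 to U+DBFF) and a low surrogate (U+DC00 to U+DFFF). This is a minimal Python sketch of that arithmetic. The function names are illustrative and do not come from any library.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000    # 20 bits of payload
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Reassemble a code point from a high/low surrogate pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For example, U+20000, the first ideograph of CJK Unified Ideographs Extension B, becomes the pair D840 DC00. That matches what Python's own utf-16-be codec emits. Two 10-bit halves give 2^20 = 1,048,576 extra code points, which is where the U+10FFFF ceiling comes from.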

shakna|5 years ago

As far as I'm aware, the textbook example of a four-byte character in Unicode isn't even an emoji or some odd symbol, it's the Gothic letter hwair 𐍈 (U+10348). There's a lot more complexity for things that you can actually expect people to make use of.
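In UTF-8, every code point from U+10000 upward takes exactly four bytes, so no astral character is "larger" than another. This is a hand-rolled Python sketch of the UTF-8 bit layout from RFC 3629 that shows where the byte counts come from. utf8_encode is an illustrative name; in real code you would call str.encode("utf-8").

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single code point as UTF-8 using the RFC 3629 bit layout."""
    if code_point < 0x80:  # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if 0xD800 <= code_point <= 0xDFFF:
            raise ValueError("surrogates cannot be encoded in UTF-8")
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:  # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("beyond U+10FFFF")
```

For 𐍈 (U+10348) this gives F0 90 8D 88. The Greek θ (U+03B8) needs only two bytes, and the Hangul syllable 한 (U+D55C) needs three.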

cdmckay|5 years ago

Why are emojis garbage?

gurkendoktor|5 years ago

They required every graphics stack to add support for color fonts. And because new emoji compositions are invented every year, it's common to see uncomposed glitches like <facepalming woman><male gender marker> or <thumbs up><white skin color marker> on all sorts of semi-smart devices.
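Those glitches come from how such emojis are built. "Man facepalming" is not a single code point but a zero width joiner (ZWJ) sequence, and skin tones are modifier code points appended to a base emoji. A device that lacks the combined glyph falls back to drawing the pieces. The Python sketch below spells out two such sequences. describe and fallback_view are hypothetical helpers, and fallback_view is only a rough model (it drops the invisible joiner and variation selector) of what an older renderer tends to show.

```python
PERSON_FACEPALMING = "\U0001F926"
ZWJ = "\u200D"                  # ZERO WIDTH JOINER
MALE_SIGN = "\u2642"
VS16 = "\uFE0F"                 # VARIATION SELECTOR-16: request emoji presentation
THUMBS_UP = "\U0001F44D"
LIGHT_SKIN_TONE = "\U0001F3FB"  # EMOJI MODIFIER FITZPATRICK TYPE-1-2

MAN_FACEPALMING = PERSON_FACEPALMING + ZWJ + MALE_SIGN + VS16
THUMBS_UP_LIGHT = THUMBS_UP + LIGHT_SKIN_TONE


def describe(seq: str) -> str:
    """List the code points that make up an emoji sequence."""
    return " ".join(f"U+{ord(ch):04X}" for ch in seq)


def fallback_view(seq: str) -> list:
    """Rough model of an unsupporting renderer: visible pieces only, no ligature."""
    invisible = {ZWJ, VS16}
    return [ch for ch in seq if ch not in invisible]


if __name__ == "__main__":
    for seq in (MAN_FACEPALMING, THUMBS_UP_LIGHT):
        print(describe(seq), "->", " + ".join(fallback_view(seq)))
```

One visible "emoji" here is four code points and five UTF-16 code units. Each new combination needs font and renderer support before it displays as one glyph, which is the churn the comment describes.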

In my opinion, Unicode went from a slow-moving standard that carefully absorbed the world's languages to a pop culture product that serves to make old software obsolete even faster.