It was just a tweak to the emoji characters to mark them all as East Asian Full Width, instead of Narrow or Ambiguous, so that they display correctly in a fixed-width font in a terminal console. This probably only matters if you like to use emoji filenames (you mad person), but it felt like a wart, so I reported it and had a short back and forth with the chair of the emoji-related subcommittee. That resulted in a proposal which was eventually accepted by the committee into Unicode 9.0. The committee were great: they took my tiny bug report seriously, wrote huge long treatises to justify the change, and eventually voted it into the standard.
(This was pretty much my peak geek achievement of 2016 so far :) )
Holy crap I appreciate this change! I thought they'd never fix it because of compatibility. Thanks for the effort you put in.
It's not that I use emoji filenames, it's that I deal with real-world natural language text all the time, including at the console.
(In terms of compatibility, my text-justifying function is going to stop working correctly for the period of time between when gnome-terminal updates to Unicode 9 and when Python 3.x does. Still worth it.)
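The property change in question is queryable from Python's standard library. Here is a minimal sketch of the kind of width calculation that breaks when the terminal and the language runtime disagree on the Unicode version:

```python
import unicodedata

def display_width(s):
    # East Asian Wide ("W") and Fullwidth ("F") characters take two
    # terminal cells; everything else is counted as one here (a
    # simplification that ignores combining marks and control chars).
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in s)

print(display_width("abc"))    # 3
print(display_width("日本語"))  # 6
# Whether an emoji reports "W" depends on the Unicode version your
# Python build bundles: Narrow/Ambiguous before 9.0, Wide after.
print(unicodedata.east_asian_width("\U0001F600"))
```

If gnome-terminal and Python disagree on that last answer, justified text with emoji in it will misalign by one cell per emoji.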
The success of the unicodepowersymbol proposal inspired me to suggest a couple of characters to Unicode (the Bitcoin sign and IBM's group mark from 1960s mainframes), which were accepted. The point is that Unicode really is open to proposals from random people; you don't need to be part of a big company to influence Unicode.
Although I had never worked with the Unicode Consortium, I [submitted a proposal][1] for an international symbol for an observer and it was eventually accepted.
You don't even need to be part of a big company to go to their meetings. I went to their conference in San Jose just to see the proposal for the Chinese take-out box/chopsticks/fortune cookie emojis. I met a ton of nice people, too.
How did the Unicode Consortium turn around? I remember 10 years ago they were refusing to add standard media icons because:

>The scope of the Unicode Standard (and ISO/IEC 10646) does not extend to encoding every symbol or sign that bears meaning in the world.

>This list has been round and round and round on this -- regular as clockwork, about once a year, the topic comes up again. And I see no indication that the UTC or WG2 are any closer to concluding that bunches of icons should start being included in the character encoding standards simply on the basis of their being widespread and recognizable icons.

>Where is the defensible line between "Fast Forward" and "Women's Restroom" or "Right Lane Merge Ahead" or "Danger Crocodiles No Swimming"?
Unicode is supposed to include symbols that appear in "running text", not standalone icons. So no traffic signs, for instance. (There are exceptions for historical reasons. And emoji are a totally separate story.)
As the story mentions regarding the off symbol (a circle), there are many visually identical code points that have different semantic meanings. But in this case, they added an additional semantic meaning to an existing code point.
So which is it? Does each code point represent a visual image? A semantic meaning? Both? It depends? Something else?
I've tried to decipher that on my own and only learned that the answers to these sorts of questions are complicated, because it's very complicated to represent all written human language via one set of rules.
So I know some of the answers to my questions above, but I'm hoping someone with real expertise can provide the fundamental rules/policies - if there are any.
I'm a bit confused about Unicode. It used to be a repository of linguistic symbols, not arbitrary symbols; more and more it looks like Wingdings. Isn't this putting a burden on font support and text processing (what's the lexicographic order of such symbols? do you sort by their abstract names?)?
They want every symbol used in a document to have a unique encoding, so that you can change fonts without losing meaning. Fonts like wingdings are a horrible hack.
The idea is one (complex) encoding that will represent the info until the end of time. It creates a lot of trouble, but it's still a good idea.
In this case, the codepoints were added in part because the proposers could show many printed works (user manuals, I guess) that included sentences such as "to turn the foobar on, press the ■ button", which shows that the glyph between "the" and "button" is in some way like the surrounding glyphs. Chessmen were added for similar reasons, even though very few people actually read either user manuals or chess literature.
The difference between an icon and a letter is small and unclear. "&", for example, is a symbol but was once considered a letter. Chinese characters are words, etc.
We may think that we are enlightened beings but the fact is that pictures comprise a lot of how we communicate now and in the past. Are emojis that different from hieroglyphics?
Last I checked, Unicode doesn't actually have anything like coverage of the entirety of every script and alphabet. On the other hand, approving emoji and random icons delights Westerners.
But why? The trend towards putting icons into Unicode may be a mistake. Unless it's a symbol one uses in a sentence, there's no real reason to have it in Unicode. Unicode should not be viewed as a standard clip art library.
I already ranted about Unicode earlier today. My main argument is that Unicode is what happens when everybody qualified thinks: "That's a great idea, of course you have to handle X and Y and Z, and I just remembered that I forgot to fill out several warranty cards."
This blog post is a nice example: I have absolutely no idea how these new code points are supposed to look, since I only spent an afternoon implementing the Unicode best practices from the Arch wiki instead of subscribing to some Unicode standard mailing list. (Except for the one symbol which was redefined as a symbol that does not carry the semantic meaning of "standby symbol" anywhere outside of the Unicode standard.)
In my opinion there are two ways forward: one, burn the entire thing; or alternatively, force the Unicode committee to produce an authoritative and complete font, in triplicate, and in their own blood.
The Unicode tables include examples for all graphical code points: http://unicode.org/charts/. If you really wanted to, you could make them into a font (most of them seem to be vectorized), but since I'm guessing you see most of the added code points as useless, why do you care if they show up as boxes? What harm is this stuff causing, or going to cause, to the standard? We have hundreds of thousands of unassigned code points.
Meanwhile, a lot of the "Ys and Zs" added to Unicode have proved to be extremely useful. Unicode's math operator and letter-styling support is what made MathJax (and more generally MathML) possible. They've also helped big time when it comes to accessibility (e.g. screen readers) for mathematics on the internet. Should we have shunted that off to another standard and made the creators of screen readers completely restructure their offerings so they can deal with Unicode characters and "Mathicode" characters? Assuming anyone bothered to implement it, how would that be better than just adding a Unicode category and spending a meager amount of space?
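For instance, the Mathematical Alphanumeric Symbols block encodes styled letters as distinct characters, which is exactly what lets a screen reader tell a bold-face math variable from plain text. A quick check in Python:

```python
import unicodedata

# A bold "A" used as a math symbol is its own code point (U+1D400),
# distinct from LATIN CAPITAL LETTER A, but compatibility
# normalization still folds it back to "A" for plain-text search.
bold_a = "\U0001D400"
print(unicodedata.name(bold_a))               # MATHEMATICAL BOLD CAPITAL A
print(unicodedata.normalize("NFKC", bold_a))  # A
```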
I think that the BMP -- the Basic Multilingual Plane, the first 65,536 code points of Unicode -- is pretty reasonable, and covers fairly well everything we may consider as text (all alphabets in current use plus mathematical symbols). Anything beyond that, from emojis and pictograms to ancient Greek musical notation, is pretty... weird.
I think it would have made much more sense to have something like image tags: a special code point would introduce a link to a URL containing a sequence of glyphs, followed by an index into that sequence. Those glyphs would be guaranteed not to change (in any meaningful way), and devices would be free to cache them. This way, anything that isn't real text would have a standardized representation, too, instead of just a vague "meaning". Another standard could relate those glyphs to one another in some way, giving them standard semantics and means of translation (e.g. "Egyptian hieroglyphics"). This would also allow each of those (emojis or hieroglyphics) to evolve their standards independently of a single universal standard that means little.
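As a purely illustrative toy (every delimiter and field here is invented for the sketch, not part of any standard; the URL is hypothetical), such an escape could be serialized like this:

```python
# Toy serialization of the idea above: an escape sequence naming a
# glyph-set URL plus an index into it. The interlinear annotation
# characters are borrowed only as convenient stand-in delimiters.
START, SEP, END = "\uFFF9", "\uFFFA", "\uFFFB"

def glyph_ref(url: str, index: int) -> str:
    """Embed a reference to glyph `index` of the set at `url`."""
    return f"{START}{url}{SEP}{index}{END}"

def parse_glyph_refs(text: str):
    """Extract all (url, index) references embedded in `text`."""
    refs = []
    i = 0
    while (start := text.find(START, i)) != -1:
        sep = text.index(SEP, start)
        end = text.index(END, sep)
        refs.append((text[start + 1:sep], int(text[sep + 1:end])))
        i = end + 1
    return refs

msg = "press " + glyph_ref("https://example.org/hieroglyphs.v1", 42) + " to begin"
print(parse_glyph_refs(msg))  # [('https://example.org/hieroglyphs.v1', 42)]
```

The interesting design consequence is that the renderer, not the text standard, decides what the glyph looks like, while the reference stays stable.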
Well, if you want to know what they can look like, the blogpost has images, an embedded webfont and links to the reference font for the new symbols. And AFAIK providing reference images that are freely usable is required for all new symbol proposals.
Legitimate question: Why is Unicode littered with all those useless symbols?
I can see the reasoning behind the standard (or very common) symbols or things like emoji, but having every possible glyph in UTF-8 seems like a horrible waste.
What if we want to add new glyphs in the next 10 years for emerging standards?
>having every possible glyph in UTF-8 seems like a horrible waste
A horrible waste of what? Unicode 9.0 encodes 128,172 characters, of a possible total 1,112,064 code points. The addressable space is 11.52% full. Clearly there's enough left to keep adding more and more characters for a really long time.
If your complaint is that it's a waste of resources, time, etc - surely it's up to the people who are members of the consortium to decide how they want to spend their energy?
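The total comes from 17 planes of 65,536 code points minus the 2,048 surrogates; a quick sanity check of the figures:

```python
# 17 planes x 65,536 code points, minus the 2,048 surrogate code
# points (U+D800..U+DFFF), which can never encode characters.
total = 17 * 65_536 - 2_048
encoded = 128_172  # characters assigned as of Unicode 9.0
print(total)                     # 1112064
print(f"{encoded / total:.1%}")  # 11.5%
```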
At some point, someone will realize that there is a need to standardize a fixed, practical subset of Unicode that contains all the essential symbols of the world, so that all devices that comply with the standard can __actually__ interchange text in a readable, printable and visually presentable form.
It's nice to have a catalogue of symbols and a tight encoding for them, but full support of the Unicode encoding has very little to do with full support for Unicode in an application.
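A concrete example of that gap: decoding the bytes is the easy part, while "readable and visually presentable" is another matter entirely, as with regional indicator pairs. A sketch in Python:

```python
# Two regional indicator symbols pair up into a single flag glyph.
# An application that merely decodes UTF-8 correctly still sees two
# code points and knows nothing about the combined rendering.
us_flag = "\U0001F1FA\U0001F1F8"      # REGIONAL INDICATOR SYMBOLS U + S
print(us_flag)                        # renders as a US flag where fonts support it
print(len(us_flag))                   # 2 code points
print(len(us_flag.encode("utf-8")))   # 8 bytes
```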
The only problem I see is OSX/iOS, Windows, and Android don't ship with some universal, but shitty, font that has every single last glyph ever, always immediately updated to the new Unicode standard.
You mean U+23E9 to U+23FA, just before these new power symbols? I only noticed them because the Unicode power symbol site has an image of what comes before their symbols.
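Assuming a Python build with a reasonably recent Unicode database, that neighbourhood can be listed directly (the names come from whatever Unicode Character Database the build bundles):

```python
import unicodedata

# U+23E9..U+23FA are the media control symbols; the power symbols
# added in Unicode 9.0 follow immediately at U+23FB..U+23FE.
for cp in range(0x23E9, 0x23FF):
    name = unicodedata.name(chr(cp), "<not in this build's database>")
    print(f"U+{cp:04X} {name}")
```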
Unicode symbols... it seems like we should've developed them the way languages develop: start with the most important symbols (ones for food, water, shelter, danger, etc.), then expand them into the abstract mess they are today.
Emoji were not developed haphazardly. They evolved naturally in Japan, then were adopted by the rest of the world. That is why there are so many Asia / Japan themes in the standard emoji set. The problem is Westerners don't understand the Japanese emotion behind the symbol. The symbol for bookbag looks exactly like a Japanese school kid's backpack. It's why there is a kimono. Bamboo wind chimes. Tsunami. Shinkansen... I could go on and on.
In some respect, they are getting jumbled up because of international pressures for the base emoji set to be stretched into a be-all for the global market. An example is Taco. There are tacos in Japan. They are hard to find and when you do find one, you definitely don't want to eat one there. Mexican food is one of the rare cuisines the Japanese don't do better.
I hope that ligatures become more popular than precomposed characters like "½", because such characters are very difficult to find in text with standard ASCII input, e.g. in Firefox by typing 1/2 into quick find (Ctrl+F).
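One mitigation a search feature could use is compatibility normalization, which decomposes "½" into digit, fraction slash, digit; a sketch:

```python
import unicodedata

def fold_for_search(s: str) -> str:
    # NFKD replaces presentation forms like the vulgar fraction with
    # their compatibility decompositions: "½" -> "1" U+2044 "2".
    return unicodedata.normalize("NFKD", s)

folded = fold_for_search("about ½ of users")
print("1" in folded, "2" in folded)  # True True
# Caveat: the slash produced is U+2044 FRACTION SLASH, not ASCII "/",
# so a literal "1/2" query still needs slash folding on top.
print("\u2044" in folded)  # True
```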
I was actually wondering about the electrical symbols for logic gates, such as AND, OR, NOR, XOR, NOT, etc. I would hope they were universally accepted by now; they would help when writing books or describing logic. A quick DuckDuckGo search revealed nothing...?
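For what it's worth, the schematic gate shapes aren't encoded, but the Mathematical Operators block has long had textual operators for these; the names printed below come from the Unicode Character Database:

```python
import unicodedata

# Textual logic operators (not schematic gate symbols) in Unicode.
ops = ["\u2227", "\u2228", "\u00AC", "\u22BB", "\u22BC", "\u22BD"]
for ch in ops:
    print(ch, unicodedata.name(ch))
# ∧ LOGICAL AND, ∨ LOGICAL OR, ¬ NOT SIGN, ⊻ XOR, ⊼ NAND, ⊽ NOR
```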
I was wondering why they would have snowmen in the standard. And then it occurred to me that maybe, since the Unicode set has so much room for characters, they were planning to allow cross-language communication through emoticons.
Think about it: if you can represent anything human with emoticons, then you can communicate through emoticons only! Maybe that's what the ancient Egyptians were hoping for?
[1]: http://hypertexthero.com/logbook/2015/01/international-symbo...
(Top result: http://www.cbc.ca/news/trending/rifle-emoji-dropped-unicode-...)
http://imgur.com/Sx0lkM8
(http://www.unicode.org/mail-arch/unicode-ml/y2005-m08/0371.h...)
Now it looks like they add whatever somebody thinks of. I guess it's related to the liberation from the BMP.
Until Unicode has a half-star character, it won't even be able to encode the average newspaper.
If people actually used these, it would make searching text for formulae much easier. Wikipedia editors and academic publishers, please note.
Also, there's no Unicode for screwdriver. Perhaps iFixit would like to campaign for that?
Congratulations on getting the power symbols in! When @edent writes "Will update ... when I stop dancing", was it "I got the power"?
🇦 🇧 🇨 🇩 🇪 🇫 🇬 🇭 🇮 🇯 🇰 🇱 🇲 🇳 🇴 🇵 🇶 🇷 🇸 🇹 🇺 🇻 🇼 🇽 🇾 🇿.
Thank you for stepping up and making a difference.
> Important symbol additions include:
> 19 symbols for the new 4K TV standard
I am wondering: why did they add symbols for a standard that will eventually become obsolete?
[0]: http://unicode.org/versions/Unicode9.0.0/
https://pypi.python.org/pypi/defusedxml
﷽ U+FDFD (decimal 65021) ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM
http://graphemica.com/%EF%B7%BD