The rationale given for including mirrored half-stars as separate codepoints is right-to-left languages. I wondered why this was needed, since Unicode already has a right-to-left mark (RLM)[1].
I found the answer in a comment on "Explain XKCD".[2] The RLM usually only reorders characters, but does not mirror their glyphs. The exceptions are glyphs with the "Bidi_Mirrored=Yes" property, which are mapped to mirrored codepoints.[3]
The half-stars proposal includes a note on that property: "Existing stars are in the “Other Neutrals” class, so half stars should probably use the ON bidirectional class. The half stars have the obvious mirrored counterparts, so they can be Bidi mirrored. However, similar characters such as LEFT HALF BLACK CIRCLE are not marked as mirrored. I'll leave it up to the Unicode experts to determine if Bidi Mirrored would be appropriate or not."
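The Bidi_Mirrored property mentioned above is queryable from Python's standard library, which can make the distinction concrete; a small sketch (the half-star characters themselves are not asserted here, since at the time of the proposal they were not yet encoded):

```python
import unicodedata

# Bidi_Mirrored=Yes characters are rendered mirrored in right-to-left
# runs; unicodedata.mirrored() exposes the property as 1 or 0.
for ch in "()<>A\u2605":  # parens, angle brackets, a letter, BLACK STAR
    print(f"U+{ord(ch):04X}: mirrored={unicodedata.mirrored(ch)}")
```

Parentheses and angle brackets come out mirrored; letters and the existing star characters do not, which is exactly why the proposal raises the question.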
The one I'm surprised about is not the stars but the bitcoin character. To me it's just a form of branding, and while I think there are interesting uses for blockchain technology, public interest seems a bit inflated. Plus, blockchain tech will likely outlive bitcoin itself.
It's not like there is some central Bitcoin company, so what is the brand? Brands are generally owned by companies and are intellectual property in the eyes of governments.
It is great to see Unicode being able to encode almost every symbol people can think of. However, I am still struggling to make them appear on my screen: is there a good font with great coverage of Unicode? Many times there is a clever use of Unicode, yet I can only see empty rectangles.
Unicode does not dictate how glyphs are presented. It just describes and categorizes them.
So how they look comes from the font that is used. For the proposal these fonts probably didn't exist yet, so it was probably just a (slightly sloppy) photoshop.
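Those empty rectangles ("tofu") usually mean the font lacks a glyph, not that the codepoint is undefined. A sketch using Python's stdlib, which knows the character database even when your fonts don't:

```python
import unicodedata

def is_assigned(ch):
    """True if the codepoint has a name in the Unicode character database."""
    try:
        unicodedata.name(ch)
        return True
    except ValueError:  # raised for unassigned codepoints
        return False

print(is_assigned("\u2605"))      # BLACK STAR: assigned, even if your font can't draw it
print(is_assigned("\U000E0080"))  # an unassigned codepoint
```

So when you see a rectangle, the character data is intact; only the rendering layer (the font) has given up.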
We need to hold the line somewhere. Preferably before corporate logos get into Unicode. I've seen Facebook and Twitter icons as Unicode characters in the user-definable space. This currently requires a downloaded font, but there's probably some lobbyist somewhere trying to get them into Unicode.
It's getting really complicated. There are now skin-tone modifiers for emoji.
Unicode is turning into a few useful characters amid a sea of junk. This will continue as long as people acquire status by getting "their" symbol(s) into Unicode. I don't see any way this can change.
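For what it's worth, the skin-tone mechanism is structurally simple: a modifier codepoint (U+1F3FB through U+1F3FF) follows a base emoji, and the font renders the pair as one glyph. A minimal illustration with Python's stdlib:

```python
import unicodedata

thumbs_up = "\U0001F44D"  # THUMBS UP SIGN
modifier  = "\U0001F3FD"  # EMOJI MODIFIER FITZPATRICK TYPE-4

# Two codepoints in the text, one glyph on screen (with a capable font).
combined = thumbs_up + modifier
print(len(combined))                 # 2
print(unicodedata.name(modifier))
```

In plain text the sequence survives as two scalars; only the display layer fuses them, which is why unaware renderers show the modifier as a separate color swatch.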
Unicode Technical Report #51, which is where Emoji are laid out, talks a bit about the current thinking of the committees on this:
> The longer-term goal for implementations should be to support embedded graphics, in addition to the emoji characters. Embedded graphics allow arbitrary emoji symbols, and are not dependent on additional Unicode encoding. Some examples of this are found in Skype and LINE—see the emoji press page for more examples.
> However, to be as effective and simple to use as emoji characters, a full solution requires significant infrastructure changes to allow simple, reliable input and transport of images (stickers) in texting, chat, mobile phones, email programs, virtual and mobile keyboards, and so on. (Even so, such images will never interchange in environments that only support plain text, such as email addresses.) Until that time, many implementations will need to use Unicode emoji instead
I simply cannot wrap my head around the direction of the Unicode discourse. We're discussing the appropriate codepoint for different smiley faces, obscure electrical symbols[0] or, in the present case, half stars to express film or book ratings, yet we have no complete set of sub- and superscripts!

Am I mistaken in thinking it odd that there's a complete Klingon alphabet but no representation whatsoever for most Greek or Latin subscripts? Or what if, heaven forbid, I'd want to use a 'b' index/subscript? Tough! Not even the "phonetic extensions", where subscript-i comes from, provide it.

Surely there are one or two actual scientists on the Unicode consortium? Or even the one odd soul still sporting a notion of consistency who finds it only logical to provide a "subscript b" if there's a "subscript a"?
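The gap is easy to demonstrate from code: LATIN SUBSCRIPT SMALL LETTER A (U+2090) exists, but there is no corresponding letter b. A quick probe with Python's stdlib:

```python
import unicodedata

def subscript_exists(letter):
    """Check the Unicode database for a Latin subscript form of a letter."""
    try:
        unicodedata.lookup(f"LATIN SUBSCRIPT SMALL LETTER {letter.upper()}")
        return True
    except KeyError:  # raised when no character has that name
        return False

for letter in "abemn":
    print(letter, subscript_exists(letter))
```

As of the Unicode data shipped with current Python, a/e/m/n (among a handful of others) exist and b does not, which is precisely the inconsistency being complained about.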
Unicode is not known for its consistency in dealing with these issues. The original idea behind Unicode was to be able to represent every then-extant character set with perfect fidelity (i.e., go from X to Unicode and back, and you should get the same data). Why are there letters like U+212B Angstrom sign (not to be confused with U+00C5 Latin capital A with ring above) or things like half-width and full-width characters? Because they were present in Shift-JIS, not because of any coherent notion of what constitutes a glyph. Han unification was driven more by the need to keep from blowing a space budget than by actual rationalization of whether or not the scripts deserved separate spaces.
Note that Klingon isn't in Unicode (it was explicitly rejected by the UTC, with a vote of 9 in favor of the rejection proposal, 0 against it, and 1 abstaining). Tengwar and Cirth, though, are actually considered serious proposals for Unicode, just really, really low priority compared to, say, Mayan script (for which the first proposal should be going live in 2017). Mayan script is interesting in its own right because it's the script (well, of the ones I'm aware of) that most challenges normal conventions on what constitutes letters and glyphs.
ISTM a great deal of trouble and complication could have been prevented by three special types of NBSP that meant "sub", "super", and "back to normal". It's true that some glyphs will be special-cased by some fonts, but in general the glyph is just shrunk and translated when sub- or super-scripted.
I have to disagree. All but 3 of those pictographs are already in the Unicode standard. You have to patch fonts because A) your preferred font may not have them and B) to make certain that the font meets Powerline's expectations.
The ones that are "unique" are a bit annoying because they replace defined characters in the Basic Multilingual Plane's Private Use section (E000-FFFF). Even though the section is "Private Use", it is often already defined by your OS's system font. There are the Supplemental Private Use Areas A (F0000-FFFFD) and B (100000-10FFFF), which can be overwritten safely.
I scare-quote "unique" because two of those characters are full-height arrows; one right-pointing, the other left-pointing. These are already defined as U+1F780 (🞀) and U+1F782 (🞂). It may be the case that in some fonts the triangles either A) don't actually go from floor to ceiling, or B) have empty space behind their hypotenuse.
The only truly unique character is the "git branch" pictograph. Maybe someone could write up a convincing argument to include it, but I can't imagine one. It's not a symbol you see too often even in the git community. And I would bet, if you looked hard enough, there's some mathematical symbol that would be suitable.
Just FYI, I've used powerline fonts daily for the past ~3 years.
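Whether a codepoint falls in a Private Use Area can be checked from its general category, which is "Co" for all three PUA ranges. A small sketch:

```python
import unicodedata

# General category "Co" marks Private Use codepoints; "So" is an
# ordinary symbol for comparison.
for cp in (0xE000, 0xF0000, 0x100000, 0x2605):
    print(f"U+{cp:04X}: {unicodedata.category(chr(cp))}")
```

This is a cheap way for a tool to warn that text relies on private-use characters whose appearance depends entirely on a locally patched font.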
That's great but what we really need (ahem- what I really need) is more maths-y characters, like ∑∏∫∀ and all the sub- and super- scripted letters: ⁱⁿₙᵢ and so on.
I can never find a lower-case Greek subscripted α or β when I need one...
> That's great but what we really need (ahem- what I really need) is more maths-y characters, like ∑∏∫∀ and all the sub- and super- scripted letters: ⁱⁿₙᵢ and so on.
Agreed, but what we need even more than the symbols is some ((La)TeXy, says the mathematician) way of combining them. For example (says the mathematician who doesn't understand the complexity of text encodings), why do we need a whole bunch of separate "subscript m", "subscript n", etc., glyphs, rather than just one "subscript" combining mark?
Unicode is a brilliant idea, but it went off the rails with combining characters, especially when there is both a code point for a character and a combining set of characters that semantically are the same thing.
How would you solve things without combining characters? Especially the case where you can have multiple diacritics on a letter. Encode every single combination of all of them? Seems a bit wasteful, don't you think?
Precomposed characters exist because they existed in other encodings previously and encoding such characters has been one of the core principles of Unicode to ensure an easy upgrade path. Heck, we inherited box drawing characters that way, which I think are more questionable than combining diacritics.
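The precomposed/combining duality described above is exactly what Unicode normalization reconciles: NFC composes, NFD decomposes, and equality should be tested on normalized forms. A short stdlib sketch:

```python
import unicodedata

precomposed = "\u00E9"   # é as one codepoint: LATIN SMALL LETTER E WITH ACUTE
combining   = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

# Codepoint-wise these are different strings...
print(precomposed == combining)                                # False
# ...but normalization maps each onto the other.
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == combining)  # True
```

The upgrade-path rationale is why both spellings exist; normalization is the price the rest of us pay for it.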
The other day I was searching for the words for bronze in Tibetan, for research on possible etymologies of some Tibeto-Burman phonetic transliterations into Middle Chinese.[0] (As you do.) Anyway, I found some low-resolution entries in scanned dictionaries online without romanization, but was unable to translate these to codepoints to obtain a phonetic approximation, even after using online keyboards, due to the hassles of combining characters. I have studied a lot of abugidas (Tai/Lao/Khmer/etc.), so I am not exactly coming at the problem from scratch, either. Also rather shocked that the Tibetan community hasn't managed to put a decent dictionary online yet.
[+] [-] 1wd|9 years ago|reply
[1] https://en.wikipedia.org/wiki/Right-to-left_mark
[2] https://www.explainxkcd.com/wiki/index.php/1137:_RTL
[3] http://www.unicode.org/Public/UNIDATA/BidiMirroring.txt
[+] [-] syphilis2|9 years ago|reply
[+] [-] treve|9 years ago|reply
[+] [-] sqeaky|9 years ago|reply
[+] [-] justinpombrio|9 years ago|reply
[+] [-] nacc|9 years ago|reply
[+] [-] glitch|9 years ago|reply
[+] [-] mixmastamyk|9 years ago|reply
[+] [-] markbao|9 years ago|reply
HN strips the characters out from comments, but they're displayed in the beginning of the article.
[+] [-] treve|9 years ago|reply
[+] [-] doodpants|9 years ago|reply
[+] [-] edent|9 years ago|reply
If anyone wants to submit some new characters, all of our documents are on GitHub https://github.com/jloughry/Unicode
[+] [-] Animats|9 years ago|reply
[+] [-] WalterBright|9 years ago|reply
[+] [-] ygra|9 years ago|reply
Skin tone modifiers work pretty much like diacritics already do. It's not complicated and most of the support relies on the font anyway.
[+] [-] amelius|9 years ago|reply
[+] [-] wxs|9 years ago|reply
[1] http://unicode.org/reports/tr51/#Longer_Term
[+] [-] hf|9 years ago|reply
Refer to https://en.wikipedia.org/wiki/Unicode_subscripts_and_supersc... or look for SUBSCRIPT in http://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
How am I wrong?
[0] https://news.ycombinator.com/item?id=11958682
[+] [-] jcranmer|9 years ago|reply
[+] [-] jessaustin|9 years ago|reply
[+] [-] 1wd|9 years ago|reply
Subscript letters were proposed as well: http://www.unicode.org/L2/L2011/11208-n4068.pdf but apparently "Not accepted: Because this has been controversial and is not directly related to repertoire under ballot, it is not appropriate to add it to Amd1 but may be considered for a future amendment" http://www.unicode.org/L2/L2012/12130-n4239.pdf
Looks like here's a recent draft for a new proposal: https://github.com/stevengj/subsuper-proposal
[+] [-] WalterBright|9 years ago|reply
[+] [-] gjasny|9 years ago|reply
See: https://github.com/powerline/fonts/blob/master/README.rst
A zsh theme with those characters in use: https://gist.github.com/agnoster/3712874
[+] [-] yes_or_gnome|9 years ago|reply
[+] [-] YeGoblynQueenne|9 years ago|reply
[+] [-] JadeNB|9 years ago|reply
[+] [-] WalterBright|9 years ago|reply
[+] [-] ygra|9 years ago|reply
[+] [-] kuschku|9 years ago|reply
[+] [-] contingencies|9 years ago|reply
[0] https://en.wikisource.org/wiki/Translation:Manshu/Chapter_7#...
[+] [-] tantalor|9 years ago|reply
[+] [-] Symbiote|9 years ago|reply
For etc, start here: http://unicode-search.net/unicode-namesearch.pl?term=fractio...
You can use "fraction slash" to make any fraction, using super/subscript numbers: ⁷⁄₃₃
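That recipe is mechanical enough to script: map each digit to its superscript or subscript form and join the two halves with U+2044 FRACTION SLASH. A minimal sketch:

```python
# Digit -> superscript / subscript lookup tables.
SUP = dict(zip("0123456789", "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"))
SUB = dict(zip("0123456789", "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"))

def fraction(num, den):
    """Build an arbitrary vulgar fraction with U+2044 FRACTION SLASH."""
    return ("".join(SUP[d] for d in str(num))
            + "\u2044"
            + "".join(SUB[d] for d in str(den)))

print(fraction(7, 33))  # ⁷⁄₃₃
```

Fonts with proper support even ligate digit–slash–digit sequences into a drawn fraction, so the super/subscript forms are strictly a fallback.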
[+] [-] sanbor|9 years ago|reply
[+] [-] infogulch|9 years ago|reply
[+] [-] unknown|9 years ago|reply
[deleted]
[+] [-] unknown|9 years ago|reply
[deleted]
[+] [-] koltaggar|9 years ago|reply
[+] [-] kens|9 years ago|reply