top | item 42231092

(no title)

meew0 | 1 year ago

The “invisible symbols” are necessary to correctly represent human language. For instance, one of the most infamous Unicode control characters — the right-to-left override — is required to correctly encode mixed Latin and Hebrew text [1], which are both scripts that you mentioned. Besides, ASCII has control characters as well.

The “colorful icons” are not part of Unicode. Emoji are just characters like any other. There is a convention that applications should display them as little coloured images, but this convention has evolved on its own.

If you say that Unicode is too expansive, you would have to make a decision to exclude certain types of human communication from being encodable. In my opinion, including everything without discrimination is much preferable here.

[1]: https://en.wikipedia.org/wiki/Right-to-left_mark#Example_of_...

discuss

wruza|1 year ago

Copy this󠀠󠀼󠀼󠀼󠀠󠁉󠁳󠀠󠁴󠁨󠁩󠁳󠀠󠁮󠁥󠁣󠁥󠁳󠁳󠁡󠁲󠁹󠀠󠁴󠁯󠀠󠁣󠁯󠁲󠁲󠁥󠁣󠁴󠁬󠁹󠀠󠁲󠁥󠁰󠁲󠁥󠁳󠁥󠁮󠁴󠀠󠁨󠁵󠁭󠁡󠁮󠀠󠁬󠁡󠁮󠁧󠁵󠁡󠁧󠁥󠀿󠀠󠀾󠀾󠀾 sentence into this site and click Decode. (YMMW)

https://embracethered.com/blog/ascii-smuggler.html

hnuser123456|1 year ago

Wow. Did not expect you can just hide arbitrary data inside totally normal looking strings like that. If I select up to "Copy thi" and decode, there's no hidden string, but just holding shift+right arrow to select just "one more character", the "s" in "this", the hidden string comes along.

n2d4|1 year ago

> Is this necessary to correctly represent human language?

Yes! As soon as you have any invisible characters (eg. RTL or LTR marks, which are required to represent human language), you will be able to encode any data you want.

bawolff|1 year ago

> one of the most infamous Unicode control characters — the right-to-left override

You are linking to an RLM not an RLO. Those are different characters. RLO is generally not needed and more special purpose. RLM causes much less problems than RLO.

Really though, i feel like the newer "first strong isolate" character is much better designed and easier to understand then most of the other rtl characters.

n2d4|1 year ago

Granted, technically speaking emojis are not part of the "Unicode Standard", but they are standardized by the Unicode Consortium and constitute "Unicode Technical Standard #51": https://www.unicode.org/reports/tr51/

Y_Y|1 year ago

I'm happy to discriminate against those damn ancient Sumerians and anyone still using goddamn Linear B.

Analemma_|1 year ago

Sure, but removing those wouldn't make Unicode any simpler, they're just character sets. The GP is complaining about things like combining characters and diacritic modifiers, which make Unicode "ugly" but are necessary if you want to represent real languages used by billions of people.

gwervc|1 year ago

People who should use Sumerian characters don't even use them, sadly. First probably because of habit with their transcription, but also because missing variants of characters mean lot of text couldn't be accurately represented. Also I'm downvoting you for discriminating me.

account42|1 year ago

> The “colorful icons” are not part of Unicode. Emoji are just characters like any other. There is a convention that applications should display them as little coloured images, but this convention has evolved on its own.

Ok now you're just full of shit and you know it.