Here's a basic explanation of the diacritic notion as it applies to Asian scripts.
Thai belongs to a family of scripts known as abugidas. Abugidas include nearly all South Asian and many Southeast Asian scripts, for example Burmese, Cambodian, Dai, Lao and Thai. They pretty much all derive from Brahmi, which was the proto-Indian script. You can see an example of Brahmi over here: http://en.wikipedia.org/wiki/Brahmi
Abugidas are based upon combining multiple glyphs into syllables, often allowing glyphs above, below, to the left and to the right of the initial consonant, and often including a closing consonant. Most glyphs tend to be consonants, though some are vowels, and others can be special marks for indicating tone or other notions. Often shorter vowels are left unwritten (as in Modern Standard Arabic).
In the old days, such scripts were handled with wacky font hacks. However, with Unicode, there are some super complex algorithms that make glyphs combine both visually (when typesetting) and logically (when saving, searching, etc.). You can actually type a character and a diacritic and it can sometimes automatically combine to form a single character, if such a beast exists, not just visually but when saving to disk.
What makes it even more confusing is that South Asian scripts in particular have mega-combo characters, where whole chunks of glyphs sort of fold into flowing shorthand symbols. In the case of Sanskrit, I believe loads of these were used in history but few are used these days.
I think that's a fair pontification - corrections welcome!
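For anyone who wants to see the "character plus diacritic collapses into a single character" behaviour concretely, here is a minimal Python sketch using the standard `unicodedata` module. It uses a Latin example rather than Thai purely for readability, and it only shows Unicode NFC normalization; whether a given editor or file format actually normalizes on save is an assumption that varies by application.

```python
import unicodedata

# 'e' followed by COMBINING ACUTE ACCENT: two code points, one visible glyph.
decomposed = "e\u0301"

# NFC composes the pair into the single precomposed code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1

# 'x' + acute has no precomposed form, so NFC leaves it as two code points.
uncomposed = unicodedata.normalize("NFC", "x\u0301")
print(len(uncomposed))  # 2
```

When no precomposed form exists, the pair still renders as one glyph if the font supports it, but it stays two code points in storage.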
I wouldn't call them "mega-combo characters" ;-) Typically, these are at most two or three letters written together, and they are very much used today in Sanskrit, Hindi and other regional variants such as Bengali etc.
Of course you also have multiple words that have been combined into one large compound word, by way of appropriate linguistic rules of combining sounds. This is similar to long compound words in German.
I have learned the Thai alphabet and that "mega-combo character" comment just made my day. Reading Thai pretty much feels like reading regexps. Anditdoesntmakeitanyeasierthattheydontusespaces
I actually get a different rendering for some reason. I get all the extra diacritics to the right, above empty circles. That certainly explains why I didn't quite understand the problem.
That's not really a Thai character, right? It's way too many bytes! It must be an intentional repetition of stacking diacritics. Some of the ones in that Google result page are 21 bytes.
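The byte counts are easy to check. Here is a small sketch, assuming a made-up sequence (one Thai consonant with the same tone mark repeated six times, not the actual string from the search results): each Thai code point takes 3 bytes in UTF-8, so seven code points come to 21 bytes while still being laid out as a single stacked "character".

```python
import unicodedata

base = "\u0e01"   # THAI CHARACTER KO KAI (a consonant)
mark = "\u0e49"   # THAI CHARACTER MAI THO (a nonspacing tone mark)

# Hypothetical stacked sequence: one base plus six repeated marks.
stacked = base + mark * 6

print(unicodedata.category(mark))      # Mn (nonspacing mark)
print(len(stacked))                    # 7 code points
print(len(stacked.encode("utf-8")))    # 21 bytes
```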
This character used to be (or maybe still is) a very popular way of trolling people on Facebook. Flooding the chat window with those funny letters seemed to crash the browser after a while.
Once had this dude sign up for our page management... we had a lot of assumptions about plain or at least sane text that had to get updated. https://www.facebook.com/glitchr
What on earth is going on here? 🔴҈҈҈҈̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚҉̚
That's a pretty fun one. It's a sequence of the following codepoints: {LARGE RED CIRCLE} {COMBINING CYRILLIC HUNDRED THOUSANDS SIGN} {COMBINING CYRILLIC HUNDRED THOUSANDS SIGN} {COMBINING CYRILLIC HUNDRED THOUSANDS SIGN} {COMBINING CYRILLIC HUNDRED THOUSANDS SIGN} and then 66 repetitions of {COMBINING LEFT ANGLE ABOVE} {COMBINING CYRILLIC MILLIONS SIGN}
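A breakdown like this can be reproduced with Python's `unicodedata` module. The sketch below rebuilds the sequence from the character names listed above (rather than copying the raw string) and tallies the code points; the variable names are only for illustration.

```python
import unicodedata
from collections import Counter

# Look the characters up by their official Unicode names.
circle = unicodedata.lookup("LARGE RED CIRCLE")
hundred_thousands = unicodedata.lookup("COMBINING CYRILLIC HUNDRED THOUSANDS SIGN")
left_angle = unicodedata.lookup("COMBINING LEFT ANGLE ABOVE")
millions = unicodedata.lookup("COMBINING CYRILLIC MILLIONS SIGN")

# One base character, four enclosing marks, then 66 repetitions of the pair.
text = circle + hundred_thousands * 4 + (left_angle + millions) * 66

# Tally each code point by name.
counts = Counter(unicodedata.name(ch) for ch in text)
for name, n in counts.items():
    print(f"{n:3d} x U+{ord(unicodedata.lookup(name)):04X} {name}")

print(len(text))  # 137 code points in total
```

To inspect an arbitrary pasted string instead, assign it to `text` and run the same tally.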
Hmm, that's interesting. At work I looked at this on my Ubuntu machine with the Chromium browser. It didn't look particularly special because all the diacritic marks were drawn on top of each other in the same location above the letter.
I come home and look at it on my Windows machine with Chrome, and now I see the big stack of diacritic marks that I assume everyone's making a fuss about. I assume it's something to do with the way that the system's installed font lays out the marks in question.
Yeah I was completely baffled by this until I looked in Google Image Search. I ran this on my Windows box and sure enough it looks crazy with little 'springs' going everywhere. Under Chrome, Firefox and Safari on OS X it looks normal though, just foreign text.
I was going to ask about this a week ago. An "Anonymous" Twitter account posted it last week and the letters overlaid the 3 or 4 tweets above it. Had no idea what it was.
[+] [-] contingencies|13 years ago|reply
[+] [-] ubasu|13 years ago|reply
[+] [-] sebilasse|13 years ago|reply
[+] [-] darkstalker|13 years ago|reply
Ę̮̱͔͓ͯ͗ͫ̌̏ͫ͌́x̘̤͚̰̫̫̗̤̱̒̓ͨͯ͑̓ͥͫ̕å̰͚̓͒ͫm̛̤͕̫̳̺̩̄̓ͨͥ͜ͅp̰͉͗ͤl̵̖̗̫͍͓͋̍̐͌̐̒e̡̧͔̮̿͒͋̈́͡ ̸͉͔͗͐̍ͩͫ̀ͭz̨͎̱̟̘̓ä́͊̉̾͜͏̺̲̘l̛̥͇͖̹̻̜̈̀̀g̴̗̻͚͙̭͍̩̔̉̆ͦ͌͘oͬ̾͑̉̋҉̢͙̹̹̺̺ ̷̢͖̲͇̺̪̹̙̺̘͐̄ͬ̍͆t̶͔̣̜̟͌̀ͪ̅ͧ̒̒ͫ̚ȅ̠̪̻̄ͫ̋͝xͭ͆͝͏̮͔̜t̟̬̦̣̟͉͈̞̝ͣͫ͞,̡̼̭̘̙̜ͧ̆̀̔ͮ́ͯͯ ̢̮͎̦͙͇ͪͪ̈͌ͬ̄̓̐͞ḷ̹̺̙̜̇̉́͡o̢̻̪̠̬̍͐̉ͮͥ̑͊ͪt̢̘̬͓͕̬́ͪ̽́s̢̜̠̬̘͖̠͕ͫ͗̾͋͒̃͛̚͞ͅ ̝̣̥̳͇͎̭̾̔̀̀̔̽̕o͇ͮ̋̅͋͆̈́̔͗͟f̙̙͕̮̈ͪͯ̿̈͠ ̯͎̺͎̺̃̀͟͟d͍͍̺͂̂i̪̩̙̭̝͖ͥ͂̂̈̒̎r̥̜̃̏̃͋̓ͥ̃̉̄͘͢t̳̦̬͆͂ͬͧ̏ͬ̓y̵̮̗̟ͩ̃̾͐́ͩ ̣͍̘͈̫͓̊ͤ̚͡͝cͥͭ͐̎͆͘̕҉̫̞h̴̢̫̘͉̖ͪͩ̓ͪͯ̑͑̓̎͝a̧̢̖͔̗̬̘̯̟ͪ̐͌̍͂̊r̷̝͓̬͆̄̽̓̋ͬ̈̔͝͠ā̗͑ͬ̀c͒̎͌̔͛͘҉̘͖͖̖̯̖͖͙ṱ̶͇͚͎ͯ͋͢͝eͦ̽͆͏̟̭̠r̙̖͙̳̾ͯ̈̕ṣ͙̈͆̔͗̉ͥ̋̔̕
[+] [-] tambourine_man|13 years ago|reply
http://imgur.com/dXwEHeN.jpg
[+] [-] windsurfer|13 years ago|reply
[+] [-] juan_juarez|13 years ago|reply
[+] [-] one-man-bucket|13 years ago|reply
[+] [-] stjarnljuset|13 years ago|reply
[+] [-] drzaiusapelord|13 years ago|reply
http://imgur.com/3u1M2IG
[+] [-] C1D|13 years ago|reply
[+] [-] cataflam|13 years ago|reply
Both Opera and Chrome on Windows XP. On Windows 7 I get the interesting-looking results with both Opera and Chrome.
[+] [-] sp332|13 years ago|reply
[+] [-] ambiate|13 years ago|reply
This is an interesting result. Even more interesting is the extra 2000 results that Google throws in my direction.
[+] [-] simcop2387|13 years ago|reply
[+] [-] orbitur|13 years ago|reply
[+] [-] morphics|13 years ago|reply
[+] [-] sp332|13 years ago|reply
[+] [-] pawelwentpawel|13 years ago|reply
[+] [-] clone1018|13 years ago|reply
That would be zalgo: http://eeemo.net/
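If you have to accept user text and want to defuse zalgo without banning diacritics outright, one option is to cap how many combining marks may follow a base character. This is a rough sketch only: `limit_marks` and the default cap of 2 are made-up choices for illustration, not anything a site is known to use, and a low cap could damage text in scripts that legitimately stack several marks.

```python
import unicodedata

def limit_marks(text, max_marks=2):
    """Keep at most max_marks combining marks after each base character."""
    kept = []
    run = 0  # number of consecutive marks seen since the last base character
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch).startswith("M"):
            run += 1
            if run > max_marks:
                continue  # drop the excess mark
        else:
            run = 0
        kept.append(ch)
    # Recompose so ordinary accented letters return to their usual form.
    return unicodedata.normalize("NFC", "".join(kept))

# 'z' with ten stacked acute accents, then an ordinary 'a' with a diaeresis.
noisy = "z" + "\u0301" * 10 + "a\u0308"
clean = limit_marks(noisy)
print(len(noisy), len(clean))
```

The ordinary accented letter survives untouched, while the stack on the first letter is trimmed down to the cap.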
[+] [-] patmcguire|13 years ago|reply
[+] [-] bazzargh|13 years ago|reply
glitchr's tweets may cause other Twitter clients to crash too, e.g. this one (you have been warned!) https://twitter.com/joshlogan42/status/303975029698342912
[+] [-] micampe|13 years ago|reply
[+] [-] lhnz|13 years ago|reply
[+] [-] Groxx|13 years ago|reply
[+] [-] masklinn|13 years ago|reply
[+] [-] DanBC|13 years ago|reply
HN is now not wrapping long lines, because your unbroken long line has widened the margins.
[+] [-] pmelendez|13 years ago|reply
[+] [-] claudius|13 years ago|reply
[0] http://imgur.com/joMTxLm
[1] http://imgur.com/TEmOTPo
[+] [-] rplacd|13 years ago|reply
[+] [-] Socketubs|13 years ago|reply
[+] [-] 3dptz|13 years ago|reply
[+] [-] fractalsea|13 years ago|reply
[+] [-] afiler|13 years ago|reply
𝑀𝑎𝑛𝑦 𝑝𝑒𝑜𝑝𝑙𝑒 𝑤𝑜𝑛'𝑡 𝑏𝑒 𝑎𝑏𝑙𝑒 𝑡𝑜 𝑠𝑒𝑒 𝑡ℎ𝑖𝑠, 𝑜𝑟 𝑎𝑡 𝑙𝑒𝑎𝑠𝑡 𝑎𝑙𝑙 𝑜𝑓 𝑡ℎ𝑒 𝑐ℎ𝑎𝑟𝑎𝑐𝑡𝑒𝑟𝑠, 𝑠𝑖𝑛𝑐𝑒 𝐼 𝑡ℎ𝑖𝑛𝑘 𝑎 𝑈𝑛𝑖𝑐𝑜𝑑𝑒 6.0 𝑓𝑜𝑛𝑡 𝑖𝑠 𝑟𝑒𝑞𝑢𝑖𝑟𝑒𝑑.
𝔼𝕧𝕖𝕟 𝕗𝕖𝕨𝕖𝕣 𝕗𝕠𝕟𝕥𝕤 𝕙𝕒𝕧𝕖 𝕥𝕙𝕖 𝕗𝕦𝕝𝕝 𝕕𝕠𝕦𝕓𝕝𝕖-𝕤𝕥𝕣𝕦𝕔𝕜 𝕒𝕝𝕡𝕙𝕒𝕓𝕖𝕥, 𝕥𝕙𝕠𝕦𝕘𝕙 𝕚𝕥 𝕨𝕠𝕣𝕜 𝕗𝕚𝕟𝕖 𝕗𝕠𝕣 𝕞𝕖 𝕠𝕟 𝕆𝕊 𝕏.
http://mar.cx/unicate/
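These styled letters are separate code points from the Mathematical Alphanumeric Symbols block, not ordinary letters with formatting applied. A short Python sketch, building a four-letter sample from the official character names, shows that they sit outside the Basic Multilingual Plane (4 bytes each in UTF-8) and that NFKC compatibility normalization folds them back to plain ASCII, which is handy if you need such text to be searchable.

```python
import unicodedata

names = [
    "MATHEMATICAL ITALIC CAPITAL M",
    "MATHEMATICAL ITALIC SMALL A",
    "MATHEMATICAL ITALIC SMALL N",
    "MATHEMATICAL ITALIC SMALL Y",
]
fancy = "".join(unicodedata.lookup(name) for name in names)

print(len(fancy))                       # 4 code points
print(len(fancy.encode("utf-8")))       # 16 bytes, 4 per character
print(all(ord(ch) > 0xFFFF for ch in fancy))  # True: outside the BMP

# NFKC replaces each styled letter with its plain compatibility equivalent.
plain = unicodedata.normalize("NFKC", fancy)
print(plain)  # Many
```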
[+] [-] darkstalker|13 years ago|reply
[+] [-] eksith|13 years ago|reply
Also, it seems not to work on all browsers, and even then FF and IE do slightly different things: http://i.imgur.com/hfWu5Bs.png
I'm on Win7.
Edit: I just noticed, on FF, the character spills out of the tab preview text and onto the chrome background as well.
[+] [-] josephjrobison|13 years ago|reply
[+] [-] DanBC|13 years ago|reply
I (OS X; Chrome) get little blobs over the n. That's wrong? But it doesn't break the page?
[+] [-] mech4bg|13 years ago|reply
[+] [-] darkhorn|13 years ago|reply
[+] [-] deadfall|13 years ago|reply
[+] [-] mkhalil|13 years ago|reply
[+] [-] aidos|13 years ago|reply
[+] [-] nwh|13 years ago|reply
[+] [-] AndyKelley|13 years ago|reply