emergie | 5 years ago

Contemplate two ways of writing the 'dz' digraph:

  dz - \u0064\u007a, 2 basic latin block codepoints
  DZ - \u0044\u005a
  Dz - \u0044\u007a
  
  dz - \u01f3, lowercase, single codepoint
  DZ - \u01f1, uppercase
  Dz - \u01f2, TITLECASE!
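A quick check in Python shows how the three single-codepoint forms relate to each other and to the two-letter spellings. This is a sketch using only the standard library's `str` methods and `unicodedata`, assuming a CPython build whose Unicode tables carry the standard mappings for U+01F1..U+01F3:

```python
import unicodedata

# The three single-codepoint DZ digraph forms.
lower, upper, title = "\u01f3", "\u01f1", "\u01f2"

# Case operations move between the three forms; each stays one code point.
print(lower.upper() == upper, lower.title() == title, upper.lower() == lower)

# U+01F2 has its own general category: "Lt", titlecase letter.
print(unicodedata.category(title))

# NFKC (compatibility normalization) unpacks each digraph into two Latin letters.
for ch in (lower, upper, title):
    print(repr(ch), len(ch), "->", repr(unicodedata.normalize("NFKC", ch)))
```

So the "one letter" and "two letters" spellings are only equivalent under compatibility normalization, not under plain comparison.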
What happens if you try to express dż or dź from Polish orthography?

For dż alone, you can use:

  dż - \u0064\u017c - d followed by 'LATIN SMALL LETTER Z WITH DOT ABOVE'
  dż - \u0064\u007a\u0307 - d followed by z, followed by combining diacritical dot above
  dż - \u01f3\u0307 - dz with combining diacritical dot above

  each of these multiplied by its uppercase and titlecase forms
In Polish orthography the dz digraph is counted as two letters, despite representing a single sound (głoska). I'm not so sure about Macedonian orthography; it might count it as one.
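The three dż spellings are different code point sequences, so plain string comparison treats them as different strings. A sketch with Python's `unicodedata.normalize`, assuming standard NFC/NFKC behavior, shows which of them normalization merges:

```python
import unicodedata

# Three ways to spell "dż" from the list above.
forms = {
    "d + z with dot above (U+017C)": "\u0064\u017c",
    "d + z + combining dot": "\u0064\u007a\u0307",
    "dz (U+01F3) + combining dot": "\u01f3\u0307",
}

for name, s in forms.items():
    nfc = unicodedata.normalize("NFC", s)
    nfkc = unicodedata.normalize("NFKC", s)
    # Raw length, then the code points after each normalization form.
    print(f"{name}: len={len(s)}",
          "NFC:", [hex(ord(c)) for c in nfc],
          "NFKC:", [hex(ord(c)) for c in nfkc])
```

Under NFC the first two collapse to `d` + U+017C, while the U+01F3 form stays separate because its decomposition is only a compatibility one; NFKC maps all three to the same two code points.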

ß is a letter/ligature that was created in medieval times from ſʒ, that is, a long s and a tailed z. In other words, it is a form of the 'sz' digraph. Today it is used only in German orthography.

How long is ß?

By some rules, uppercasing ß yields SS or SZ. Should uppercasing or titlecasing operations change the length of a string?

discuss
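Python's `str` methods already answer "yes" for ß: they follow Unicode's full case mappings (SpecialCasing), where ß uppercases to "SS" (not "SZ"). A minimal check, assuming CPython 3:

```python
word = "straße"

upper = word.upper()   # full case mapping: ß -> "SS"
title = "ß".title()    # full titlecase mapping: ß -> "Ss"

# Uppercasing adds a code point.
print(len(word), len(upper))
print(title)
# Lowercasing again does not restore the original spelling.
print(upper.lower() == word)
# casefold() is meant for caseless matching and also maps ß to "ss".
print(word.casefold() == "strasse")
```

The dz digraph code points, by contrast, keep their length under casing because U+01F1, U+01F2 and U+01F3 map to each other one-to-one.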


saurik | 5 years ago

I mean, if you remove the concept of a computer and just ask "how many letters are in this word?", you are likely to end up in some highly contextual conversations about some of these examples, taking into account language, culture, geography, and time. There is no reason for "uppercasing" or "titlecasing" to rest on some logical rule like "the number of characters (which, I will note, is itself poorly defined even on a computer) remains constant".
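To make "number of characters" concrete, here is a short Python sketch that counts the same strings in code points, UTF-8 bytes, and UTF-16 code units. The last column is a hypothetical, crude stand-in for user-perceived characters that simply skips combining marks via `unicodedata.combining`; it is an assumption for illustration, not the grapheme-cluster algorithm of UAX #29.

```python
import unicodedata


def counts(s: str) -> tuple[int, int, int, int]:
    code_points = len(s)
    utf8_bytes = len(s.encode("utf-8"))
    utf16_units = len(s.encode("utf-16-le")) // 2
    # Hypothetical rough "visible character" count: ignore combining marks.
    # Real grapheme segmentation (UAX #29) has many more rules.
    rough = sum(1 for c in s if unicodedata.combining(c) == 0)
    return code_points, utf8_bytes, utf16_units, rough


samples = ["dz", "\u01f3", "d\u017c", "dz\u0307", "\u01f3\u0307", "ß", "SS"]
for s in samples:
    print(repr(s), counts(s))
```

Each spelling of dż gives a different tuple, and none of the four measures agrees with the Polish count of letters.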