top | item 45561965

(no title)

zettabomb | 4 months ago

I have always wondered why Turkey chose to Latinize in this way. I understand that the issue is having two similar vowels in Turkish, but not why they decided to invent the dotless I, when other diacritics already existed. Ĭ Î Ï Í Ì Į Ĩ and almost certainly a dozen other would've worked, unless there was already some significance to the dot in Turkish that's not obvious.

discuss

jeroenhd|4 months ago

Computers and localisation weren't relevant back in the early 20th century. The dotless existed before the dotted i (in Greek script as iota). Some European scholars putting an extra dot on the letter to make it stand out a bit more are as much to blame as the Turks for making the distinction between the different i-vowels clear.

Really, this bug is nothing but programmers failing to take into account that not everybody writes in English.

JuniperMesos|4 months ago

It's not exactly programmers failing to take into account that no everybody writes in English - if that were the case, then it would simply be impossible to represent the Turkish lowercase-dotless and uppercase-dotted I at all. The actual problem is failing to take into account that operations on text strings that work in one language's writing might not work the same way in a different language's writing system. There's a lot of languages in the world that use the Latin writing system, and even if you are personally a fluent speaker and writer of several of them, you might simply have not learned about Turkish's specific behavior with I.

jagrsw|4 months ago

> that not everybody writes in English.

I don't know... I understand the history and reasons for this capitalization behavior in Turkish, and my native language isn't English, which had to use a lot of strange encodings before the introduction of UTF-8.

But messing around with the capitalization of ASCII <= codepoint(127) is a risky business, in my opinion. These codepoints are explicitly named:

"LATIN CAPITAL LETTER I" "LATIN SMALL LETTER I"

and requiring them to not match exactly during capitalization/diminuitization sounds very risky.

troad|4 months ago

> Really, this bug is nothing but programmers failing to take into account that not everybody writes in English.

This bug is the exact opposite of that. The program would have worked fine had it used pure ASCII transforms (±0x20); it was the use of library functions that did in fact take Turkish into account that caused the problem.

More broadly, this is not an easy issue to solve. If a Turkish programmer writes code, what is the expected behaviour for metaprogramming and compilers? Are the function names in English or Turkish? What about variables, object members, struct fields? You could have one variable name that references some government ID number using its native Turkish name, right next to another variable name that uses the English "ID". How does the compiler know what locale to use for which symbol?

Boiling all of this down to 'just be more considerate' is not actually constructive or actionable.

nurettin|4 months ago

There was actually three! i (as in th[i]s), î (as in ch[ee]se) and ı which sounds nothing like the first two, it sounds something like the e in bag[e]l. I guess it sounded so different that it warranted such a drastic symbolic change.

ithkuil|4 months ago

Turkish exhibits a vowel harmony system and uses diacritics on other vowels too and the choice to put "i" together with other front vowels like "ü" and "ö" and put "ı" together with back vowels like "u" and "o" is actually pretty elegant.

The latinization reform of the Turkish language predates computers and it was hard to foresee the woes that future generations would have had with that choice

mrighele|4 months ago

The issue is not the invention of the dotless I, it already exists, the issue is that the took a vowerl , i/I, and the assigned the lower case to one vowel, and the upper case to a different one, and invented what left missing.

It's like they decided that the uppercase of "a" is "E" and the uppercase of "e" is "A".

pinkmuffinere|4 months ago

This is misleading, because it assumes that i/I naturally represent one vowel, which is just not the case. i/I represents one vowel in _English_, when written with a latin script. ̶I̶n̶ ̶f̶a̶c̶t̶ ̶e̶v̶e̶n̶ ̶t̶h̶i̶s̶ ̶i̶s̶n̶'̶t̶ ̶c̶o̶r̶r̶e̶c̶t̶,̶ ̶i̶/̶I̶ ̶r̶e̶p̶r̶e̶s̶e̶n̶t̶s̶ ̶o̶n̶e̶ ̶p̶h̶o̶n̶e̶m̶e̶,̶ ̶n̶o̶t̶ ̶o̶n̶e̶ ̶v̶o̶w̶e̶l̶.̶ <see troad's comment for correction>

There is no reason to assume that the English representation is in general "correct", "standard", or even "first". The modern script for Turkish was adopted around the 1920's, so you could argue perhaps that most typewriters presented a standard that should have been followed. However, there was variation even between different typewriters, and I strongly suspect that typewriters weren't common in Turkey when the change was made.

ozgung|4 months ago

Nope, we decided to do it the correct and logical way for our alphabet. Some glyphs are either dotted or dotless. So, we have Iı, İi, Oo, Öö, Uu, Üü, Cc, Çç, Ss and Şş. You see the Ii pair is actually the odd one in the series.

Also, we don't have serifs in our I. It's just a straight line. So, it's not even related to your Ii pair in English. You can't dictate how we write our straight lines, can you.

The root cause of the problem is in the implementation and standardization of the computer systems. Computers are originally designed only for English alphabet in mind. And patched to support other languages over time, poorly. Computers should obey the language rules, not the other way around.

steezeburger|4 months ago

I don’t think that’s the right way to think about it. It’s not like they were Latinizing Turkish with ASCII in mind. They wanted a one-to-one mapping between letters and sounds. The dot versus no dot marks where in your mouth or throat the vowel is formed. They didn’t have this concept that capital I automatically pairs with lowercase i. The dot was always part of the letter itself. The reform wasn’t trying to fit existing Western conventions, it was trying to map the Turkish sounds to symbols.

okanat|4 months ago

Not really. Turkish has a feature that is called "vowel harmony". You match suffixes you add to a word based on a category system: low pitch vs high pitch vowels where a,ı,o,u are low pitch and e,i,ö,ü are high pitch.

Ö and ü were already borrowed from German alphabet. Umlaut-added variants of 'ö' and 'ü' have a similar effect on 'o' and 'u' respectively: they bring a back vowel to front. See: https://en.wikipedia.org/wiki/Vowel . Similarly removing the dots bring them back.

Turkish already had i sound and its back variant which is a schwa-like sound: https://en.wikipedia.org/wiki/Close_back_unrounded_vowel . It has the same relation in IPA as 'ö' has to 'o' and 'ü' has to 'u'. Since the makers of the Turkish variant of Latin Alphabet had the rare chance of making a regular pronunciation system with the state of the language and since removing the dots had the effect of making a front vowel a back vowel, they simply copied this feature from ö and ü to i:

Just remove the dots to make it a back vowel! Now we have ı.

When comes to capitalization, ö becomes Ö, ü becomes Ü. So it is just logical to make the capital of i İ and the lowercase of I ı.

ayhanfuat|4 months ago

Except for the a/e pair, front and back vowels have dotted and dotless versions in Turkish: ı and i, o and ö, u and ü.

o11c|4 months ago

In that case they should've used ï for consistency.

zettabomb|4 months ago

Makes sense enough, but why not use i and ï to be consistent?