top | item 39808983

tbenst | 1 year ago

x1798DE captured my intent well. For example, tonal languages like Mandarin or Cantonese may be more difficult to decode if vocal cords aren’t vibrating, and languages with more phonemes that have both a voiced and unvoiced version might be more difficult. I still think decoding will be possible for general language, but that’s a hypothesis whereas I know it’s true for English.

thaumasiotes | 1 year ago

> and languages with more phonemes that have both a voiced and unvoiced version might be more difficult.

I had the understanding that English is unusually rich in phonemes that occur in both a voiced and unvoiced version. But as I've mentioned sidethread, this just isn't very significant as far as transcribing English goes.

English has an almost full series of stop and fricative phonemes that exhibit voicing contrasts:

- Bilabial, alveolar, and velar stops /p, b, t, d, k, g/, though the distinction between /t/ and /d/ disappears intervocalically in American English. [In practice, English speakers differentiate these phonemes more by the contrast of aspiration than by the contrast of voicing.]

- Interdental, labiodental, alveolar, palatal, but generally not velar, fricatives /θ, ð, f, v, s, z, ʃ, ʒ/, along with palatal affricates /tʃ, dʒ/.

- Nasals and approximants are always voiced.

Compare a language like Mandarin Chinese, where there are between zero and one pairs of phonemes that contrast by voicing (the sound represented by pinyin "r" may be a voiced fricative otherwise equivalent to "sh", or it may be an approximant; there is no contrasting voiceless approximant), or Spanish, where only the stops feature this contrast.
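As a rough illustration of the comparison above, here is a toy sketch that writes the voicing pairs down as data and shows what collapses if voicing is the only cue that is lost. The pair lists are taken from the description above; the function name `devoice` and the example words are made up for illustration, and the model deliberately ignores aspiration, which (as noted) does much of the real work in English.

```python
# Voiceless/voiced pairs as (voiceless, voiced), following the lists above.
ENGLISH_PAIRS = [
    ("p", "b"), ("t", "d"), ("k", "g"),               # stops
    ("θ", "ð"), ("f", "v"), ("s", "z"), ("ʃ", "ʒ"),   # fricatives
    ("tʃ", "dʒ"),                                     # affricates
]
SPANISH_PAIRS = [("p", "b"), ("t", "d"), ("k", "g")]  # stops only
# Upper bound for Mandarin: counts only if pinyin "r" is analyzed as a
# voiced fricative rather than an approximant.
MANDARIN_PAIRS = [("ʂ", "ʐ")]


def devoice(phonemes, pairs):
    """Replace every voiced member of a pair with its voiceless partner.

    Toy assumption: voicing is the only thing distinguishing the pair.
    """
    to_voiceless = {voiced: voiceless for voiceless, voiced in pairs}
    return [to_voiceless.get(p, p) for p in phonemes]


bat = ["b", "æ", "t"]
pat = ["p", "æ", "t"]

print(len(ENGLISH_PAIRS), len(SPANISH_PAIRS), len(MANDARIN_PAIRS))  # 8 3 1
print(devoice(bat, ENGLISH_PAIRS) == devoice(pat, ENGLISH_PAIRS))   # True
```

Under this crude model "bat" and "pat" become indistinguishable, while words built only from nasals, approximants, and vowels pass through unchanged.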

What are the languages that have more voicing contrasts than English does? It would almost be necessary for such a language to distinguish between voiced and unvoiced vowels. (Some quick research suggests that Icelandic at least has a comparable number of voicing contrasts, but the number is not obviously larger than English's, and it appears to be actively shrinking.)

> tonal languages like Mandarin or Cantonese may be more difficult to decode if vocal cords aren’t vibrating

More difficult, yes, but in the sense that decoding may take more computation, not that the error rate will go up.

Again, we can already observe that e.g. Mandarin speakers do not have trouble understanding text that carries no information about tone, nor do they have trouble understanding songs, where lexical tone is overridden by the melody of the song.

(What happens here depends on what you mean. If you want to decode speech into pinyin with tone marks omitted, the lack of ability to measure tones will fail to be a problem by definition. If you want to decode into Chinese characters, you'll need a robust model of the language, at which point lack of tones will also fail to be a problem - the language model will cover for it. If you want to decode into pinyin with tone marks, you won't be able to do that without using a language model.)
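To make the last case concrete, here is a minimal sketch of restoring tone marks from toneless pinyin using nothing but word frequencies. Everything in it is an assumption for illustration: the four-word lexicon, the counts (which are invented, not measured), and the names `strip_tone` and `restore_tones`. A real system would use a proper language model over sentence context, not a word lookup.

```python
from collections import defaultdict

# Hypothetical toy lexicon: toned pinyin word -> invented frequency count.
TOY_LEXICON = {
    ("ni3", "hao3"): 50,     # 你好 "hello"
    ("zhong1", "guo2"): 40,  # 中国 "China"
    ("shi4", "shi2"): 30,    # 事实 "fact"
    ("shi2", "shi4"): 10,    # 时事 "current affairs"
}


def strip_tone(syllable):
    """Drop the trailing tone digit: 'hao3' -> 'hao'."""
    return syllable.rstrip("012345")


def build_index(lexicon):
    """Map each toneless syllable sequence to its toned candidates."""
    index = defaultdict(list)
    for toned, count in lexicon.items():
        key = tuple(strip_tone(s) for s in toned)
        index[key].append((count, toned))
    return index


INDEX = build_index(TOY_LEXICON)


def restore_tones(toneless):
    """Return the most frequent toned reading, or None if unknown."""
    candidates = INDEX.get(tuple(toneless))
    if not candidates:
        return None
    return max(candidates)[1]  # highest count wins


print(restore_tones(["ni", "hao"]))   # ('ni3', 'hao3')
print(restore_tones(["shi", "shi"]))  # ('shi4', 'shi2')
```

The second call is the interesting one: the toneless input is ambiguous, and the tones in the output come entirely from the (toy) language model rather than from anything measured in the signal.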