Spaces between words are a relatively recent Irish invention (7th or 8th century) in western written language, so it’s not like they’re an obvious thing to have.
As an English native speaker who learned Mandarin, I really didn't find the lack of spaces harmful to learning the language.
With the kind of mixed script used in Japan, and formerly in Korea, spaces aren't exactly necessary (still useful, but not necessary). Neither language uses prefixes much, so a sinograph is a pretty reliable indicator of the beginning of a word, followed by the inflection written out in a phonetic script like hiragana or hangeul. In Japanese's case, a switch from hiragana to katakana also indicates a word boundary and highlights that the word is likely a non-Sinitic loan, the name of a plant or animal species, or some other technical term.
How many (modern, written) "ideographic languages" exist? I can think of two: Chinese and Japanese. Old Korean and Vietnamese used some Chinese characters, but the modern languages use none.
idk the more chinese i learn, the more im convinced that the very concept of individual words is blurred and not quite the same because of the way the writing system works
seanmcdirmid|2 years ago
thaumasiotes|2 years ago
Perhaps, but interpuncts between words are several centuries older than that and occur as natural developments in e.g. the Roman Empire. https://loeb-art-center.vassarspaces.net/wp-content/gallery/...
The concept of word separation is an obvious thing to have. Whether the separator is empty space is unimportant.
postcynical|2 years ago
mahkeiro|2 years ago
xvilka|2 years ago
zdragnar|2 years ago
Since each character represents a syllable, rather than a specific sound, and the written language is essentially not phonetic, reading the characters is an entirely different experience.
OTOH, you have English and German and others that frequently use compound words, and the use of spaces becomes really important to understanding the writing.
I have zero experience with Thai.
eric-hu|2 years ago
Lexing is very clear in Chinese. It's never the case that you look at a Chinese sentence and don't know where a character ends and another begins. Take this sentence in both languages: "good morning, how are you"
早安,你好吗
This sentence clearly has "spaces" and I'm pretty sure any person illiterate in Chinese could tell you there are 5 characters / words. Technically the third character is composed of 人 and 尔 but I don't know that anyone, even kids or beginners, would mistake those as _not_ going together.
สวัสดีตอนเช้าคุณเป็นอย่างไรบ้าง
In contrast, Thai is as you say: lexing and parsing bleed together. There are 7 words in this sentence, but you need to lex the 10 syllables and run them through your mental dictionary to recognize the possible words they could be. My Thai is very limited, but there are examples of sentences out there that actually have multiple valid readings with different semantic meanings, depending on how you group sounds together.
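A toy sketch of that "mental dictionary" step, in Python. Everything here is an assumption for illustration: the function name `segmentations`, the five-word lexicon, and the use of an unspaced English string as a stand-in for Thai. It simply enumerates every way to split the string into dictionary words, which is where multiple valid readings come from.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            # Keep this word and segment whatever is left over.
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


# Hypothetical mini-dictionary; a real one would be far larger.
TOY_LEXICON = {"god", "is", "now", "here", "nowhere"}

for reading in segmentations("godisnowhere", TOY_LEXICON):
    print(" ".join(reading))
# god is now here
# god is nowhere
```

Both outputs are valid as far as the dictionary is concerned, so picking one needs context. That is the sense in which lexing and parsing bleed together.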
dumbotron|2 years ago
qingcharles|2 years ago
I'm used to it in Asian languages but it still does my head in when I try to read older Latin documents.
soundnote|2 years ago
Say, for example:
"Korean people eat kimchi"
In Japanese/Korean, the structure would be:
Korean-person-topic marker-kimchi-object marker-eat-present tense.
In Japanese mixed script, that looks like:
韓国人はキムチを食べます。
This would be read as "kankokujinwa kimuchiwo tabemasu".
Splitting it with spaces:
韓国人は キムチを 食べます。
The heftier kanji denoting "Korean person" and at the start of "eat" should be clear even to the untrained eye, while people who've studied the language can easily tell that キムチ is "kimuchi" written in katakana. The sentence is pretty easy to parse without spaces, at the cost of using one of the most insane writing systems in the world.
Now, what if we wrote the entire thing in hiragana instead?
かんこくじんはきむちをたべます。
... yyeaahh. Spaces. Please.
かんこくじんは きむちを たべます。
There, much better, though almost no one fluent in Japanese has practice reading stuff like that.
In Korean, without spaces we'd have:
한국사람들은김치를먹어요.
Again, similar problems. Korea has adopted spaces now that they don't use sinographs, so we'd have:
한국 사람들은 김치를 먹어요. (han'guk sa'ram'deul'eun kim'chi'reul mog'o'yo)
If we wrote "Korean person" with the same Sinitic loans in the Japanese sentence, we might get:
한국인들은 김치를 먹어요. (han'gug'in'deul'eun kim'chi'reul mog'o'yo)
Spaces clearly do help.
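The script-switch heuristic described above can be sketched in Python. This is a rough illustration, not how real Japanese tokenizers work: `script_of` and `chunk_by_script` are hypothetical helpers, and the only rule is "start a new chunk whenever the text switches into kanji or katakana". It relies on the Unicode character names from the standard `unicodedata` module.

```python
import unicodedata


def script_of(ch):
    """Classify a character by the prefix of its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("HIRAGANA"):
        return "hiragana"
    return "other"  # punctuation, Latin letters, and so on


def chunk_by_script(sentence):
    """Split where the script switches into kanji or katakana."""
    chunks = []
    prev = None
    for ch in sentence:
        script = script_of(ch)
        starts_chunk = script in ("kanji", "katakana") and script != prev
        if starts_chunk or not chunks:
            chunks.append(ch)
        else:
            chunks[-1] += ch  # hiragana and punctuation attach to the left
        prev = script
    return chunks


print(chunk_by_script("韓国人はキムチを食べます。"))
# ['韓国人は', 'キムチを', '食べます。']
print(chunk_by_script("かんこくじんはきむちをたべます。"))
# ['かんこくじんはきむちをたべます。']
```

The mixed-script sentence falls apart at the same places as the hand-spaced version, while the all-hiragana one gives the heuristic nothing to work with.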
throwaway2037|2 years ago
It is interesting to me when written Chinese and Japanese use commas. They are pretty much never required; it is purely a matter of style. They do help to break up a complex sentence, much as they do in phonetic languages.
geomark|2 years ago
The thing that bugs me about written Thai is that there are spaces now and then, and you would expect them to be at sentence breaks. Instead they seem to be randomly placed throughout the text, almost as if that's where the writer felt like he needed to take a breath rather than where one sentence ends and another begins.
deadfoxygrandpa|2 years ago
中国共产党, is that one word? should you break it up as 中国 共产党? what about 中国 共产 党? i dont think its nearly as clear which of these is correct as it is in english
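One way to see the ambiguity in code: a greedy longest-match segmenter (a common textbook baseline, not any particular library's algorithm) gives a different answer for each dictionary you hand it. The function name `longest_match` and the three lexicons below are made up for illustration.

```python
def longest_match(text, lexicon):
    """Greedy left-to-right segmentation: always take the longest known word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for end in range(len(text), i, -1):
            if text[i:end] in lexicon or end == i + 1:
                words.append(text[i:end])
                i = end
                break
    return words


# Three hypothetical dictionaries that disagree on what counts as a word.
coarse = {"中国共产党", "中国", "共产党", "共产", "党"}
medium = {"中国", "共产党", "共产", "党"}
fine = {"中国", "共产", "党"}

print(longest_match("中国共产党", coarse))  # ['中国共产党']
print(longest_match("中国共产党", medium))  # ['中国', '共产党']
print(longest_match("中国共产党", fine))    # ['中国', '共产', '党']
```

The algorithm never changes; only the word list does. So "which segmentation is correct" ends up being a question about the dictionary, which fits the point that the notion of a word is blurrier here than in English.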