"Cattle labeling meat labeling supervision task transfer act" is just as bad as Rinderkennzeichnungsfleischetikettierungsüberwachungsaufgabenübertragungsgesetz; English just gets to use spaces where German doesn't. The underlying construction is the same. (I definitely got that translation wrong.)
magarnicle|5 months ago
So surgery is full of -ectomies instead of -cut-outs.
1718627440|5 months ago
arkensaw|5 months ago
I wonder whether German brains work with a much longer context window because of the language?
1718627440|5 months ago
Maybe, but more due to the spelling of numbers and the long sentences. Compound words are not an example of this, since Germans can parse these words just fine as separate components. It just means that the lowest level of "tokenization" in everyday use is not the word, but its subcomponents.
Do native English speakers "tokenize" expressions into words? Do you see it as '(labelling) (of) (minced)' or '(label)l(ing) (of) (minc)(ed)'?
I can't speak for most Germans, but the algorithm I think I use is just greedy from left to right. This is also consistent with how mistokenization in common puns works, so I think this is common.
In primary school we were trained to recognize syllable boundaries. Is that just a German thing, or is it common in other countries? You need to know these for spelling, and once you know them, separating word components becomes trivial.
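To make the "greedy from left to right" idea concrete, here is a minimal sketch. It assumes that "greedy" means "always take the longest known component that starts at the current position", which is my reading and not necessarily how anyone's brain does it. The `LEXICON` set and the `greedy_split` function are hypothetical names made up for this illustration, and the word list is a toy one.

```python
# Hypothetical toy lexicon of known word components (lowercase).
# A real reader's vocabulary is obviously far larger.
LEXICON = {
    "ur", "urin", "instinkt", "stinkt",
    "rind", "rinder", "fleisch", "kennzeichnung", "etikettierung",
}


def greedy_split(word, lexicon=LEXICON):
    """Split a compound left to right, always taking the longest known prefix.

    Returns a list of components, or None if the greedy pass gets stuck.
    There is deliberately no backtracking, and linking elements such as
    the German Fugen-s are not handled.
    """
    word = word.lower()
    parts = []
    pos = 0
    while pos < len(word):
        # Try the longest candidate first, then shrink it one letter at a time.
        for end in range(len(word), pos, -1):
            if word[pos:end] in lexicon:
                parts.append(word[pos:end])
                pos = end
                break
        else:
            # No known component starts at this position.
            return None
    return parts


print(greedy_split("Rinderkennzeichnung"))       # ['rinder', 'kennzeichnung']
print(greedy_split("Rindfleischetikettierung"))  # ['rind', 'fleisch', 'etikettierung']
print(greedy_split("Urinstinkt"))                # ['urin', 'stinkt']
```

The last call shows the pun case: the intended reading is Ur-instinkt, but a greedy longest-match pass grabs "urin" first and ends up with Urin-stinkt.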
detaro|5 months ago
b) the official title of the law was "Gesetz zur Übertragung der Aufgaben für die Überwachung der Rinderkennzeichnung und Rindfleischetikettierung", so how again is it that English "gets to use a sentence" and German doesn't? German has the choice depending on context; sometimes having one word is convenient.
bmacho|5 months ago