Seems like OP should have considered a popularity weighting. Among the 25 weirdest, we see Mandarin and Spanish, the world's two most popular languages by native speaker count. That's a hint that what you're measuring isn't exactly weirdness.
Meanwhile I see Hungarian, famous for being the hardest language to learn, is fifth least weird. Even stranger, Cantonese, which is almost exactly the same in writing as 'weird' Mandarin, is sixth least weird. How can two languages that you write the same be so very far apart in feature set?
I think 'weirdness' in this context should not be conflated with 'norm'. Instead, as I read it, I think about it the same way you would think about the regularity of a language (to put it in compsci terms).
I speak both Mandarin and Cantonese. I think a lot of the 'weirdness' factor comes from actually saying the words. Also, there are some words in Cantonese that are not in Mandarin. For example, Cantonese has 'mou' (the negation of having), while in Mandarin it would be pronounced as two words ('mei2 you3'). In some regards, spoken Cantonese is easier to pick up than spoken Mandarin (definitely easier to cuss in).
As I understand it, dialects that somehow qualify as languages are weighted more than English and all its dialects? Definitions of languages tend to be quite arbitrary or even politically motivated at times; look no further than the Balkans or Moldovan [1].
> Meanwhile I see Hungarian, famous for being the hardest language to learn
Source? It's hard to determine which language is hardest to learn since it all depends on what languages you already speak.
People who are familiar with Finnish and Estonian probably won't find Hungarian too hard, because the three languages are related.
Another example: Is English hard to learn? Probably not if your native language is a Germanic one (such as Dutch), but if your native language is Japanese, English might be a serious challenge.
Seems like they are just counting the weird features of languages. It is correct that a language like Turkish (and probably Hungarian as well) comes out as least weird, because it has none of the weird features such as grammatical gender, irregular verbs, prefixes, etc. But two of its main features, extreme inflection and vowel harmony, make it difficult to learn.
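To make "vowel harmony" concrete, here is a minimal Python sketch of its best-known case in Turkish: the plural suffix is -lar after a back vowel (a, ı, o, u) and -ler after a front vowel (e, i, ö, ü). The function name `plural` is just for illustration; the sketch assumes lowercase input and deliberately ignores loanword exceptions.

```python
# Two-way vowel harmony in Turkish: back vowels vs. front vowels.
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")


def plural(noun: str) -> str:
    """Return the plural of a lowercase Turkish noun (illustrative sketch).

    The plural suffix has two forms, -lar and -ler; its vowel agrees in
    frontness with the last vowel of the stem. Loanword exceptions
    (e.g. saat -> saatler) are not handled.
    """
    for ch in reversed(noun):  # scan from the end to find the last vowel
        if ch in BACK_VOWELS:
            return noun + "lar"
        if ch in FRONT_VOWELS:
            return noun + "ler"
    raise ValueError(f"no vowel found in {noun!r}")


if __name__ == "__main__":
    for word in ["ev", "kitap", "göz", "okul", "kedi"]:
        print(word, "->", plural(word))
```

A learner has to apply a rule like this to almost every suffix, which is part of why a language can be "regular" in WALS terms and still feel hard.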
It would be cool if the author made the dataset available; it would be fun to try other things with it: population weighting (as WildUtah mentions), grouping the languages by family and calculating intra- and inter-group weirdness (and distances), clustering the languages into new groups, calculating weirdness as the mean distance to all other languages in the 21-dimensional feature space, projecting the space itself onto a plane so that we can see it better, ...
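As a rough illustration of the "mean distance" idea, assume each language is stored as a mapping from feature ID to value, with unattested features simply missing. The weirdness of a language could then be the average, over all other languages, of the fraction of commonly attested features on which the two disagree. The feature IDs and the three toy languages below are made up, not WALS data.

```python
from typing import Dict, Optional

Profile = Dict[str, str]  # feature id -> value; unattested features are absent


def distance(a: Profile, b: Profile) -> Optional[float]:
    """Fraction of features attested in both languages whose values differ.

    Returns None when the two languages share no attested feature.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return sum(a[f] != b[f] for f in shared) / len(shared)


def mean_distance_weirdness(profiles: Dict[str, Profile]) -> Dict[str, float]:
    """Weirdness of each language = its mean distance to every other language."""
    scores: Dict[str, float] = {}
    for name, prof in profiles.items():
        dists = []
        for other, other_prof in profiles.items():
            if other == name:
                continue
            d = distance(prof, other_prof)
            if d is not None:
                dists.append(d)
        scores[name] = sum(dists) / len(dists) if dists else float("nan")
    return scores


# Hypothetical toy data, for illustration only.
TOY_PROFILES = {
    "A": {"f1": "x", "f2": "x", "f3": "x"},
    "B": {"f1": "x", "f2": "x", "f3": "y"},
    "C": {"f1": "y", "f2": "y", "f3": "y"},
}

if __name__ == "__main__":
    for lang, score in sorted(mean_distance_weirdness(TOY_PROFILES).items(),
                              key=lambda kv: kv[1], reverse=True):
        print(f"{lang}: {score:.3f}")
```

Comparing only commonly attested features keeps sparsely documented languages from looking artificially close to or far from everyone else, though it also makes their scores noisier.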
I don't know where to start on this page's absurd notion that Cantonese is a non-"weird" language, but that Mandarin is weird. This must be some academic nonsense based on sounds and disregarding other critical language features.
Let's put aside the fact that Cantonese has among the most complex pronunciation systems in the world, with seven tones and both long and short versions of a number of sounds. This is a language which has four different communication modes. Here's how weird Cantonese is.
1. Cantonese speakers speak in Cantonese.
2. Cantonese speakers read and write in a totally different language, namely Modern Chinese, which for all intents and purposes is Mandarin.
3. Cantonese speakers read out loud in Modern Chinese (Mandarin), but pronounce each of the characters with radically different Cantonese sounds.
4. For purposes of comic book dialogue, etc., it is possible to read and write some Cantonese using various co-opted Chinese characters. But you can't pronounce all of Cantonese this way: many words have no written form whatsoever. This has resulted in a bizarre pidgin written form. For example, one very common word ("di1" -- "a few") is actually usually written as "D" rather than as a character. Other characters are impossible to write in current fonts, or are also used in Modern Chinese but for different words than in Cantonese, and so you see Latin letters like "o" and "a" next to them to suggest a different meaning.
I think we should have expected that Mandarin is weird but Cantonese is very normal, in the same way that Japanese has a weird primary writing system (kanji) while its secondary writing system (hiragana) is one of the most regular in the world. In fact, the same reasoning could apply to Hindi: it is (or was until recently) a secondary language in India, with English as the language of government. Do other countries with two languages follow the same pattern? E.g. I would predict from this that Afrikaans would be a very non-weird language.
It'd be cool if they could include artificial languages like Esperanto and Lojban. Given that one goal of both languages is to appeal to speakers of any language, it would be interesting to see whether they achieved that goal (i.e. produced a very "non-weird" language).
There are a few potential problems which limit the significance of this:
1. There is no universal definition of what counts as a language rather than simply a dialect or a regional variation; this applies especially on large continents, where language features can vary a lot between geographically remote locations, but with no clean boundary at which you can say people speak one language or the other.
2. Languages evolve, diverge and sometimes borrow, and so a group of related languages can share the same potentially idiosyncratic feature because of common evolutionary roots rather than because the feature makes sense. This could explain the result for Hindi - it is a standard language that 'averages' a large number of other Indian languages.
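One common way to keep a large family from outvoting everyone else when deciding which feature values are "normal" is to weight each language by 1 / (number of languages from its family in the sample), so every family counts once. The minimal sketch below uses hypothetical families P, Q and R and a made-up feature `f1`; it is a swapped-in technique for illustration, not how the article computed its numbers.

```python
from collections import Counter, defaultdict
from typing import Dict, Tuple

Lang = Tuple[str, Dict[str, str]]  # (family, {feature id: value})


def value_shares(langs: Dict[str, Lang], feature: str,
                 balance_families: bool = True) -> Dict[str, float]:
    """Share of the sample carrying each value of `feature`.

    With balance_families=True, each language gets weight 1 / family size,
    so a family with many members counts the same as an isolate.
    """
    attested = [(fam, prof[feature]) for fam, prof in langs.values()
                if feature in prof]
    family_size = Counter(fam for fam, _ in attested)
    weight_by_value: Dict[str, float] = defaultdict(float)
    for fam, value in attested:
        weight_by_value[value] += 1.0 / family_size[fam] if balance_families else 1.0
    total = sum(weight_by_value.values())
    return {v: w / total for v, w in weight_by_value.items()}


# Hypothetical sample: four related languages share value "x",
# two unrelated languages have "y".
TOY: Dict[str, Lang] = {
    "P1": ("P", {"f1": "x"}), "P2": ("P", {"f1": "x"}),
    "P3": ("P", {"f1": "x"}), "P4": ("P", {"f1": "x"}),
    "Q1": ("Q", {"f1": "y"}), "R1": ("R", {"f1": "y"}),
}

if __name__ == "__main__":
    print("per language:", value_shares(TOY, "f1", balance_families=False))
    print("per family:  ", value_shares(TOY, "f1"))
```

Counted per language, "x" looks like the majority value; counted per family, it is the minority one, which is exactly the inheritance effect described in point 2.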
I would call these "caveats" more than "problems": anything with a title like "weirdest languages" is going to be incredibly subjective, and there will no doubt be no shortage of people who disagree with specific choices, but as long as the reasoning is clear it can still be interesting/useful.
> This could explain the result for Hindi - it is a standard language that 'averages' a large number of other Indian languages.
I think this is a little misleading; a lot of Indian languages come from a completely different language family than Hindi (Hindi is Indo-European, but a lot of Indian languages are Dravidian, e.g. Tamil and Telugu), though I'm not qualified to speak about this.
The main problem with this analysis, I think, is that each language counts as 1, regardless of its actual number of speakers and its history. There is a reason why some languages are much larger than others (mostly historical and political), and these larger languages (like English) have thus evolved toward simpler versions of their former selves.
English is thus, for all intents and purposes, not weird.
A more interesting approach would be to take larger languages like English, Mandarin, Spanish, etc. and weight their features more heavily than those of languages spoken by tribes or very few people; that way you could arrive at a more accurate 'weirdness' index.
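A sketch of what speaker weighting might look like, assuming weirdness is scored as the average rarity (1 minus the share of the sample with the same value) of a language's feature values. The scoring rule, the function name `weirdness_scores`, the language names and the speaker counts are all hypothetical; the point is only that the same data can flip which language looks weird once shares are counted by speakers rather than by languages.

```python
from collections import defaultdict
from typing import Dict, Optional


def weirdness_scores(profiles: Dict[str, Dict[str, str]],
                     speakers: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Mean rarity of each language's feature values.

    Rarity of a value = 1 - (its share of the sample). Shares are counted
    per language, or per speaker when `speakers` is given.
    """
    # feature -> value -> accumulated weight
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for name, prof in profiles.items():
        w = speakers[name] if speakers is not None else 1.0
        for feat, val in prof.items():
            totals[feat][val] += w

    scores: Dict[str, float] = {}
    for name, prof in profiles.items():
        rarities = [1.0 - totals[feat][val] / sum(totals[feat].values())
                    for feat, val in prof.items()]
        scores[name] = sum(rarities) / len(rarities) if rarities else float("nan")
    return scores


# Hypothetical data: one large language vs. two small ones.
TOY_PROFILES = {"Big": {"f1": "x"}, "Small1": {"f1": "y"}, "Small2": {"f1": "y"}}
TOY_SPEAKERS = {"Big": 1000.0, "Small1": 10.0, "Small2": 10.0}

if __name__ == "__main__":
    print("per language:", weirdness_scores(TOY_PROFILES))
    print("per speaker: ", weirdness_scores(TOY_PROFILES, TOY_SPEAKERS))
```

Counted per language, "Big" is the odd one out; counted per speaker, the two small languages are. Which one is the "right" notion of weirdness is exactly the question being debated here.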
This is great. I'm really happy to learn about WALS. I've been interested in constructing a practical standard language for humans for a long time. This kind of survey of what works seems essential.
I see this was posted overnight in my time zone. Several of the earlier comments correctly point out that empirically, a language that has been acquired by many second-language speakers (for example, English) must not strike too many people as unlearnably "weird." Many widely spoken languages have undergone a process that linguists call "koineization" (after the spread of Koine Greek as a common language of the ancient eastern Mediterranean and Near East), in which the language simplifies some grammatical (and possibly phonological) features as it is spoken by more second-language speakers for trade or for use as a language of national administration in a multilingual region.
The United States is largely an English-speaking country, but only about one-fourth of Americans have ancestors who spoke English before arrival in North America. (Indeed, only one of my four grandparents, all of whom were born in the United States, grew up in an English-speaking household.) In other words, General American English is a koine language of second-language learners of English, so it is not surprising that it is spreading all over the world.
P.S. Feel free to visit my user profile here on HN to see more about my background in linguistics and language learning and teaching.
AFTER EDIT: Cantonese versus Mandarin as "dialects" or "languages" came up in other comments. Cantonese is at least as different from Modern Standard Chinese (Mandarin) as German is from English. How you might write the conversation
"Does he know how to speak Mandarin?"
"No, he doesn't."
他會說普通話嗎?
他不會。
in Modern Standard Chinese characters contrasts with how you would write
"Does he know how to speak Cantonese?"
"No, he doesn't."
佢識唔識講廣東話?
佢唔識。
in the Chinese characters used to write Cantonese. As will readily appear even to readers who don't know Chinese characters, many more words than "Mandarin" and "Cantonese" differ between those sentences in Chinese characters.
I read something about how English was simplified as a result of the Viking invasions of England in the Middle Ages. It sounds like koineization. English might be weird, but it seems to be weird in a way that makes it highly exportable, like a successful product.
It's interesting to me that Norwegian is one of the top 25 strangest languages in the world on that list, but Danish and Swedish aren't. Maybe it was on the lower end of the top 25.
One possible theory is that it could be related to how the data set deals with (or fails to deal with) Book-Norwegian (Bokmål) vs. New-Norwegian (Nynorsk). Also, modern Swedish and Danish grammar uses only two genders while Norwegian still has three, so that could weigh in.
Actually, Swedish and Danish ARE very weird, but they didn’t make the cut-off of “14 or more of the 21 features attested”. Swedish has only 12 of the 21 features attested in WALS, and Danish has 13. Both of them actually score as weirder than Norwegian (15/21 features): Swedish 0.86, Danish 0.85, Norwegian 0.82.
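The cut-off explains the apparent paradox: a language is only ranked if enough of its features are attested, however high its score. A small Python sketch using only the figures quoted in this thread (attested-feature counts and scores); the scoring formula itself isn't given here, so the scores are taken as-is, and the names `LangResult` and `ranked` are illustrative.

```python
from typing import List, NamedTuple


class LangResult(NamedTuple):
    name: str
    attested: int  # how many of the 21 WALS features have a value
    score: float   # weirdness score as quoted (higher = weirder)


# Figures quoted in this thread.
RESULTS = [
    LangResult("Swedish", 12, 0.86),
    LangResult("Danish", 13, 0.85),
    LangResult("Norwegian", 15, 0.82),
]


def ranked(results: List[LangResult], min_attested: int = 14) -> List[LangResult]:
    """Languages eligible for the ranking, weirdest first."""
    eligible = [r for r in results if r.attested >= min_attested]
    return sorted(eligible, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    print([r.name for r in ranked(RESULTS)])                  # default cut-off of 14
    print([r.name for r in ranked(RESULTS, min_attested=12)])  # relaxed cut-off
```

With the cut-off at 14, only Norwegian qualifies even though it has the lowest of the three scores; relaxing it to 12 puts Swedish and Danish ahead of it.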
WildUtah|12 years ago
chewxy|12 years ago
mxfh|12 years ago
"A language is a dialect with an army and navy." (Yiddish: "a shprakh iz a dialekt mit an armey un flot") [2]
[1] http://en.wikipedia.org/wiki/Moldovan_language#Controversy
[2] http://en.wikipedia.org/wiki/A_language_is_a_dialect_with_an...
workhere-io|12 years ago
dnda|12 years ago
yread|12 years ago
chewxy|12 years ago
SeanLuke|12 years ago
Mandarin weird my foot.
lmm|12 years ago
mtts|12 years ago
Don't know about Hindi, but since it's used as a lingua franca of sorts, that might also be the reason behind its normalness.
nemo1618|12 years ago
A1kmm|12 years ago
Osmium|12 years ago
Svip|12 years ago
b6|12 years ago
mtts|12 years ago
pointernil|12 years ago
If programming had been developed in Asia, what would the programming paradigm be?
Lines of code? Pictures? Left to right? Top to bottom? Objects and methods?
Are there any steampunk fantasies about programming languages? How weird would those be?
Ashuu|12 years ago
JoeAltmaier|12 years ago
tokenadult|12 years ago
http://www.xibalba.demon.co.uk/jbr/ranto/
http://www.jstor.org/discover/10.2307/4167665?uid=3739736&ui...
http://www.lancs.ac.uk/fss/linguistics/staff/kerswill/pkpubs...
http://en.wikipedia.org/wiki/Koin%C3%A9_language
mtdewcmu|12 years ago
Dewie|12 years ago
dagw|12 years ago
chromaton|12 years ago
TylerSinSF|12 years ago
Swedish: 0.86
Danish: 0.85
Norwegian: 0.82