
How to understand half of Harry Potter book in any language (+ source code)

35 points | legierski | 14 years ago | blog.self.li

46 comments

pooriaazimi|14 years ago
> ... as I’ve already read all 7 books in 2 languages before - I could pick up a lot just from context...

That's what I've always said to my friends, but they insist on learning grammar, which of course they'll forget after a few hours/days. Three years ago I could hardly read anything non-technical in English; now I can understand Lolita, which is a hard novel for a non-native speaker to read. All thanks to audiobooks.

I found that audiobooks are the most fantastic way of learning a new language... I couldn't possibly have read Lolita or The Silmarillion before; they're just too hard for someone who is trying to learn a new language. Long sentences full of new or invented words make it easy to lose the thread. But when a skillful reader reads them aloud, you can understand even the most complicated words and sentences just from the context and the reader's tone and emphasis...

If you want to learn a new language, do yourself a favor and listen to some audiobooks. Pick a book you've already read in your native language (preferably more than once) and you'll be amazed how easy it is to understand and learn new words (you must have a basic understanding of that language of course).

krelian|14 years ago
Audiobooks are great for learning the spoken part of a language but I would guess that most of us "technical people" need and want to learn things from the bottom up. That's the only way we feel that we really know something.

It's true that grammar can be very complicated and will be easily forgotten without practice. A good exercise is to read a book and check not only whether you understand what is happening, but also whether you can analyze each sentence and identify which grammar rules were used in its construction, and why.

ajuc|14 years ago
I improved my English a lot because I wanted to read Harry Turtledove's sci-fi novels (about a UFO invasion during WW2 :) ). Only the first few novels (2, IIRC) were translated into Polish, so I started reading the later ones in English. At first it was hard, but I just ignored unknown words and guessed what they meant from the context (often it became clear what a word had to mean a few paragraphs later, or from the context of its next use). It was much faster that way, and it was easier to remember the typical use of words than if I had checked a dictionary every time I saw a new word.

Before that, I understood "computer English" just well enough to read documentation and tutorials and to play most games :), but I had problems with complex sentences, idioms, etc. Now it's much better, and I read a lot of English literature.

klbarry|14 years ago
That is amazing advice that I will take up later this year. Thank you.
hkolek|14 years ago
I like the approach, but I think it's a big mistake not to strip stop words. He should focus on nouns and verbs, IMO. The top words he lists are all stopwords/grammatical particles: "de", "que", "la", "y", etc. I don't think knowing those words helps you understand anything: if you understand only the grammatical particles in a sentence, you still have no idea what it means. On the other hand, if you know the verbs and nouns but not the grammatical particles, you can at least infer some of the meaning, or what the sentence is about.
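A minimal Python sketch of the kind of frequency count being discussed, with optional stopword filtering. The original post's source code isn't reproduced in this thread, so this is not the author's implementation; `top_words` and the tiny `SPANISH_STOPWORDS` set are hypothetical and only for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny Spanish stopword list for illustration;
# a real list would be much longer.
SPANISH_STOPWORDS = {"de", "que", "la", "y", "el", "en", "a", "los", "se", "no"}


def top_words(text, n=50, stopwords=frozenset()):
    """Return the n most frequent words in text as (word, count) pairs,
    skipping any word in `stopwords`."""
    # Lowercase, then take runs of letters (accented letters included,
    # digits and underscores excluded).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)
```

Calling it with the default empty stopword set gives the unfiltered list, where function words like "de" and "que" dominate; passing a stopword set gives the noun- and verb-heavier list suggested above.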
legierski|14 years ago
You can always print out, let's say, the top 50 words and not bother with "de", "que", etc.
aohtsab|14 years ago
When I was starting out in German, I listened to audiobooks while reading the German text. Knowing the small words (prepositions and other stopwords) isn't of much use on its own, but having a general feel for the plot and knowing what 'should be happening' in the book gave me a much broader understanding (and a richer vocabulary!) than merely ploughing through the German 101 textbook.

tl;dr: Applaud the idea, but it's misguided. You need to focus on understanding words in greater context to derive any meaning.

acslater00|14 years ago
TL;DR Nearly half of the word occurrences in Harry Potter are prepositions, so if you learn a small number of them you can claim that you "understand half of Harry Potter". For example, you can absorb sparkling dialogue like the following:

"Harry and to at to I to with Voldemort or what to and I do for, Hermione!!"
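The "half" in the title is a share of word occurrences (tokens), not of meaning. A rough Python sketch of that measurement, assuming naive letter-run tokenization; `coverage_of_top_n` is a hypothetical helper:

```python
import re
from collections import Counter


def coverage_of_top_n(text, n):
    """Fraction of all word occurrences (tokens) accounted for by the
    n most frequent distinct words (types) in text."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(words)
```

A high value for a small `n` mostly reflects how often function words repeat, which is exactly the point of the dialogue above.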

frooxie|14 years ago
David Moser, in his text "Why Chinese Is So Damn Hard", points out that

Even though you may know 95% of the [words] in a given text, the remaining 5% are often the very [words] that are crucial for understanding the main point of the text. A non-native speaker of English reading an article with the headline "JACUZZIS FOUND EFFECTIVE IN TREATING PHLEBITIS" is not going to get very far if they don't know the words "jacuzzi" or "phlebitis".

jenius|14 years ago
Yeah, really, this is absolutely dumb. How did this get to the front page of Hacker News? Knowing those words will literally get you nowhere if you are trying to learn Spanish.

This is like trying to learn JavaScript and finding that it's actually not that hard, because half the language is open and close parens, semicolons, and the word "function". Fluent in JavaScript!

kiiski|14 years ago
One should also remember that not all languages use so many prepositions. In Finnish, for example, we usually change the ending of a word instead of using prepositions.

For example, the previous paragraph in Finnish: "Pitää myös muistaa, että kaikissa kielissä ei käytetä yhtä paljon prepositioita. Suomessa, esimerkiksi, muutamme yleensä sanan loppua preposition käyttämisen sijasta"

mseebach|14 years ago
So, the tl;dr is that this guy discovered that using a dictionary is a good way of learning words in a different language.

In more detail, there's an assertion and a proposed solution - and nothing to even remotely back up the assertion? Show me a page of Harry Potter in Spanish translated in this manner - I somewhat doubt it will make much sense.
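One way to produce the kind of page asked for here is to blank out every word the reader doesn't know. A minimal sketch, assuming the known vocabulary is just a set of lowercase words (for example, a top-N list); `mask_unknown` is a hypothetical name:

```python
import re


def mask_unknown(text, known, mask="___"):
    """Replace every word not in `known` with a placeholder, keeping
    punctuation and spacing, to preview what a reader with that
    vocabulary would actually see."""
    def repl(match):
        word = match.group(0)
        # Compare case-insensitively but keep the original spelling.
        return word if word.lower() in known else mask

    return re.sub(r"[^\W\d_]+", repl, text)
```

For example, `mask_unknown("Harry miró la varita de Ron.", {"harry", "la", "de"})` returns `"Harry ___ la ___ de ___."`.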

ajuc|14 years ago
That method of learning a foreign language was advocated by David Snopek, an American who learned Polish that way in one year (and speaks it fluently, which is very hard for non-native Polish speakers; most people still have trouble producing correct sentences at all after one year).

His post about this: http://www.linguatrek.com/blog/2010/12/harry-potter-the-book...

You can guess a surprisingly large part of a book based just on the context and a few words you already know. That's how kids learn their native language. It's a good method, it has worked for many people (including me :)), and I don't understand why you are arguing that it doesn't.

pm215|14 years ago
There are some similar statistics for Japanese novels here: http://pomax.nihongoresources.com/index.php?entry=1223045359, which I think show that the problem is not at the "most common" end of the distribution but at the "least common" end. Going from 80% to 90% understanding requires knowing an extra 5341 words, going from 90% to 95% needs another 7495, and so on. Basically, the long tail is really nasty, and even at 90% understanding you still don't know one word in ten...
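The long-tail effect can be measured on any text by counting how many of the most frequent words are needed to reach each coverage target. A sketch assuming naive letter-run tokenization, with "coverage" meaning share of tokens rather than comprehension; `words_needed` is hypothetical:

```python
import re
from collections import Counter


def words_needed(text, targets=(0.5, 0.8, 0.9, 0.95)):
    """For each target coverage fraction, return how many of the most
    frequent distinct words must be known to cover at least that share
    of all tokens."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    total = len(words)
    counts = [c for _, c in Counter(words).most_common()]
    result = {}
    covered = 0
    rank = 0
    for target in sorted(targets):
        # Keep adding words in frequency order until this target is met.
        while rank < len(counts) and covered < target * total:
            covered += counts[rank]
            rank += 1
        result[target] = rank
    return result
```

On a full novel, the gaps between successive targets grow quickly, which is the pattern in the linked Japanese numbers.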
cskau|14 years ago
Thank you for sharing this. I'm doing something related at the moment, so it's nice to see previous work like this.
korussian|14 years ago
That's a fantastic idea. I'm struggling to learn Korean, and it's tough because few of the words are recognizable to me. I have a base of English/French/Russian, so that doesn't help (much).

I would love to try to put my grammar/flash cards aside and go the Harry Potter route.

I can't code. What could you do to help me, an average user, do this with Korean?

cgag|14 years ago
I don't think being able to code or do any sort of analysis is particularly valuable here. When you read a 50,000+ word book, you're going to see the top 50 words enough times to get them down pretty well. I know I've picked up a lot of words like "wand" and "owl" from my copy of La piedra filosofal.

I think you should just get copies of books you know well, or children's books (Harry Potter, The Little Prince), and start reading with a dictionary or WordReference nearby.

You could also pick up a frequency dictionary if you just want to build a base level of vocabulary.

neop|14 years ago
I'm also learning Korean and I'm looking for a book to try to read. Unfortunately, I haven't found any suggestions. From what I've heard, children's books are actually harder to read because they use a lot of words that only children use. Harry Potter might be a good idea, but it seems to me that it would have a lot of fantasy-related vocabulary, which won't be very useful. A story set in the modern-day real world would probably be better.
TeMPOraL|14 years ago
> I can't code.

I guess there are many people who would benefit from this idea and, like you, can't code. It sounds like a good idea for an Internet service - a list of words to check before reading book X in language Y.
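A rough sketch of what such a service might compute, assuming it has the book's text and the set of words the reader already knows: the book's unknown words in frequency order, each with the token coverage the reader would reach after learning it. `prereading_list` is a hypothetical name, not an existing service or API.

```python
import re
from collections import Counter


def prereading_list(book_text, known_words, limit=20):
    """Return up to `limit` (word, count, coverage) tuples for unknown
    words, most frequent first. `coverage` is the share of the book's
    tokens the reader would know after learning this word and all the
    words before it in the list."""
    words = re.findall(r"[^\W\d_]+", book_text.lower())
    total = len(words)
    covered = sum(1 for w in words if w in known_words)
    unknown = Counter(w for w in words if w not in known_words)
    result = []
    for word, count in unknown.most_common(limit):
        covered += count
        result.append((word, count, covered / total))
    return result
```

The coverage column makes the trade-off visible: the first few entries raise coverage noticeably, later ones barely move it.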

legierski|14 years ago
I don't know much about Korean, but if I were you I would just get a book and start reading it, no matter what. And look up a list of the most common words in the language online.
protractor|14 years ago
To plug a friend's startup: www.talktomeinkorean.com
krelian|14 years ago
>did you know that out of 5 most popular languages in the world, 3 of them are relatively easy to acquire? They are: English, Spanish and Russian, and my plan is to be fluent in English and Spanish and be able to get by with Russian by the end of 2013! Who’s with me?!)

I'll grant that Spanish is relatively easy but Russian is considered one of the most difficult languages to learn. It's hard for me to judge the difficulty of English but I wouldn't say it is an easy language.

apendleton|14 years ago
Russian is probably somewhere in between. How hard a language is to acquire depends a lot on what language or languages you already speak, but for native English speakers, Russian is considered difficult, but probably not "one of the most difficult." The Defense Language Institute, which provides instruction to native-English-speaking US military translators, classifies Russian as a category 3 language (of four categories), which makes it harder than French or Spanish (category 1) or German (category 2), on the same level as Farsi or Hindi, but easier than a reasonably sized swath of category 4 languages that includes Mandarin, Japanese, and Arabic.
goblin89|14 years ago
I like the HP series in this regard. The vocabulary complexity increases slightly with each book, which helps you learn a language progressively.
nathell|14 years ago
Reinventing Zipf's law, huh?
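For context, Zipf's law says a word's frequency is roughly inversely proportional to its frequency rank. Assuming an idealized exponent of 1 (real texts only approximate this) and a vocabulary of V distinct words, the share of tokens covered by the top N words is a ratio of harmonic numbers; the numbers below use a hypothetical V = 10,000 and N = 100:

```latex
f(r) \propto \frac{1}{r}, \qquad
C(N) = \frac{\sum_{r=1}^{N} 1/r}{\sum_{r=1}^{V} 1/r} = \frac{H_N}{H_V}, \qquad
H_n \approx \ln n + \gamma, \quad \gamma \approx 0.5772

V = 10{,}000,\; N = 100: \quad C(100) \approx \frac{5.19}{9.79} \approx 0.53

C(2N) - C(N) = \frac{H_{2N} - H_N}{H_V} \approx \frac{\ln 2}{H_V} \approx 0.07
```

So under this idealization about 100 words cover roughly half of the text, while each further doubling of N adds only about 7 percentage points, which is the long tail described in the Japanese statistics above.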
legierski|14 years ago
Oh, I'd never heard of it, thanks for the tip!
rivalis|14 years ago
NLP folks call those "stopwords," because they don't contribute much to the statistical understanding of text. That is, in most NLP applications those words are removed to leave the more meaningful text behind. How did this make the front page?
wrs|14 years ago
Linguality (http://www.linguality.com/) prints French and Italian novels with the original text on the right pages and a page-specific mini-dictionary on the left pages. No need to keep stopping to look up words in a dictionary.

Unfortunately there are only a few Linguality books. Can you do this with an e-reader?
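Answering the e-reader question only as a hypothetical sketch (not how Linguality actually builds its books): given the text split into pages and a bilingual dictionary, each page's mini-dictionary can list the words that appear there for the first time and that the reader doesn't already know. `page_glossaries`, the `dictionary` mapping, and the `known` set are all assumptions.

```python
import re


def page_glossaries(pages, dictionary, known=frozenset()):
    """For each page, return (word, gloss) pairs for words that are in
    `dictionary`, not in `known`, and not already glossed on an earlier page."""
    seen = set(known)
    glossaries = []
    for page in pages:
        entries = []
        for word in re.findall(r"[^\W\d_]+", page.lower()):
            if word not in seen and word in dictionary:
                entries.append((word, dictionary[word]))
                seen.add(word)  # gloss each word only once per book
        glossaries.append(entries)
    return glossaries
```

Glossing each word only at its first appearance keeps the left-hand pages short; a variant could repeat a gloss after some number of pages to support recall.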

tolliator|14 years ago
As a native Russian speaker, I can say with utmost certainty that Russian is NOT the easiest language to learn. In fact, I would argue that it is at the upper end of the difficulty scale.

I have been living in North America for only 10 years, and I can't even teach Russian to my own kids; we had to get a tutor.

raphman|14 years ago
Nice. Additional stemming would probably provide better data, however.
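Stemming groups inflected forms (for example "varita"/"varitas") under one key, so they count as one word instead of several; this matters even more for heavily inflected languages like the Finnish example above. A deliberately crude sketch: the suffix list and `crude_stem` are hypothetical, and a real stemmer such as the Snowball Spanish stemmer handles many more cases.

```python
import re
from collections import Counter

# Hypothetical, very crude Spanish suffix list (plural and gender endings only),
# ordered so that two-letter endings are tried before one-letter ones.
SUFFIXES = ("es", "as", "os", "s", "a", "o")


def crude_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def stemmed_counts(text):
    """Word counts where inflected forms sharing a crude stem are merged."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(crude_stem(w) for w in words)
```

With this, "mago", "magos" and "magas" all count toward the same entry, so the frequency list reflects vocabulary items rather than surface forms.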