
Google Labs: word frequency in books over the last 200 years

23 points| prat | 15 years ago |ngrams.googlelabs.com

I was surprised to see the high popularity of the word "fuck" prior to 1820

21 comments

[+] EliAndrewC|15 years ago|reply
The example in the OP (fuck) shows up so often until the early 1800s because of the long s (ſ), a typographic convention in which a non-final s was printed with a letterform that looks very much like an f. In other words, the word "suck" was printed as "ſuck", which gets read as "fuck".
[+] orls|15 years ago|reply
You can see the changeover quite clearly by comparing the two against each other: http://ngrams.googlelabs.com/graph?content=fuck%2C+suck&...

If we assume all pre-1800ish mentions of 'fuck' are definitely meant to be 'suck', it still features much more prominently in the corpus beforehand than after.

Any ideas why that might be? E.g. certain types of text that were more common before that era, or other (less, er, 'suck'y) types of text that came after, 'diluting' the corpus?

[+] tseabrooks|15 years ago|reply
Any background on the origin of this typesetting convention? I'd like to know the whys and whatfors...
[+] jcr|15 years ago|reply
If you change the bounds to include the 1700s, the prevalence of the term is more pronounced (if you'll pardon the pun).
[+] Groxx|15 years ago|reply
Utterly awesome. http://ngrams.googlelabs.com/graph?content=My+name+is+Inigo+...

Potentially even more awesome is that they have the entire dataset available for download o_O

edit: case sensitivity is more fun than insensitivity: http://ngrams.googlelabs.com/graph?content=Star+Trek%2Cstar+... vs http://ngrams.googlelabs.com/graph?content=star+trek%2CStar+...

edit2: there are a whole bunch of geek-term bumps around and just after 1900. Anyone know why? E.g.: http://ngrams.googlelabs.com/graph?content=Star+Wars&yea...

[+] splat|15 years ago|reply
I have no idea, but my guess is that they don't know the dates for some books and the system automatically classifies the publication date as "1900" or "1901." If you search the word "quark," you also get a bump at around 1900 even though the word wasn't coined until Joyce's Finnegans Wake in 1939.
[+] PetrolMan|15 years ago|reply
I find it kind of interesting that a lot of words peak around the middle of the 19th century and have been in decline ever since. I'm guessing this has something to do with the increasing number of books published but it is still kind of hard for me to imagine that "the" is less commonly used now than one hundred years ago. The pattern holds true for a lot of common words...
[+] dlsspy|15 years ago|reply
I'm going to have an impact on Google's internet bill this month.