
Rappers, Sorted by Size of Vocabulary

600 points| sinned | 12 years ago |rappers.mdaniels.com.s3-website-us-east-1.amazonaws.com

267 comments

[+] loso|12 years ago|reply
I enjoyed reading this chart but I hope it doesn't reinforce the bias that some fans have that word complexity is the only way to tell if a rapper is good or not. There are several ways to judge the strengths and weaknesses of a rapper. Complexity is one of them, flow is another. Storytelling ability is another very strong indicator. The best rappers are able to bring a mix, while some are so strong in one area that they explode even if they are really weak in other areas.
[+] unfunco|12 years ago|reply
This is fascinating. I'm only a recent listener of hip-hop (primarily because of Earl Sweatshirt and Odd Future) and I'm in awe of the vernacular.

And similarly, as a boredom exercise a few weeks ago I did some lexical analysis of the song Timber (the monstrosity was being constantly played on the radio at the time) and here's what I came out with:

"83.1% of the words in the lyrics are five letters or less, 58.9% are four letters or less. The lexical density (the number of unique words divided by the total number of words, multiplied by one-hundred) is 29.1%. There is only one word in the song which has three or more syllables. Eleven people were involved with the writing of the song, each of them capable of producing just nine unique words each."

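For anyone who wants to repeat this kind of count, here is a minimal sketch of the figures quoted above. It assumes a simple regex tokenizer (lowercased runs of letters and apostrophes), which is my choice and not necessarily how the original numbers were produced. `lexical_stats` and the sample string are made up for illustration.

```python
import re


def lexical_stats(lyrics: str) -> dict:
    """Compute simple word-length and uniqueness percentages for a lyric."""
    # Assumed tokenizer: lowercase, keep runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", lyrics.lower())
    total = len(words)
    if total == 0:
        raise ValueError("no words found in input")
    unique = len(set(words))
    return {
        "total_words": total,
        "unique_words": unique,
        # Unique words divided by total words, times one hundred.
        "lexical_density_pct": 100.0 * unique / total,
        "pct_five_letters_or_less": 100.0 * sum(len(w) <= 5 for w in words) / total,
        "pct_four_letters_or_less": 100.0 * sum(len(w) <= 4 for w in words) / total,
    }


# Made-up sample text, not real lyrics.
sample = "go go go everybody dance now go everybody"
stats = lexical_stats(sample)
print(stats)
```

Note that apostrophes count toward word length here, so "it's" is treated as four characters. A different tokenizer would shift the percentages slightly.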
[+] bretthopper|12 years ago|reply
Looked for Canibus near the top and wasn't surprised to find him 4th. If anyone hasn't heard of him, highly suggest listening to his older stuff such as his first album, Can-I-Bus, 2000 BC and Mic Club.

He raps about science and space all the time which is cool.

Here's an example of his ridiculous lyrics: http://rapgenius.com/Canibus-poet-laureate-infinity-lyrics

[+] shawnz|12 years ago|reply
Additionally, many HN users have probably already heard Canibus rapping even if they don't know it, since he wrote the Office Space theme song. :)
[+] seizethecheese|12 years ago|reply
Many here seem to be interpreting vocabulary size as a signal for quality. When it comes to rap I completely disagree. Firstly, repetition is rap's main ingredient. I read an article a while ago where researchers found that listening to a spoken phrase that is looped activates the same part of the brain as music, which helps explain this phenomenon.

Personally, if I want food for thought I read. Rap is not an intellectual pursuit. I've been perusing rappers on this list, and the top artists have not been good at all to my ears. It seems that the best rappers are in the middle, and being on either extreme is a negative signal.

[+] Aardwolf|12 years ago|reply
> Shakespeare’s vocabulary: across his entire corpus, he uses 28,829 words, suggesting he knew over 100,000 words

Why does that suggest he knew over 100k words? Maybe it means he knew 28,829 and used all of them? Would he really know over 70,000 words he never used in his works? What would those 70,000 words be? Probably very obscure ones. How can you know that many obscure ones?

[+] mbillie1|12 years ago|reply
Vocabulary has for a while been considered in terms of 'receptive' and 'productive' capacity, with the assumption being that one's 'receptive' vocabulary can be larger, since it is easier to hear/read/understand a word than it is to use it correctly in speaking/writing (this is not necessarily the popular opinion anymore [http://www.readingconnect.net/web/FILES/english-language-and...] but may provide the context for the claim about Shakespeare). The notion is that you are able to understand more words than you commonly use in your speech/writing, which is on some level intuitive, although of course it is an empirical question.
[+] MereInterest|12 years ago|reply
I imagine that it would be something similar to the German Tank Problem. (http://en.wikipedia.org/wiki/German_tank_problem ) Taking each writing as a sample of the words that are known then would allow for an estimation of total words known. I imagine that this would need to be modified to account for the non-uniform distribution of word use, but the principle would be the same.
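To make the German Tank Problem reference concrete, here is a minimal sketch of the classic estimator: given k serial numbers drawn without replacement from 1..N with observed maximum m, the usual estimate of N is m + m/k - 1. This assumes uniform sampling, which (as noted above) word usage does not satisfy, so it only illustrates the principle and is not a way to estimate anyone's vocabulary. The function name is hypothetical.

```python
def german_tank_estimate(serials):
    """Estimate the population size N from a sample of serial numbers.

    Assumes serials are drawn uniformly, without replacement, from 1..N.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observation")
    m = max(serials)
    # Sample maximum plus the average gap between observations.
    return m + m / k - 1


# Four observed serial numbers with a maximum of 60.
estimate = german_tank_estimate([19, 40, 42, 60])
print(estimate)  # 74.0
```

Word frequencies are heavily skewed, so a vocabulary estimate would need a model of that distribution and not a uniform one.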
[+] sroerick|12 years ago|reply
The 28K figure is achieved by counting multiple spellings of the same word. Shakespeare lived before dictionaries, so there was never a single standard way to spell a word.
[+] Crito|12 years ago|reply
I'm curious if words that Shakespeare invented count. There are many words that we see first used by Shakespeare, though some of them were probably words invented during his time by others with him merely being the first to record (in documents that survived until today).
[+] jemfinch|12 years ago|reply
Just like you or I know tens of thousands of words, but only use some small subset of them in any given work, you wouldn't expect that Shakespeare would use his entire productive vernacular in producing the limited corpus of his literary works.
[+] nmac|12 years ago|reply
It's a nice touch including portmanteaus and 'incorrect' ebonics on the list (like "ery'day"), since authors like Shakespeare, Joyce and others took the same liberties with language. Arguably, that's how language develops, and it's what makes it interesting to study and think about. The OP could have easily stuck to words in the OED, so kudos.
[+] krick|12 years ago|reply
Really interesting, but not as representative as it should be. It's not clear why some have larger vocabulary than others. It could be using words like "zeitgeist" (in case of Aesop Rock) or some clever wordplay (I don't know much about hip-hop, so I can't find an example for some artist from the list right off the bat, but I remember Marilyn Manson using the word "gloominati" for instance) or pretty meaningless made-up words like "schizzle" (in case of Snoop Dogg) or usual derivatives like "fuckedy fuck". Moreover, in many hip-hop transcripts people write down words as they are pronounced, which can be pretty distorted for some artists (which of course ideally shouldn't count as a "new word", but that's complicated, yeah).

While Aesop Rock and DMX are clearly extreme and not surprising at all, it's not that clear for some guys in the middle.

So, first off, for every data project sources should be provided, or at least a more specific description of how the text was processed, tokenized, and analyzed. Second, several more "data slices" should be provided, for instance the 100 most used words which are unique to that artist compared to the other artists in the list.
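As a rough illustration of the "words unique to that artist" slice, here is a minimal sketch. It assumes lyrics are already available as one string per artist and uses a simple regex tokenizer. The artist keys, sample text, and function names are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def distinctive_words(lyrics_by_artist, top_n=100):
    """For each artist, return the most used words that no other artist uses."""
    counts = {artist: Counter(tokenize(text))
              for artist, text in lyrics_by_artist.items()}
    result = {}
    for artist, own in counts.items():
        # Collect every word used by any other artist.
        others = set()
        for other_artist, other_counts in counts.items():
            if other_artist != artist:
                others.update(other_counts)
        exclusive = Counter({w: n for w, n in own.items() if w not in others})
        result[artist] = exclusive.most_common(top_n)
    return result


# Made-up sample data, not real lyrics.
sample = {
    "artist_a": "zeitgeist zeitgeist money cash",
    "artist_b": "money cash dog dog dog bark",
}
slices = distinctive_words(sample)
print(slices)
```

Phonetic transcriptions would still show up as "unique" words here, which is the complication raised above.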

[+] duney|12 years ago|reply
The example you used for clever wordplay, "gloominati," is actually considered a portmanteau word. It's the result of combining multiple words to create a new word. (I say this not to be a pedant, but because I learned the term recently and was amused that we actually have a word for it.)
[+] danielsf|12 years ago|reply
OP here. Do I really need to provide all of this to satisfy the reader's ability to grasp the basic premise of the site? This isn't a thesis or academic pursuit, just comparing some rappers for fun.

I used plain NLTK token analysis on Rap Genius lyrics. In terms of several more data slices... I agree that there should be more cuts of the data, but you must understand the amount of time that it took me to put this together.
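For readers wondering what that kind of token analysis amounts to, here is a rough sketch, not the OP's actual code. It swaps NLTK's tokenizer for a plain regex so it runs with only the standard library, and it takes the 35,000-word cutoff from the figure mentioned elsewhere in this thread. The function name and sample text are hypothetical.

```python
import re


def unique_words_in_first_n(lyrics: str, n: int = 35_000) -> int:
    """Count distinct tokens among the first n tokens of a lyrics corpus."""
    # Regex stand-in for a real tokenizer; case-folded so "The" == "the".
    tokens = re.findall(r"[a-z0-9']+", lyrics.lower())
    return len(set(tokens[:n]))


# Made-up sample text, not real lyrics.
corpus = "a b c a b d e"
print(unique_words_in_first_n(corpus, n=4))  # 3
print(unique_words_in_first_n(corpus))       # 5
```

No stemming is applied, so spelling variants count as separate words in this sketch.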

[+] coherentpony|12 years ago|reply
Maybe this is just me, but it's a little unfair to compare to literary texts.

Humour me for a moment.

When an artist writes a song, he (or she) has constraints. Most rappers would like to rhyme the ends of their sentences. I know sometimes they don't (like poetry), but it's certainly pleasing to the ear to have that constraint. Artists endeavour to make their songs catchy, which is highly correlated with the gross sales of the product.

When an artist writes a novel, this constraint is not weighted quite as highly. I know Shakespeare wrote poetry, too, and to call me out on this comparison is entirely fair. That said, there's also an argument to be made for eye rhymes. Shakespeare used these a lot. Eye rhymes are words that don't rhyme aurally, but do rhyme visually. It's the story that pleases the reader, not necessarily its aural 'catchiness'. I probably made that word up. But Shakespeare made words up too. The point is, you knew what I meant.

At the end of the day these comparisons, while certainly interesting, should be taken with a pinch of salt. While I'm at it, this advice can easily be extrapolated to any dataset. Always understand there may be unknown correlations.

[+] danielsf|12 years ago|reply
OP here: the shakespeare thing is really just a hook, food for thought rather than an academic/cultural judgement.

I also had several suggestions to use shakespeare's sonnets rather than plays, which I should have done.

and yes, this is all just pinch of salt barbershop discussion :)

[+] thinkpad20|12 years ago|reply
Is Del tha Funkee Homosapien on this list? I'd be curious, since he has pretty non-standard lyrics.
[+] habosa|12 years ago|reply
Not surprised to see Wu Tang at the top and Drake at the bottom. Started from the bottom ... still there.
[+] pandler|12 years ago|reply
Haha I was thinking that as you move left on the scale the more likely you are to see rappers that people tend to mock.
[+] orblivion|12 years ago|reply
This looks at the first so many lyrics in each rapper's career. Aesop Rock came out with some weird stuff right off the bat. I wonder if some of these other rappers became more sophisticated over time. Maybe an average per song, or average uniques per word, would be better.
[+] sfrank2147|12 years ago|reply
The problem with average per song is that you "use up" words in every new song, so all things being equal each marginal song has progressively fewer new words.
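The "using up" effect can be seen with a small sketch that counts how many previously unseen words each successive song contributes. The tokenizer is a simple regex and the three "songs" are made-up strings, both assumptions for illustration only.

```python
import re


def new_words_per_song(songs):
    """Return, for each song in order, how many of its words are new so far."""
    seen = set()
    marginal = []
    for lyrics in songs:
        words = set(re.findall(r"[a-z']+", lyrics.lower()))
        fresh = words - seen          # words not used in any earlier song
        marginal.append(len(fresh))
        seen |= fresh
    return marginal


# Made-up sample data, not real lyrics.
songs = ["money cash cars", "money cars clothes", "cash clothes money"]
counts = new_words_per_song(songs)
print(counts)  # [3, 1, 0]
```

Each later song adds fewer new words even though all three have the same number of distinct words, which is why a per-song average would penalize artists with longer catalogs.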
[+] iLoch|12 years ago|reply
I agree, perhaps the 35,000 most recent words would be better.
[+] randomdrake|12 years ago|reply
For those who aren't familiar with Aesop Rock, I'd invite you to give him a listen sometime. His earlier albums, in particular, have been very influential to me in many ways. Both in my artistic and professional careers.

From comments on the conditions of the working man and the condition of feeling trapped in a "j-o-b"[1]:

   "Now we the American working population
   Hate the fact that eight hours a day
   Is wasted on chasing the dream of someone that isn't us
   And we may not hate our jobs
   But we hate jobs in general
   That don't have to do with fighting our own causes
   We the American working population
   Hate the nine-to-five day-in day-out
   When we'd rather be supporting ourselves
   By being paid to perfect the pastimes
   That we have harbored based solely on the fact
   That it makes us smile if it sounds dope"
To storytelling masterpieces regarding living and dreaming[2]:

   "Look, I've never had a dream in my life
   Because a dream is what you wanna do, but still haven't pursued
   I knew what I wanted and did it till it was done
   So I've been the dream that I wanted to be since day one!"
Aesop Rock takes language and linguistics to entirely different levels than one might expect from the single genre that is hip-hop. He even challenges himself and the listeners, playing fantastic word games, for instance re-using the letters L, S, and D in odd and rhythmical ways after a mention[3]:

   "Lazy summer days
   Like some decrepit landshark dumb luck squad dog lurks sicker deluded
   Last sturdy domino lean's secluded
   Don't let stupid delusions lesson super-duty labor students
   Dragnet lifer solutions
   Daddy loved sloppy dimensions like son-daughter links
   Such determinated lepers, successfully disheveled
   Little soldiers developed like serpents despite life sentence ducking
   Lemmings
   Some don't like sobriety's dirty lenses
   Some do"
And then there are just incredible gems that stick with you like[4]:

   "I don't flick needles like my sick friend
   I don't march like Beetle Bailey through a quick trend
   I don't frequent church's steeples on my weekend
   And I don't comment if you formulate a weak Zen"
There's a lot to explore from Aesop Rock. Should you find this type of hip-hop interesting, a decent place to start is with the label you can find these songs on, Definitive Jux[5]. Incredible talent has been on and off that label over the years. So much good stuff.

[1] - "9-5ers Anthem" - http://rapgenius.com/Aesop-rock-9-5ers-anthem-lyrics

[2] - "No Regrets" - http://rapgenius.com/Aesop-rock-no-regrets-lyrics

[3] - "The Greatest Pac-Man Victory in History" - http://rapgenius.com/Aesop-rock-the-greatest-pac-man-victory...

[4] - "Save Yourself" - http://rapgenius.com/Aesop-rock-save-yourself-lyrics

[5] - http://en.wikipedia.org/wiki/Definitive_Jux

[+] leorocky|12 years ago|reply
I don't know man, I listened to a couple of the tracks and he definitely has lyrical skills, and I like some of the tracks, but the quotes you selected aren't very good at all, at best obvious topics with all the insight of a million college freshmen. Having said that, I like "None Shall Pass", which has a really great sound.

To be entirely honest, I love rap, but not for any insight rappers have in world affairs, but for their lyrical ability. Some are very good at providing unique ways to describe their own insights about their lives but when someone starts rapping about world problems I just want to shut my brain off because it's usually pretty banal. Then with my brain off I can still at least enjoy the way the rap sounds.

[+] WickyNilliams|12 years ago|reply
Aesop is an excellent lyricist. In fact all the MCs on the Rhymesayers label are very talented: Brother Ali, Slug (of Atmosphere) etc.

One MC whose vocabulary always leaves me taken aback is RA Scion, who has been part of the group Common Market. Their song, "My Pathology" [0] is a shining example:

    "Below the terra ferma's the murmur of many men
    Resonatin' the predication of RA's eponym
    It requires a higher degree of thought to transmit
    Elevate above the base and retrace the semantics
    Incommensurately we've been held incommunicado
    From commoner to commodore – they breed bravado
    I exercise authority over the lesser ranks
    We rally and tally up at the shores of the West Bank"

[0] http://lyrics.wikia.com/Common_Market:My_Pathology
[+] Goopplesoft|12 years ago|reply
Interesting comment about the L, S, and D usage and rhyming. I was particularly surprised by the effort that goes into Eminem's rap that I just attributed to "good flow". Some of that effort is explained in this video: https://www.youtube.com/watch?v=ooOL4T-BAg0
[+] pla3rhat3r|12 years ago|reply
Easily one of my favorite artists. I'm sad they didn't include more Rhymesayers Artists. I think a lot of them would be to the right of this scale. Guys like P.O.S. and Brother Ali are also very versatile.
[+] seltzered_|12 years ago|reply
Found a video rendition of aesop rock's "no regrets" pretty inspiring: https://vimeo.com/14583499

" 1-2-3, that's the speed of the seed

A-B-C, that's the speed of the need

You can dream a little dream or you can live a little dream

I'd rather live it, cause dreamers always chase but never get it"

[+] Ryanmf|12 years ago|reply
OP: Did your analysis of MF DOOM include his work alongside Madlib as Madvillain or his various other pseudonyms (King Geedorah, Viktor Vaughn, etc.)?

I find it a little hard to believe he's not at least in the Wu Tang/Canibus/KK cluster, if not #1 overall.

[+] Tycho|12 years ago|reply
Yeah I would have thought Doom would be very high. But the density of his lyrics perhaps stems more from allusions/references and humour than from the words themselves.
[+] sizzle|12 years ago|reply
I can't take this list seriously until DOOM is at the top, I agree with you guys. Daniel Dumile is on his own level no doubt.
[+] joefkelley|12 years ago|reply
Seriously, DOOM is in his own league. At one point in "All Outta Ale", he rhymes "3-4-methylenedioxymethamphetamine" with "oxyacetaline."

Also, probably my favorite individual rhyme of all time, from "Meat Grinder": "Borderline schizo, sort of fine tits though"

[+] airfoil|12 years ago|reply
Agree. I was surprised not to see DOOM as well. Another MC I think would score pretty highly is Chino XL.
[+] quux|12 years ago|reply
I wonder where Weird Al Yankovic would come in on this ranking.
[+] DigitalSea|12 years ago|reply
Makes me very happy to see Aesop Rock in the #1 spot. He isn't as underground as many people assume: still relatively unknown in the mainstream, but well known enough to sell records and sell out shows. I wasn't a big fan of his 2012 release Skelethon, but the way he structures his lyrics and the meaning behind them means he never writes a bad lyric.

Interestingly, Eminem, whom I would have thought would rank pretty highly for his clever method of word bending and enunciation, is only in the middle of the scale. Still a whole lot better than some of his counterparts, but still surprising. Another interesting thing to note is Eminem being grouped in the same league as the likes of Jay-Z, Rakim and Lupe Fiasco, with only a couple of hundred unique words separating them from one another.

[+] xentronium|12 years ago|reply
I always thought eminem was famous for his clever wordplay, not his vocabulary diversity. FWIW, as a non-native speaker I can gather most of his verses. Aesop Rock, on the other hand, is totally indecipherable for me without printed lyrics.
[+] riggins|12 years ago|reply
I find it hilarious that DMX is dead last.

I've now got empirical evidence of what I always thought.

I think DMX rhymes words with themselves more than any rapper I've ever heard.

[+] poink|12 years ago|reply
I'm pretty sure this fails to take into account DMX's rich canine vocabulary.
[+] ziziyO|12 years ago|reply
I think Rick Ross would give DMX a run for his money. I've heard him rhyme a word with the same word before (Atlantic).
[+] ch4s3|12 years ago|reply
I said out loud before clicking the link that DMX would likely be dead last.
[+] ballstothewalls|12 years ago|reply
This is a great graph, but I think it would be neat if a y-axis was thrown in. My first thought was album sales or some other metric of popularity that helps you find specific rappers quickly instead of going through the huge bunch of little pics.
[+] zopticity|12 years ago|reply
Lil Jon should be at the bottom with 7 words: "Yeah!", "Okay!", "Shots!" and "Turn down for what?"
[+] rthomas6|12 years ago|reply
This infographic doesn't take into account other rappers possibly copying earlier really influential artists, making the earlier influential artists rank lower. More generally, it would be cool to see this chart ranked by the amount of original words present in the first 35,000 lyrics that were not present yet at the albums' time of publication.