Rappers, Sorted by Size of Vocabulary
by Matthew Daniels
Literary elites love to rep Shakespeare’s vocabulary: across his entire corpus, he uses 28,829 unique words, suggesting he knew over 100,000 words and arguably had the largest vocabulary ever. I decided to compare this data point against the most famous artists in hip hop. To keep the comparison fair, I used only the first 35,000 words of each artist’s lyrics. That way, prolific artists, such as Jay-Z, could be compared to newer artists, such as Drake.
My inspiration for the project was pretty simple: I had a full database from rapgenius.com, a lyrics site, containing over 65,000 hip hop songs. I had just learned D3, a data visualization framework, and read the Natural Language Toolkit book (the first chapter describes how to calculate the unique words in a corpus). I used a research methodology called token analysis to determine each artist’s vocabulary. Each distinct word form is counted once, no matter how often it appears, so pimps, pimp, pimping, and pimpin are four unique words. To avoid issues with apostrophes (e.g., pimpin’ vs. pimpin), apostrophes are removed from the dataset.
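
For readers who want to try the counting step themselves, here is a minimal Python sketch of the idea. It is not the code behind this project. The function names, the lowercasing step, and the ASCII-only word pattern are all assumptions made for illustration; only the 35,000-word cutoff, the apostrophe removal, and the "count each distinct form once" rule come from the description above.

```python
import re

# From the article: only the first 35,000 words per artist are compared.
TOKEN_LIMIT = 35_000


def tokenize(lyrics):
    """Split lyrics into word tokens (hypothetical helper).

    Assumptions: text is lowercased, and a word is a run of ASCII
    letters or digits. Apostrophes (straight and curly) are deleted
    first, so "pimpin'" and "pimpin" become the same token.
    """
    cleaned = lyrics.lower().replace("'", "").replace("\u2019", "")
    return re.findall(r"[a-z0-9]+", cleaned)


def unique_word_count(lyrics, limit=TOKEN_LIMIT):
    """Count distinct word forms among the first `limit` tokens."""
    tokens = tokenize(lyrics)[:limit]
    # No stemming: "pimp", "pimps", and "pimping" stay separate words.
    return len(set(tokens))


sample = "Pimps pimp pimping pimpin' pimpin"
print(unique_word_count(sample))
```

On the sample line, the two spellings of pimpin collapse into one token after the apostrophe is removed, while the other three forms remain distinct, which matches the four-unique-words example above.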