This essay documents the design thinking and process behind a data visualisation I created recently for the Tableau Iron Viz feeder competition. The visualisation, entitled “Say What? A Brief History of Profanity in Hip Hop”, explores the use of explicit language in hip hop music since 1985. I encourage you to view the visualisation here first, before returning to this post to understand the story behind the viz.
This viz was built for the second Iron Viz feeder of the year, the theme being music. The feeder theme was announced whilst I was attending the European Tableau Conference in Berlin, back in June. From the moment I learnt of the feeder theme, I knew I wanted to do something centred on hip hop. The question was what exactly. I have done various hip hop-themed vizzes before (mainly focused on Kanye West) but I wanted to try something different this time.
Whilst undertaking research for this viz I came across an article published by Best Tickets in 2014 which explored the use of profanity in rap lyrics from 1985 to 2014. While the visualisations included in this article were so-so, they did provide the full data set used in their analysis for download and a few ideas to get me started. I knew immediately that this would form the basis of my viz.
Before looking at the data in Tableau I jotted some ideas on Post-It notes:
I drew a section of my viz as a quick prototype:
I also documented the following:
Goal: Convey the use of explicit language over time. Attempt to add context and explain the why. Compare sub-genres / genders / locations to determine if there are any trends. Is the use of explicit language increasing or decreasing?
Style: Monochrome colour palette with one additional colour (red) to help make the design pop. Rough and raw style with elements of graffiti and/or notepad scribbles.
Source Data: Best Tickets data, additional lyrics data, Wikipedia for artist information, Spotify integration to play songs.
Structure: Scrollytelling style with interactive elements. Partly explanatory but with options for the reader to explore at their own pace too.
I was fortunate that the Best Tickets data set had the majority of data that I required already collected. To ensure it was trustworthy I manually counted the number of swear words in a few tracks using a lyrics website and cross-referenced these with the swear word counts in the data set. Thankfully they matched. However, the data set wasn’t completely error-free. More on that later…
I wanted to update the data set to include additional albums released in 2015 through to 2018 (I chose to omit 2019 as it would have been an incomplete year). In order to do this I first needed to identify the top-performing or most influential albums released in the 2015 – 2018 period. I did this using a combination of Wikipedia, Billboard and my own knowledge. Using this method I was able to select five albums from each of the four missing years and source the track listing for each album. Later I went back through the Best Tickets data set and, using a similar approach, swapped out a few albums I disagreed with (for example, I added Lil Kim’s 1996 debut album “Hard Core”, which I believed to be a significant release).
Next I used a combination of LyricWiki and Genius to manually add the explicit word counts for the new albums. There was nothing sophisticated about this stage; I literally used Ctrl + F to count how many times each word appeared in each song.
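In hindsight, this manual Ctrl + F step could easily have been scripted. A minimal sketch of the same idea in Python (the word list and lyric snippet below are purely illustrative, not from my data set):

```python
import re

def count_words(lyrics, words):
    """Count case-insensitive, whole-word occurrences of each word in the lyrics."""
    counts = {}
    for word in words:
        # \b word boundaries stop e.g. "hit" matching inside "hitting"
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        counts[word] = len(pattern.findall(lyrics))
    return counts

lyrics = "Damn, damn it all. Damnation."
print(count_words(lyrics, ["damn"]))  # {'damn': 2}
```

Note that the word boundaries mean “Damnation” is not counted, which matches how I was counting by hand.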
To supplement the word count data I sourced some additional information about the artists. For example, their hometown, their usual genre (gangsta rap, conscious rap, etc), their gender and whether they were a solo artist or part of a group. I felt this would help add an additional layer to my analysis and may uncover a trend which Best Tickets had missed. All of this information was sourced from Wikipedia where I didn’t already know it.
Finally, I wanted to integrate Spotify into my dashboard to give readers the ability to listen to selected tracks. This was inspired by an older viz I remember seeing by Pooja Gandhi here. I initially downloaded Pooja’s viz to understand what she had done. At this point I realised I would require the Spotify URI (essentially the URL) for every song in my data set. Now, with more than 2,500 tracks, I wasn’t going to source these one by one!
So… what were my options?
It was at this point I recalled seeing this tweet from Skyler Johnson:
Skyler works at Spotify in New York and had kindly offered to help the community with the Spotify API. I wasn’t sure if the API would help me source the URIs so I reached out to Skyler for some advice. Skyler responded and suggested I try using Python and directed me to his GitHub…
This scared me a little. I had never used Python before and didn’t know where to begin.
I decided to approach a colleague for advice. I knew my colleague was skilled in Python and thankfully they were able to get Python installed on my laptop and talk me through exactly what I would need to do to run Skyler’s code. It was actually a lot easier than I had imagined. Firstly, I had to create a Spotify Developer Account which was straightforward enough. I then updated Skyler’s code with my unique Spotify keys. Next I created input and output csv files for my track data; the input containing the artist and track names and the output being a blank file that would be populated with the URI links.
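For anyone curious what a script like this involves, here is a minimal sketch of the URI lookup using only the Python standard library. This is not Skyler’s actual code; the file layout, function names and “not found” placeholder are my assumptions, though the search endpoint and bearer-token header do match the Spotify Web API:

```python
import csv
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.spotify.com/v1/search"

def first_track_uri(response):
    """Pull the URI from a Spotify search response, or 'not found' on no match."""
    items = response.get("tracks", {}).get("items", [])
    return items[0]["uri"] if items else "not found"

def search_track(token, artist, track):
    """Query the Spotify Web API for one track (token comes from your developer keys)."""
    query = urllib.parse.urlencode(
        {"q": f"artist:{artist} track:{track}", "type": "track", "limit": 1}
    )
    req = urllib.request.Request(
        f"{SEARCH_URL}?{query}", headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def fill_uris(token, in_path, out_path):
    """Read artist/track pairs from the input csv, write URIs to the output csv."""
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        writer = csv.writer(fout)
        for artist, track in csv.reader(fin):
            uri = first_track_uri(search_track(token, artist, track))
            writer.writerow([artist, track, uri])
```

Calling `fill_uris(token, "tracks_in.csv", "tracks_out.csv")` with a valid access token walks the input file row by row, exactly as my input/output csv setup did.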
I ran the code and hoped for the best.
To my surprise it worked!
My output file was populated with URIs where Spotify could identify the tracks using the information I had provided. However, where the track could not be matched, it returned “not found” instead. My next task was to go back through the “not found” list and attempt to fix any errors where the song name or punctuation format was causing Spotify to miss the track.
I quickly realised that some of the albums I was looking for were not on Spotify. Many of the Jay-Z albums in my data set were missing (most likely due to Jay-Z’s preference for his own music streaming service, Tidal). I also found a multitude of spelling or punctuation mistakes that were causing Spotify to miss the tracks (remember how I thought my data set was clean?!). Using Spotify as my source of truth I was able to go back and update my data where necessary. When I was ready I ran the Python code again to obtain as many URIs as possible.
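Much of that clean-up amounted to normalising punctuation before re-running the search. A small helper along these lines (the specific substitutions are illustrative, not the exact fixes I applied) catches the common cases:

```python
import re
import unicodedata

def normalise_title(title):
    """Strip accents, curly quotes and bracketed tags that confuse track matching."""
    # Decompose accented characters, then drop the combining marks (é -> e)
    title = unicodedata.normalize("NFKD", title)
    title = "".join(c for c in title if not unicodedata.combining(c))
    title = title.replace("\u2019", "'")            # curly apostrophe -> straight
    title = re.sub(r"[\[\(].*?[\]\)]", "", title)   # drop tags like (Album Version)
    title = re.sub(r"\s+", " ", title)              # collapse repeated whitespace
    return title.strip()

print(normalise_title("Hypnotize  (Album Version)"))  # Hypnotize
```

Running the “not found” list through something like this before a second API pass would have saved a fair amount of manual correction.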
For this project I did A LOT of reading on the history of hip hop and the evolution of the genre over time. I admit I was captivated by the story of DJ Kool Herc and the birth of hip hop in the 1970s. Likewise, the stories of Run DMC and N.W.A. had me hooked. While these stories were important and go to some lengths to explain how explicit language found its way into the genre, they weren’t a core part of my analysis. For this reason I decided to include a timeline which utilised collapsible containers in Tableau to include key historical information without taking up too much space.
Each decade included a pop-up container containing key information about the era and the sound of hip hop at that time:
In hindsight I wish I had designed the viz differently so the collapsible containers didn’t obscure any other parts of the viz.
The remainder of the viz is constructed in a scrollytelling-style with supporting text to explain the charts and tell the story. The majority of the charts are titled with questions which are answered either in the chart itself or in the accompanying text.
I kept to basic, easy-to-interpret charts: bars, lines, scatter plots and a simple map. I wanted my viz to have a minimal style without any unnecessary clutter that would be simple enough for a non-data person to understand.
At the bottom of my viz I included an exploratory section where readers can explore the data set themselves, learn more about the artists and their choice of language and also explore the use of words over time. I was keen for this section to have a different feel to the rest of the viz. This was partly inspired by one of my favourite vizzes by Chantilly Jaggernauth. In this example, Chantilly uses a contrasting background colour for the lower section of the viz. This creates a striking effect, naturally drawing the readers’ attention to the lower section.
In this section of the viz I used parameter actions to create an image swap, display key artist information and also to adjust the size and colour of the bubbles highlighted in the dot plot. When a bubble is selected in the chart, the image, text and Spotify preview to the left update accordingly while the chart updates to highlight other appearances of the selected word over time. I also enabled ‘Allow Selection By Category’ in the tooltips so readers can easily find other songs by the same artist or other artists who are aligned to the same genre.
I created each of the images manually in PowerPoint using a blank Polaroid-style frame I found online. I sourced pictures of each of the artists and adjusted the colours to black and white (where they weren’t so already). Where possible I tried to find more natural images of the artists as opposed to those clearly taken in a studio or mid-performance.
I added the photos to the Polaroid frames and used a handwriting-style font to add their names to the bottom of the images. I wanted these to feel authentic so I wasn’t too careful with the placement of the text and I rotated the images differently every time so collectively they would appear more random:
Whilst working on my viz I thought it would be interesting to gather some additional data to compare swear word usage with the Billboard success of each album. I was curious to know if the more explicit albums performed better in the charts than cleaner alternatives.
To gather this data I used Wikipedia and manually noted the highest US Billboard position for each album. Whilst doing this I also wrote a few paragraphs about each album for context but I didn’t end up using this information in my viz. I felt that the Billboard performance data was more relevant for older albums (when fans still bought physical CDs or cassettes rather than purchasing downloads or using streaming services). For this reason (and to save time) I only gathered Billboard data up until 1998. It’s worth noting that during the late 1980s and 1990s, the more controversial acts such as N.W.A. were banned from being played on some radio stations, so the fact that their music performed well in the charts emphasises just how popular they were.
Sure enough, the explicit language certainly didn’t hinder album sales. In fact, many of the more explicit albums reached at least the Top 10 of the Billboard chart:
Aside from those already mentioned, my design inspirations for this viz were influenced by two hip hop-related articles by The Pudding, namely “The Language of Hip Hop” and “The Largest Vocabulary in Hip Hop”. I urge you to check these out if you haven’t already.
The inspiration for the header design (designed from scratch in PowerPoint) comes from a Kanye West infographic I found on Pinterest:
Finally, my inspiration for the format of this post comes from RJ Andrews and his “Neil and Buzz Design Essay”. This is a fantastic piece of data storytelling and I loved how he structured his accompanying blog post.
Before I conclude I would like to thank Skyler Johnson for his help with the Spotify API, my colleagues at Slalom for help with Python and for helping me to develop my viz idea and also to Kevin Flerlage for helping to inspire my work and for his constant support.
There were a total of 101 entries into this Iron Viz feeder; an Iron Viz record! I encourage you to browse through the other entries here. Each one is unique and inspiring in its own way.
Here is a static image of my completed viz. Click on the image to explore the interactive version on Tableau Public:
I hope you enjoyed this post.
Thanks for reading.