Some weeks ago I was listening to Depeche Mode’s Enjoy the Silence when a thought struck me: “In the good old 80’s there were far fewer one-hit wonders. Back then, artists must have survived longer on the hit lists than they do today”. But I didn’t know for sure.
At the same time, I was taking a course at Aarhus University called Advanced Market Research. In this course I became familiar with Kaplan-Meier non-parametric survival analysis. This is a very simple but powerful model that estimates a survival function based on how long each subject is present in a time series.
This led to the following research question:
Is there a difference between artist-survival on the worldwide top100 hitlist in the 80’s and in the 00’s?
Tools at hand.
My #1 go-to tool, R, has a package called survival that lets you do this kind of analysis yourself. You can plot the survival function and get a summary showing the probability of being alive at time t. If you’re interested, have a look at Ani Katchova’s tutorial for doing this in R.
Another tool I’m very happy with is the free import.io crawler/extractor. It can extract data from almost any public website. I wrote a loop that generated a list of URLs and then bulk-extracted the data from the Billboard Hot 100 hit list for 1980-2016.
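To give an idea of what such a URL-generating loop could look like, here is a minimal sketch in Python (not the loop actually used for this post). It assumes one chart per week, dated on Saturdays, and it uses a made-up URL template on example.com, since the real URL pattern is not shown here. The function name weekly_chart_urls and the start date are also assumptions for illustration.

```python
from datetime import date, timedelta

# Hypothetical URL template: the real chart URL pattern is not given in the post.
URL_TEMPLATE = "https://example.com/charts/hot-100/{chart_date}"


def weekly_chart_urls(start, end, template=URL_TEMPLATE):
    """Return one URL per week from start to end (inclusive), 7 days apart."""
    urls = []
    current = start
    while current <= end:
        urls.append(template.format(chart_date=current.isoformat()))
        current += timedelta(days=7)
    return urls


# Assumed range: first Saturday of 1980 up to the last observation date (28/05/2016).
urls = weekly_chart_urls(date(1980, 1, 5), date(2016, 5, 28))
print(len(urls), urls[0], urls[-1])
```

A list like this can then be handed to a bulk extractor in one go.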
As this analysis was carried out in June 2016, I can only observe the first six and a half years of each decade if both are to have an observation period of 10 years. To be more specific, each year is treated as a group of artists first observed in that particular year, and these artists are then observed for the next 10 years.
The artists that appear on the Hot 100 hit list between 1980 and 1981 are followed for the following 10 years (1981-1991). To ensure that each group only contains artists with their first appearance in that particular period, a calibration period of the 5 preceding years is used for both the 80’s sample and the 00’s sample.
Each artist is assumed to “die” the last time they are observed on the hit list in the 10-year period after the year they entered the list. I fully understand that this could bias the result, as some artists might have a comeback 15-20 years after their first appearance on the Hot 100. However, I decided to proceed and make do with the data that was available.
To compare the results, I first look at each of the yearly groups within its own decade, then aggregate the respective decades into two groups (80’s & 00’s) and compare them using a plot as well as the exact values from the model summary.
- Each year is treated as a sample of artists that are first observed in this year
- Only artists from the first 6.5 years of each decade are sampled
- Artists die when they are last observed on the hit list
- Potential bias from artists with a comeback later than 10 years after
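The assumptions above can be sketched in a few lines of Python. This is an illustration only, not the R code behind this post. The function name cohort_lifetimes, the toy artist names, and the convention that an artist seen in a single week gets a lifetime of 1 week are all assumptions. For simplicity the sketch works on calendar years: the calibration window is the 5 calendar years before the cohort year, and the follow-up runs through the end of the 10th year after it.

```python
from datetime import date


def cohort_lifetimes(entries, cohort_year, calibration_years=5, follow_years=10):
    """Lifetimes in weeks for artists first observed in cohort_year.

    entries: iterable of (chart_date, artist) rows, one per chart position.
    Artists seen during the calibration window are filtered out.
    An artist "dies" at the last appearance inside the follow-up window.
    """
    seen_before = set()  # artists charting in the calibration window
    first_seen = {}      # first appearance inside the cohort year
    last_seen = {}       # last appearance inside the follow-up window
    for chart_date, artist in sorted(entries):
        year = chart_date.year
        if cohort_year - calibration_years <= year < cohort_year:
            seen_before.add(artist)
        elif cohort_year <= year <= cohort_year + follow_years:
            if year == cohort_year and artist not in first_seen:
                first_seen[artist] = chart_date
            if artist in first_seen:
                last_seen[artist] = chart_date
    lifetimes = {}
    for artist, start in first_seen.items():
        if artist in seen_before:
            continue  # not a new artist: already on the list before the cohort year
        # Assumed convention: a single appearance counts as a lifetime of 1 week.
        lifetimes[artist] = (last_seen[artist] - start).days // 7 + 1
    return lifetimes


# Toy data with made-up artist names.
entries = [
    (date(1979, 6, 2), "Old Act"),           # charted during calibration
    (date(1980, 1, 5), "Old Act"),
    (date(1980, 1, 5), "New Act"),
    (date(1980, 1, 12), "New Act"),
    (date(1980, 3, 1), "One Week Wonder"),
    (date(1985, 1, 5), "New Act"),           # comeback inside the 10-year window
    (date(1995, 1, 7), "New Act"),           # comeback after the window: ignored
]
lifetimes = cohort_lifetimes(entries, 1980)
print(lifetimes)
```

Note how the 1995 comeback is invisible to the model, which is exactly the potential bias mentioned in the last bullet.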
Data and modelling
As described earlier, only the first 6.5 years of each decade are sampled, in order to make a 1:1 comparison between the groups.
The 80’s sample has 856 weeks from the Hot 100 hit list, with different calibration periods (1980-81, 1981-82, …) each followed by an observation period of +10 years from the end of the calibration period, ultimately ending on 25/05/1996.
In comparison, the 00’s sample has a total of 857 weeks, with the last observation on 28/05/2016. The 80’s group has a total of 85,455 entries and 2,116 unique artists, whereas the 00’s group has 85,700 entries and 2,670 unique artists.
Due to the nature of the research question (comparing two groups, with no independent variables), the Kaplan-Meier model seemed sufficient. However, it is worth mentioning that the Cox regression model is also available in the R survival package.
- Every year has a 5-year calibration period, filtering out artists from the prior 5 years
- Every year is observed for 10 years (1980-1990, 1981-1991, .., 2000-2010, 2001-2011)
- Sample periods end: 25/05/1996 and 28/05/2016.
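For readers who want to see what the Kaplan-Meier model does under the hood, here is a small hand-rolled sketch in Python. It is not the R survival package and not the code used for this post. It only shows the product-limit idea: at each time where artists leave, the survival estimate is multiplied by (1 - n.event / n.risk). The function names kaplan_meier and survival_at and the toy lifetimes are assumptions for illustration. The optional observed flags allow for censored subjects, although in this post every artist is assumed to die inside the observation window.

```python
from collections import Counter


def kaplan_meier(durations, observed=None):
    """Return rows of (time, n_risk, n_event, survival) per distinct event time."""
    if observed is None:
        observed = [True] * len(durations)  # no censoring: every subject "dies"
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # deaths and censored subjects both leave the risk set
    n_risk = len(durations)
    survival = 1.0
    table = []
    for t in sorted(exits):
        n_event = events.get(t, 0)
        if n_event > 0:
            survival *= 1.0 - n_event / n_risk
            table.append((t, n_risk, n_event, survival))
        n_risk -= exits[t]
    return table


def survival_at(table, t):
    """Step-function lookup: the latest survival estimate at or before time t."""
    current = 1.0
    for time, _, _, surv in table:
        if time > t:
            break
        current = surv
    return current


# Toy lifetimes in weeks (made-up numbers, not the Hot 100 data).
weeks = [1, 1, 2, 4, 4, 4, 10, 52, 60, 120]
table = kaplan_meier(weeks)
for time, n_risk, n_event, surv in table:
    print(time, n_risk, n_event, round(surv, 3))
print("share still alive after week 52:", round(survival_at(table, 52), 3))
```

Without censoring the estimate is simply the share of artists whose lifetime is longer than t, which makes the toy output easy to check by hand.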
Survival function: 1980’s
As seen above, the curves show a very steep decrease at first, after which they seem to stabilize over time. Looking at the table below, we see that around 50% of the artists have left the hit list after only 16 weeks.
The table below also shows that the in-sample groups are fairly evenly distributed. In other words, the number of new artists entering the hit list per year varies between 71 and 96.
Survival function: 2000’s
As seen above, the drop is a little more dramatic in the 2000-2006 sample. At one point, 15% of the artists leave the hit list for good at the same time.
Looking at the table below, we can examine the decrease further. It happens after the 19th week, when a total of 119 artists leave the hit list (n.event), causing the 15% drop.
To draw a parallel to the 80’s sample, about half (49%) of the artists have left the hit list after 18 weeks.
From the table above, it seems that the number of new artists increases over the years, going from 97 to 139 a year.
The comparison: 1980’s vs. 2000’s
The final comparison rests on the same assumptions as the two preceding models, as well as the same sampling periods stated earlier. In this final model, the 1980’s sample has a total of only 590 artists, against 805 in the 2000’s sample.
Finally, it seems that my initial assumption was correct for these particular samples: the artists in the 80’s had a higher survival rate than the ones in the 00’s. But remember that this analysis only includes artists from the first 6.5 of the 10 years in each group, and that this final model is an aggregated one.
Looking at their first year (52 weeks):
- 83.7% of the artists in the 00’s sample had disappeared
- 67.5% of the artists in the 80’s sample had disappeared
In other words, only 16.3% of the artists survived more than a year in the 00’s, whereas 32.5% of the artists in the 80’s survived more than a year.
The highest lifetimes were:
- Thompson Twins with 359 weeks in the 80’s sample.
- Nelly Furtado with 343 weeks in the 00’s sample.
A friend told me: After the invention of the internet, the competition in the music industry has intensified. This is partially due to an increased supply, since the internet has made it easier and cheaper to release and promote music.
Well, that was it! Due to the length (~500 lines) of the R source code, it would not be ideal to include it in this blog post, so I will put it in an upcoming code section on my website.
Hope you enjoyed reading.