Ben’s Blog

Is Jazz Dead?


Background

Recently, I attended a concert by Marcos Valle put on by the record label Jazz Is Dead. It reminded me to make good on my promise to upgrade my jazz data projects by extracting new data from the Wikipedia pages behind them. The main obstacle when I first extracted the data was a problem of scale. And I don’t mean the musical kind. With over 50,000 pages to parse, searching each page for specific information, such as genre of music or the years an artist was active, took about 30 minutes for every data feature I wanted to extract and test. Moreover, my dataset is about 10GB, so running out of RAM while processing and cleaning the data was a real risk.

Scaling Blues

In programming, we have common strategies to address problems of scale. In my case, the solution was an open-source library called Dask. If you know the Python data library Pandas, then Dask will be quick to pick up. Data can be manipulated in Dask with DataFrames like in Pandas, although there are a couple of differences to watch out for. First, Dask does not load data into RAM unless it has to. This is called “lazy loading.” Since I only needed a small slice of the data at a time, Dask only loaded that slice. Second, Dask has built-in parallel processing abilities. Because my CPU has 32 threads, instead of processing 1 row of data at a time, Dask could work on 32. What took 30 minutes before now takes about 1.
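To make those two ideas concrete, here is a minimal sketch using only the Python standard library. It is a stand-in for what Dask does, not Dask's API and not the code behind this project. The page strings, `iter_pages`, `extract_genre`, and `extract_in_chunks` are all hypothetical names made up for illustration. Pages are pulled from a generator one chunk at a time, so only the current chunk is in memory, and each chunk is handed to a pool of workers.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
import re


def iter_pages():
    """Hypothetical page source: yields one page at a time instead of a full list."""
    yield "Artist A | genre: bebop | years active: 1945-1960"
    yield "Artist B | genre: bossa nova | years active: 1963-present"
    yield "Artist C | years active: 1930-1941"
    yield "Artist D | genre: swing"
    yield "Artist E | genre: bebop"


def extract_genre(page):
    """Pull one feature (the genre) out of a single page, or None if it is missing."""
    match = re.search(r"genre: ([^|]+)", page)
    return match.group(1).strip() if match else None


def extract_in_chunks(pages, chunk_size=2, workers=2):
    """Process pages chunk by chunk, spreading each chunk across a worker pool."""
    results = []
    pages = iter(pages)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            # Only this chunk is materialized; the rest stays unread.
            chunk = list(islice(pages, chunk_size))
            if not chunk:
                break
            # pool.map returns results in the same order as the input chunk.
            results.extend(pool.map(extract_genre, chunk))
    return results


genres = extract_in_chunks(iter_pages())
print(genres)
```

One caveat about this sketch: threads keep it short, but for CPU-heavy parsing in plain Python they will not give a real speedup. That takes separate processes, or a library like Dask that manages the workers for you.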

Speeding up a program by a factor of 30 may sound like magic. Unfortunately, it is not. Not all tasks can be parallelized. Think of programming threads like a highway. Adding more lanes can ease traffic. But no matter how clear the roads are, you still have to drive from where you start to where you’re going. There’s no magic shortcut.
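The highway idea has a standard formula behind it, known as Amdahl's law. It is not named in the original post, and the fractions below are illustrative, not measurements of this project. If only part of a job can run in parallel, the serial part sets a ceiling on the speedup no matter how many threads you add. A small sketch:

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of a job can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Illustrative fractions only; none of these were measured.
for fraction in (0.5, 0.9, 0.99, 1.0):
    speedup = amdahl_speedup(fraction, 32)
    print(f"{fraction:.0%} parallel on 32 threads: {speedup:.1f}x")
```

Under this model, even a job that is 99% parallel tops out at roughly 24x on 32 threads. A speedup near 30x is only possible when almost all of the work is independent, as it is when each page is parsed on its own.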

Cleaning and Visualizing the Data

So, with the newfound ability to extract data from my jazz database, I wanted to know – is jazz dead? To figure it out, I decided to look at the number of active jazz artists in each year over time. Each jazz artist’s Wikipedia page shows the years they were active. (Check it out!) So, all I had to do was tally every year from around 1900 to 2024 across around 2,500 artist pages! That’s over 340,000 data points, but since it can be done in parallel, it only took a few minutes! And check out the result:

[Figure: jazz-is-dying, a line graph of the number of active jazz artists per year]
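The tally described above can be sketched in a few lines. This is a minimal illustration, not the code that produced the graph. The artist names, year ranges, and the `count_active_artists` helper are hypothetical, and it assumes each artist's active years have already been cleaned into a single (first year, last year) pair.

```python
from collections import Counter

# Hypothetical cleaned data: one (first_year, last_year) range per artist.
ACTIVE_YEARS = {
    "Artist A": (1945, 1960),
    "Artist B": (1958, 2024),
    "Artist C": (1999, 2010),
}


def count_active_artists(active_years, first=1900, last=2024):
    """Return (year, number of active artists) pairs for every year in the window."""
    counts = Counter()
    for start, end in active_years.values():
        # Clip each range to the window, then add one for every year in it.
        for year in range(max(start, first), min(end, last) + 1):
            counts[year] += 1
    # Years with no active artists come back as 0 rather than being skipped.
    return [(year, counts[year]) for year in range(first, last + 1)]


series = count_active_artists(ACTIVE_YEARS)
by_year = dict(series)
print(len(series), by_year[1959], by_year[2005])
```

Because each artist's range is counted independently of every other artist's, this is exactly the kind of work that splits cleanly across threads.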

Analysis & Future Direction

The graph is a simple single-line graph, and much of the excitement of this article is about the parallel processing it took to get there. Keep in mind that these counts include only artists whose Wikipedia pages I found while crawling the web. There could be 10x more jazz artists out there. But assuming an artist’s popularity correlates with the prominence of their Wikipedia article, the trend is likely accurate. However, a confounding problem is that contemporary musicians may be active but not yet influential enough to have articles written about them. That lag may bias the data against contemporary artists. In the future, we can investigate this by comparing when an article was written with the artist’s active years, for artists who became active after Wikipedia began.*
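That lag check could look something like the sketch below. The records are made up for illustration and are not taken from the dataset, and `article_lags` is a hypothetical helper. It assumes we know the year each artist became active and the year their article was created, and it keeps only artists who became active in 2001 or later, the year Wikipedia launched.

```python
from statistics import mean, median

WIKIPEDIA_LAUNCH = 2001

# Hypothetical records: (first active year, year the article was created).
ARTISTS = [
    (2002, 2005),
    (2005, 2009),
    (2010, 2013),
    (2012, 2022),
    (1975, 2004),  # active before Wikipedia existed, so it is filtered out
]


def article_lags(artists, since=WIKIPEDIA_LAUNCH):
    """Years between an artist becoming active and their article appearing."""
    return [
        created - first_active
        for first_active, created in artists
        if first_active >= since
    ]


lags = article_lags(ARTISTS)
print(lags, mean(lags), median(lags))
```

When the mean comes out above the median, as it does in this toy data, it suggests a few long delays are pulling the average up, so it is worth reporting both.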

So, saving that analysis for another day, we can see jazz peaked in the early 2000s, and we can observe it is on the downswing according to this data. But is it dead? I would say: Listen and see for yourself. Most jazz happens on the downbeat.

*Update and spoiler alert: For new artists since 2001, the mean delay before an article is released is 5 years, and the median is 4 years. Since the downward trend has been going on for more than 5 years, it is reasonable to discount this effect.
