For the previous version of this visualization, see my old blog post here. I wanted to improve both the quality of the code and the visualization itself; see this post for more details. In short, this is a visualization of Wikipedia articles on jazz albums, clustered using TF-IDF features and k-means. Colors represent k-means clusters in the 3000-dimensional TF-IDF feature space, and the size of each data point represents how often that album was referenced across more than 100,000 articles on jazz artists, songs, genres or other albums.
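To make the clustering step concrete, here is a minimal, standard-library-only sketch of the general technique: build TF-IDF vectors, then group them with k-means. It is an illustration under stated assumptions, not the project's actual code. The function names (`tfidf_vectors`, `sq_dist`, `kmeans`), the whitespace tokenizer, the smoothed IDF formula, the farthest-first seeding, and the toy documents are all hypothetical choices made for this example; only the 3000-feature cap mirrors the description above.

```python
import math
from collections import Counter


def tfidf_vectors(docs, max_features=3000):
    """Turn raw texts into L2-normalized TF-IDF vectors (illustrative sketch)."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Keep the terms found in the most documents (ties broken alphabetically).
    vocab = sorted(sorted(df), key=lambda t: -df[t])[:max_features]
    index = {term: i for i, term in enumerate(vocab)}
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(t for t in tokens if t in index)
        vec = [0.0] * len(vocab)
        for term, count in counts.items():
            # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            vec[index[term]] = count * idf
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors, vocab


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-first seeding."""
    # Seeding: start from the first vector, then repeatedly add the vector
    # farthest from all centroids chosen so far (a stand-in for random seeding).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy stand-ins for article text; the real input would be full Wikipedia articles.
docs = [
    "bebop saxophone quartet",
    "bebop saxophone quintet",
    "fusion electric guitar",
    "fusion electric bass",
]
vectors, vocab = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

In a plot like the one described, each label would map to a color. A library such as scikit-learn offers tuned implementations of both steps and would be the more practical choice at the scale of thousands of articles.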
The Python source for this project is available here.