Tracking E-Book Downloads from Project Gutenberg

Background on my work tracking e-book downloads from Project Gutenberg. Update: The code is still available, but I have turned off the cron job that updated the data live.

In an era of e-readers and digital subscriptions, Project Gutenberg is an online library that publishes great e-books for free. Its selection emphasizes classic literature from around the world whose copyrights have expired. That makes downloading and owning these e-books completely free and legal! (That said, you can choose to support Project Gutenberg here. For disclosure, I have no affiliation or contact with them.)

On their site, they have a page that reports how many downloads their most popular books and authors receive each day. The page includes a link that promises some visualizations of the data, but there are no visualizations at the end of the link. So, I want to offer some of my own.
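To show what "reporting data" looks like from a scraper's point of view, here is a minimal parsing sketch using only Python's standard library. It assumes that each list entry on the top-downloads page is an `<a>` inside an `<li>` whose text ends in a parenthesized download count (for example "Frankenstein; Or, The Modern Prometheus by Mary Wollstonecraft Shelley (1234)"), and that each list is preceded by an `<h2>` heading. Those markup details, along with the names `TopListParser` and `parse_top_list`, are assumptions for illustration, not the code from my repository.

```python
import re
from html.parser import HTMLParser

# Assumed entry format: "<name> (<count>)", with the count as the trailing integer.
ENTRY = re.compile(r"^(?P<name>.+?)\s*\((?P<count>\d+)\)$")


class TopListParser(HTMLParser):
    """Collect (section, name, downloads) from <li><a> items under <h2> headings."""

    def __init__(self):
        super().__init__()
        self.section = ""      # text of the most recent <h2>
        self.in_h2 = False
        self.in_li = False
        self.in_a = False
        self.heading = []
        self.buffer = []
        self.entries = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.heading = []
        elif tag == "li":
            self.in_li = True
        elif tag == "a" and self.in_li:
            self.in_a = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "h2" and self.in_h2:
            self.in_h2 = False
            # Collapse whitespace so the heading works as a stable section label.
            self.section = " ".join("".join(self.heading).split())
        elif tag == "a" and self.in_a:
            self.in_a = False
            text = " ".join("".join(self.buffer).split())
            match = ENTRY.match(text)
            if match:
                self.entries.append((self.section, match["name"], int(match["count"])))
        elif tag == "li":
            self.in_li = False

    def handle_data(self, data):
        # Text can arrive in several chunks, so accumulate it until the tag closes.
        if self.in_h2:
            self.heading.append(data)
        if self.in_a:
            self.buffer.append(data)


def parse_top_list(html):
    """Return a list of (section, name, count) tuples in page order."""
    parser = TopListParser()
    parser.feed(html)
    parser.close()
    return parser.entries
```

Keeping the section heading with each entry lets book rankings and author rankings from the same page be stored side by side without being mixed up.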

In 200 lines of Python, available on my GitHub (plus some local configuration for cron and Chart Studio), I created a pipeline that downloads this page's data to a database at least once a day. The code then automatically generates visualizations, uploads them to Chart Studio, and embeds them below!
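The storage step of a pipeline like this can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under assumptions: SQLite as the database, a hypothetical `downloads` table keyed on day, section, and name, and hypothetical helpers `init_db` and `store_snapshot` that take (section, name, count) tuples. My actual repository may store things differently.

```python
import sqlite3
from datetime import date


def init_db(conn):
    # Hypothetical schema: one row per day, per list section, per book or author.
    with conn:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS downloads (
                   day TEXT NOT NULL,
                   section TEXT NOT NULL,
                   name TEXT NOT NULL,
                   count INTEGER NOT NULL,
                   PRIMARY KEY (day, section, name)
               )"""
        )


def store_snapshot(conn, entries, day=None):
    """Save (section, name, count) tuples under an ISO date string.

    INSERT OR REPLACE means a second run on the same day overwrites that
    day's rows instead of duplicating them.
    """
    day = (day or date.today()).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO downloads (day, section, name, count) "
            "VALUES (?, ?, ?, ?)",
            [(day, section, name, count) for section, name, count in entries],
        )
```

Making the write idempotent per day fits a job that runs "at least" once a day: a cron entry along the lines of `0 6 * * * python3 /path/to/tracker.py` (a hypothetical schedule and path) can fire more often without corrupting the history.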

I only started caching the data three days before posting, but I configured the charts to show a two-week span. So over the next two weeks, these charts will become less lifeless. They will be reanimated with new data! (At the time of posting, Frankenstein is the most popular book, hence the pun.)
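A two-week window like this one comes down to a date-range query before plotting. The sketch below assumes a hypothetical SQLite table `downloads(day, section, name, count)` with ISO `YYYY-MM-DD` date strings, which compare correctly as plain text. The function name `window_series` and the default section label are also assumptions. It groups rows into one series per title, which is roughly the shape a line chart with one trace per book needs.

```python
import sqlite3
from collections import defaultdict
from datetime import date, timedelta


def window_series(conn, end, days=14, section="Top 100 EBooks yesterday"):
    """Return {name: [(day, count), ...]} for the `days` days ending at `end`, inclusive."""
    start = end - timedelta(days=days - 1)
    rows = conn.execute(
        "SELECT day, name, count FROM downloads"
        " WHERE section = ? AND day BETWEEN ? AND ?"
        " ORDER BY day",
        (section, start.isoformat(), end.isoformat()),
    )
    series = defaultdict(list)
    for day, name, count in rows:
        series[name].append((day, count))
    return dict(series)
```

With only three days cached, each series has at most three points inside the 14-day window, which is why the charts look sparse at first and fill in as the daily job keeps running.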

Also, I recommend toggling the outliers on and off by clicking them in the plot legends! It reveals some more interesting trends!
