Ben’s Blog

The Best Git Repository of All Time

In this post, I’m going to proselytize for open source software and for what I’m calling The Best Git Repository of All Time. I’ll also preview how it’s going to help me deploy a full-stack web application.

The Problem

The reason I’ve been so focused on web crawling in recent posts is that I have a problem. There are a ton of websites I have to check regularly for updates: independent musicians posting tour dates, event calendars from concert venues, local free newspapers. It’s a pain to manually open all the tabs and cross-reference the event updates with my personal calendar. It’s such a pain that sometimes I just don’t do it, and I end up missing out on some good concerts or book festivals because of it.

For example, I want to see The Claudettes whenever they come to town. So I have to look up their website. But I can’t remember if the band I want is theclaudettes.com or claudettes.com, so I might have to figure it out by trial and error. I got it wrong once, and now my browser takes me to a home decorator site by default instead of the band I want. Once I get to the right website, I have to click “shows.” My browser doesn’t take me there automatically because sometimes I click on “news” or “music.” (I’m waiting for the band to drop a new album, so I have to check on that, too.)

I could solve part of this problem by saving https://theclaudettes.com/shows to a browser bookmark folder with my other sites. But then I’d still have to sift through every tab I open for new information!

The Solution

So I thought, why not solve this last hurdle with a web application? Here’s the basic goal for the app. You upload the URLs of the public pages you want to bookmark. Then the application notifies you by email or on an online dashboard when those pages change. That way, you only have to check one page to see if you have any updates. And since it’s a web application, it could open your bookmarks for you, or only the ones that have new content, if you want.
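Here’s a minimal sketch of the “notify me when a page changes” idea, assuming a simple approach where the app stores a hash of each page and compares it on the next check. The names `fingerprint`, `find_updated`, and `last_seen` are hypothetical, made up for this illustration, and the `fetch` argument is whatever function downloads a page as text.

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(page_text: str) -> str:
    """Return a hash of the page text, ignoring differences in whitespace."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_updated(
    urls: List[str],
    last_seen: Dict[str, str],
    fetch: Callable[[str], str],
) -> List[str]:
    """Return the URLs whose content changed since the previous check.

    `last_seen` maps each URL to the fingerprint from the previous run and
    is updated in place. A URL seen for the first time is recorded but not
    reported as updated (an assumption of this sketch).
    """
    updated = []
    for url in urls:
        current = fingerprint(fetch(url))
        previous = last_seen.get(url)
        if previous is not None and previous != current:
            updated.append(url)
        last_seen[url] = current
    return updated
```

In a real run, `fetch` could wrap something like `urllib.request.urlopen`. Passing it in as an argument keeps the comparison logic testable without touching the network. An exact hash will also flag trivial changes such as a rotating ad or a timestamp, so it is only a starting point.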

You could use this technology to stay current on your favorite indie bands’ schedules or keep quick tabs on your favorite small blogs.

The Opportunity

To do this for myself would be pretty simple. I could keep a list of URLs hardcoded in a .csv file. Microsoft Excel even lets you run Python in a workbook now, so I could try implementing the page-fetching and comparison features right in a spreadsheet. But the point is supposed to be convenience: making your workflow less painful and minimizing the number of apps you have to open, links you have to follow, and buttons you have to click to get all your updates.
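As a rough sketch of the do-it-yourself version, here is how reading the URL list from a .csv file might look with Python’s standard `csv` module. The column name `url`, the function name `load_urls`, and the example.com addresses are assumptions for illustration, not anything from my actual files.

```python
import csv
import io
from typing import List, TextIO


def load_urls(csv_file: TextIO) -> List[str]:
    """Read URLs from a CSV that has a header row with a `url` column.

    Blank entries and duplicates are skipped; order is preserved.
    """
    urls: List[str] = []
    for row in csv.DictReader(csv_file):
        url = (row.get("url") or "").strip()
        if url and url not in urls:
            urls.append(url)
    return urls


# In-memory stand-in for a real file such as bookmarks.csv (hypothetical name).
sample = io.StringIO(
    "name,url\n"
    "Band shows,https://example.com/shows\n"
    "Venue calendar,https://example.org/calendar\n"
    "Band shows again,https://example.com/shows\n"
)
urls = load_urls(sample)
```

With a file on disk, the call would be `load_urls(open("bookmarks.csv", newline=""))` inside a `with` block. It works, but it also shows the problem: every new bookmark means editing a file by hand.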

So I started thinking about doing this as a web application, which is a perfect opportunity considering I want to write about full-stack web development on my blog. Previously, I’ve deployed a React page into the cloud, and I’ve set up a static database and a cloud-based RESTful API with Python to serve data to my React app. But there are so many more aspects of full-stack development I’ve been meaning to discuss: database design, ORM, scalability, code style, API design, authentication, security, deployment, and more. These would all be concerns in the best version of the software I can imagine.

The Best Git Repository of All Time

The more I write code, the more I realize good software isn’t about writing code. It’s about solving problems and bringing the right technologies together to accomplish your goal. For my project, I need a database to store URLs and cached site contents, plus a flexible web UI to let me enter new web data and display tables of update statuses. In short, I need a front end and a back end. Since these requirements are so basic, I can look for existing projects that implement them well and use them as templates. Thinking ahead, I’m also going to want network usage reporting and a web proxy to keep my application safe on the WWW.

It might sound like plagiarism, copying different projects that have already done this work. But no! As long as you follow the project’s license and attribute credit properly, open source software is generally free to use, copy, and distribute, and sometimes to sell. Having a healthy open source community accelerates software development since it lets us focus on the unique problems we are solving rather than wrestling with the technology needed to accomplish our goals. That’s why open source templates like The Best Git Repository of All Time are crucial for smaller developers.

This template has literally all the technology I need for my goals, and the code style here is perfect! The Python is typed. There are clear patterns for database operations. The development is containerized, so it works on any development computer and is ready for deployment to the cloud. The React components have organized folder structures and well-typed dependency injections. In practical terms, in just two hours, I was able to add new database models for my URLs and cached website data PLUS new UI components to let users add the data they want.
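To give a feel for the two new models, here is a simplified sketch using Python’s built-in `sqlite3` module instead of the template’s own database layer. The table names, column names, and helper functions are all hypothetical; this is not the code in my repository, just the shape of the data: one table for tracked URLs and one for cached snapshots of each page.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one row per bookmarked URL, many snapshots per URL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tracked_url (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS page_snapshot (
    id INTEGER PRIMARY KEY,
    tracked_url_id INTEGER NOT NULL REFERENCES tracked_url(id),
    content TEXT NOT NULL,
    fetched_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database (in memory by default) and create the tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_url(conn: sqlite3.Connection, url: str) -> int:
    """Insert the URL if it is new and return its row id either way."""
    conn.execute("INSERT OR IGNORE INTO tracked_url (url) VALUES (?)", (url,))
    conn.commit()
    row = conn.execute(
        "SELECT id FROM tracked_url WHERE url = ?", (url,)
    ).fetchone()
    return row[0]


def save_snapshot(conn: sqlite3.Connection, url_id: int, content: str) -> None:
    """Store a cached copy of the page content for one tracked URL."""
    conn.execute(
        "INSERT INTO page_snapshot (tracked_url_id, content) VALUES (?, ?)",
        (url_id, content),
    )
    conn.commit()


def latest_snapshot(conn: sqlite3.Connection, url_id: int) -> Optional[str]:
    """Return the most recently stored content for a URL, or None."""
    row = conn.execute(
        "SELECT content FROM page_snapshot "
        "WHERE tracked_url_id = ? ORDER BY id DESC LIMIT 1",
        (url_id,),
    ).fetchone()
    return row[0] if row else None
```

Keeping snapshots in their own table means the app can compare the newest fetch against the previous one without overwriting history.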

Future Development

In summary, I want a project that solves my web crawling problem, teaches full-stack development, and is production-ready. It doesn’t have to be the most useful tool in the world, just something fun to build and focus on.

There are other opportunities to improve the idea beyond what I’ve already proposed. Whether a page has substantive updates can be a tough call to make, and it might be something AI models can help measure. Also, expanding the app to let people pull information from sites that require accounts and passwords might be worth looking into, but I’d rather not raise those security concerns yet.
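Before reaching for AI, a simple baseline for “substantive” could be a word-level diff score with a cutoff. The sketch below uses `difflib.SequenceMatcher` from the standard library. The function names and the 5% threshold are assumptions for illustration and would need tuning against real pages.

```python
import difflib


def change_ratio(old_text: str, new_text: str) -> float:
    """Return a rough dissimilarity score between two versions of a page.

    0.0 means the word sequences are identical; values closer to 1.0 mean
    the versions share fewer words in common.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    return 1.0 - matcher.ratio()


def is_substantive(old_text: str, new_text: str, threshold: float = 0.05) -> bool:
    """Treat a change as substantive when the score reaches the threshold."""
    return change_ratio(old_text, new_text) >= threshold
```

A score like this ignores whitespace-only edits, but it can’t tell a new tour date from a reworded footer, which is where a smarter model might eventually earn its place.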

Keep a “tab” on my progress in my repository here, called mini-TAUR (TAUR for Tabular Automatic Updates Report): https://github.com/wright-benjamin-1701/mini-taur-bens-blog.
