Python script to monitor website changes


Keeping track of changes on a website is useful in many situations, such as tracking updates on a competitor's site, monitoring product availability, or following announcements that matter to you. Checking pages by hand is time-consuming and easy to forget, which makes it a good task to automate.

In this blog post, we will explore how to create a Python script to monitor website changes. By leveraging the power of Python and some handy libraries, we can automate the process of retrieving website content, comparing it with previous versions, and notifying us of any changes. This allows us to stay proactive and react promptly to updates or modifications on the websites we monitor.

Setting up the Environment

Before we begin writing the script to monitor website changes, we need to set up our Python environment and install the necessary libraries. Follow the steps below to get started:

  • Install Python  If you haven't already, download and install Python on your system. You can visit the official Python website (https://www.python.org/) and download the latest version compatible with your operating system. On Windows, make sure to select the installer option that adds Python to your system's PATH.

  • Create a New Python Virtual Environment (optional)  It's recommended to create a virtual environment for this project to keep the dependencies isolated. Open a terminal or command prompt, navigate to your desired project directory, and run the following command:

python -m venv website-monitor-env

This will create a new virtual environment named "website-monitor-env" in your project directory.

  • Activate the Virtual Environment  Activate the virtual environment by running the appropriate command based on your operating system:

For Windows (Command Prompt; in PowerShell, run website-monitor-env\Scripts\Activate.ps1 instead):

website-monitor-env\Scripts\activate.bat

For macOS/Linux:

source website-monitor-env/bin/activate

You should see the virtual environment name in your command prompt or terminal, indicating that you're working within the virtual environment.

  • Install Required Libraries  With the virtual environment activated, let's install the necessary libraries. In your terminal or command prompt, run the following command:

pip install requests beautifulsoup4

The "requests" library will help us retrieve website content, while "beautifulsoup4" will assist in parsing HTML.

With the Python environment set up and the required libraries installed, we are ready to start building our website change monitoring script. In the next section, we will cover the process of retrieving website content using the "requests" library.
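To confirm that the installation worked, a quick check along the following lines can help. This is only a sketch: "missing_packages" is a hypothetical helper name, not part of any library. Note that the "beautifulsoup4" package is imported under the name "bs4", which differs from the name passed to pip.

```python
import importlib.util


def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [
        name for name in import_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # "bs4" is the import name of the beautifulsoup4 package.
    missing = missing_packages(["requests", "bs4"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If a package is reported as missing, check that the virtual environment is activated before running pip again.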

Retrieving Website Content

To monitor website changes, we need to retrieve the current content of the website and compare it with the previously saved version. In this section, we'll use the "requests" library to fetch the website content. Follow the steps below:

  • Import the necessary modules  Open your Python script and start by importing the required modules:

import requests
from bs4 import BeautifulSoup

The "requests" module will handle the HTTP requests, while the "BeautifulSoup" class from the "bs4" module will help us parse the HTML content.

  • Specify the website URL  Determine the URL of the website you want to monitor. For example, let's use the URL "https://example.com" for demonstration purposes. Replace it with the actual URL of the website you intend to monitor.

url = "https://example.com"

  • Send a GET request and retrieve the content  Use the "requests.get()" method to send a GET request to the website URL and retrieve the content. Assign the response to a variable for further processing. Passing a timeout is advisable, because without one the request can wait indefinitely on an unresponsive server.

response = requests.get(url, timeout=10)

  • Check the response status  It's a good practice to check the status of the response to ensure the request was successful. We'll use the "response.status_code" attribute, which is 200 for a successful request.

if response.status_code == 200:
    # Proceed with further processing
    print("Retrieved", len(response.text), "characters.")
else:
    print("Failed to retrieve website content. Status code:", response.status_code)
    # Handle error or exit the script

Once you have retrieved the website content, you can proceed to compare it with the previously saved version to identify any changes.
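The "BeautifulSoup" import has not been put to use so far. One possible use, sketched here under assumptions, is to reduce the HTML to its readable text before comparing, so that changes limited to scripts or styles are not reported. The function name "extract_visible_text" is hypothetical, and the list of tags to drop is an assumption that may need adjusting for a particular page.

```python
from bs4 import BeautifulSoup


def extract_visible_text(html):
    """Return the readable text of an HTML page, one cleaned line per text run."""
    soup = BeautifulSoup(html, "html.parser")

    # Remove elements whose contents are not shown to the reader as text.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()

    # Join the remaining text nodes with newlines, trimming each one.
    text = soup.get_text(separator="\n", strip=True)

    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

In the script, this would be called as extract_visible_text(response.text), and the returned string would be saved and compared in place of the raw HTML.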

Saving and Comparing Website Content

Once we have retrieved the website content, we need to save it for future comparison. In this section, we'll discuss how to save the content and compare it with the previously saved version. Follow the steps below:

  • Save the initial website content  After retrieving the website content, save it to a file for future comparison. Create a new file and write the content to it using the "write()" method. For example:

# newline="" writes line endings exactly as received; UTF-8 can store any character in the page
with open("website_content.txt", "w", encoding="utf-8", newline="") as file:
    file.write(response.text)

This will save the website content in a file named "website_content.txt" in the current directory.

  • Compare with the previous content  To detect changes, we'll need to compare the current website content with the previously saved version. Read the content from the saved file and compare it with the new content. For example:

# newline="" reads the file back without translating "\r\n", so an unchanged page compares equal
with open("website_content.txt", "r", encoding="utf-8", newline="") as file:
    previous_content = file.read()

if response.text == previous_content:
    print("No changes detected.")
else:
    print("Website content has changed.")
    # Perform further actions for handling the changes

Here, we compare the new content from the response with the content read from the file. If they match, no changes are detected. Otherwise, we print a message indicating that the website content has changed.

  • Update the saved content  If changes are detected, we should update the saved content with the new version. This will ensure that the next comparison is performed against the latest content. Use the same file-writing logic as before to update the content:

with open("website_content.txt", "w", encoding="utf-8", newline="") as file:
    file.write(response.text)

By overwriting the file, we save the new content as the latest version.

By following these steps, you can save the initial website content, compare it with future versions, and identify any changes. In the next section, we'll explore how to automate this process using a Python script.
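Knowing that a page changed is often less useful than knowing what changed. As a minimal sketch, the standard library's "difflib" module can produce a line-by-line diff of the two versions. The function name "summarize_changes" and its "context_lines" parameter are hypothetical names chosen for this example.

```python
import difflib


def summarize_changes(previous_content, current_content, context_lines=2):
    """Return a unified diff of the two versions, or "" if they are identical."""
    diff = difflib.unified_diff(
        previous_content.splitlines(),
        current_content.splitlines(),
        fromfile="previous",
        tofile="current",
        n=context_lines,   # unchanged lines shown around each change
        lineterm="",       # input lines carry no newline, so add none to headers
    )
    return "\n".join(diff)
```

Calling summarize_changes(previous_content, response.text) in the branch that reports a change would give text suitable for printing or for including in a notification. Lines starting with "-" were removed and lines starting with "+" were added.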

Automating Website Monitoring

Manually running the script every time we want to monitor a website for changes can be tedious and impractical. In this section, we'll discuss how to automate the website monitoring process using a Python script and scheduling tools. Follow the steps below:

  • Create a Python script  Open your preferred Python editor or IDE and create a new Python script file. You can name it something like "website_monitor.py".

  • Import necessary modules  At the beginning of your script, import the required modules, including "requests" for making HTTP requests and "time" for adding delays between requests. Additionally, import any other modules you may need for sending notifications or performing other actions based on website changes.

import requests
import time
# Import other modules as needed

  • Define the website URL and monitoring interval  Set the URL of the website you want to monitor by assigning it to a variable. Also, specify the time interval, in seconds, at which you want to check for changes.

website_url = "https://example.com"
monitoring_interval = 300  # Check every 5 minutes

  • Create a function for monitoring  Define a function that encapsulates the monitoring logic. This function will be responsible for making the HTTP request, comparing the website content, and performing any desired actions based on changes.

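def monitor_website():
    while True:
        try:
            # Make the HTTP request; the timeout keeps a stalled server from blocking the loop
            response = requests.get(website_url, timeout=10)
            response.raise_for_status()
        except requests.RequestException as error:
            # Skip this round on network or HTTP errors and retry after the interval
            print("Request failed:", error)
            time.sleep(monitoring_interval)
            continue

        # Load the saved content; on the first run there is nothing to compare against
        try:
            with open("website_content.txt", "r", encoding="utf-8", newline="") as file:
                previous_content = file.read()
        except FileNotFoundError:
            previous_content = None

        if previous_content is None:
            print("Saved initial website content.")
        elif response.text != previous_content:
            print("Website content has changed.")
            # Perform desired actions for handling the changes

        # Update the saved content (newline="" keeps line endings exactly as received)
        with open("website_content.txt", "w", encoding="utf-8", newline="") as file:
            file.write(response.text)

        # Wait for the specified interval before the next check
        time.sleep(monitoring_interval)

  • Call the monitoring function  Add a call to the monitor_website() function at the end of your script to start the monitoring process.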

monitor_website()

  • Save the script  Save the Python script file in an appropriate location on your system.

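  • Schedule the script  Because monitor_website() loops forever and sleeps between checks, the script only needs to be started once, for example at login or system startup, using cron (on Unix-based systems) or Task Scheduler (on Windows). If you would rather have the scheduler launch the script at each interval, remove the "while True" loop and the time.sleep() call so that each run performs a single check and exits. Otherwise, every scheduled run would start another copy that never finishes.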

This script will periodically check for changes in the website content and perform any specified actions accordingly.
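When the scheduler is meant to launch the script at every interval, each run should perform exactly one check and then exit. The following is a minimal sketch of that variant, not a definitive implementation. The names "check_once", "fetch", and "state_path" are hypothetical. "fetch" stands for any function that returns the page text, and passing it in as an argument keeps the comparison logic independent of the network.

```python
from pathlib import Path


def check_once(fetch, state_path):
    """Run a single check and return "first_run", "changed", or "unchanged"."""
    current = fetch()
    path = Path(state_path)

    if not path.exists():
        # Nothing saved yet: store a baseline instead of reporting a change.
        with open(path, "w", encoding="utf-8", newline="") as file:
            file.write(current)
        return "first_run"

    # newline="" disables newline translation so the text round-trips exactly.
    with open(path, "r", encoding="utf-8", newline="") as file:
        previous = file.read()

    if current == previous:
        return "unchanged"

    # Save the new version so the next run compares against it.
    with open(path, "w", encoding="utf-8", newline="") as file:
        file.write(current)
    return "changed"
```

In a real script, "fetch" could be a small function that calls requests.get(website_url, timeout=10) and returns the response text. A cron entry such as "*/5 * * * * python3 /path/to/website_monitor.py" would then run the check every five minutes; the path shown is a placeholder.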

Conclusion

Monitoring website changes is essential for staying updated with the latest content or detecting any modifications that may impact your business or personal interests. In this article, we explored how to create a Python script to monitor website changes. By leveraging the power of Python and its libraries, we can automate the process and receive timely notifications about any modifications.

We started by understanding the importance of website monitoring and the benefits it offers. Then, we delved into the steps required to build the monitoring script. We learned how to make HTTP requests, compare website content, and perform actions based on changes. Additionally, we discussed the option of automating the script using scheduling tools, ensuring continuous monitoring without manual intervention.

Updated on: 11-Aug-2023
