Test whether the given Page is Found or not on the Server using Python

Testing whether a page exists on a server is crucial for web development and data validation. Python provides several efficient methods to check page availability using HTTP status codes and response analysis.

Using HTTP Status Codes

The most straightforward approach is to send an HTTP request and examine the response status code. A 200 status means the request succeeded and 404 means the page was not found. Other codes in the 4xx range signal client errors, and codes in the 5xx range signal server errors.

Example

import requests

def test_page_existence(url):
    try:
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            print(f"Page exists - Status: {response.status_code}")
        elif response.status_code == 404:
            print("Page not found - Status: 404")
        else:
            print(f"Page issue - Status: {response.status_code}")
    except requests.RequestException as e:
        print(f"Request failed: {e}")

# Test with a real URL
url = "https://httpbin.org/status/200"
test_page_existence(url)

Output

Page exists - Status: 200
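The status ranges described above can also be turned into a small lookup helper. The following is a minimal sketch using only the standard library; the function name describe_status and the category labels are illustrative choices, not part of requests or of the original example.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short label such as '404 Not Found (client error)'."""
    # HTTPStatus knows the standard reason phrases; unknown codes raise ValueError.
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"

    # Group the code by its hundreds range.
    if 100 <= code < 200:
        category = "informational"
    elif 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirect"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "unrecognized"

    return f"{code} {phrase} ({category})"


for code in (200, 301, 404, 503):
    print(describe_status(code))
```

The result of describe_status(response.status_code) could replace the hard-coded messages in the example above when more status codes need to be reported.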

Using HEAD Requests for Efficiency

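HEAD requests retrieve only the response headers without downloading the page body, making them faster and more bandwidth-efficient than GET. Note that requests.head() does not follow redirects by default, and some servers reject HEAD with 405 Method Not Allowed, so a non-200 result from a HEAD request does not always mean the page is missing.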

Example

import requests

def check_page_with_head(url):
    try:
        response = requests.head(url, timeout=10)
        if response.status_code == 200:
            print("Page exists (HEAD request)")
            print(f"Content-Type: {response.headers.get('content-type', 'N/A')}")
        else:
            print(f"Page not accessible - Status: {response.status_code}")
    except requests.RequestException as e:
        print(f"Request failed: {e}")

# Test HEAD request
url = "https://httpbin.org/status/200"
check_page_with_head(url)

Output

Page exists (HEAD request)
Content-Type: application/json
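Because some servers do not accept HEAD, one possible refinement is to fall back to GET when HEAD is refused. The sketch below is an assumption-laden illustration rather than a definitive implementation: the function name exists_with_fallback is hypothetical, and the HTTP client is passed in as the http argument (the requests module or a requests.Session would both fit) so the logic can be exercised without a network connection.

```python
def exists_with_fallback(http, url, timeout=10):
    """Return True if url answers 200, trying HEAD first and GET if HEAD is refused.

    `http` is any object with requests-style head() and get() methods,
    such as the requests module itself or a requests.Session.
    """
    # requests.head() does not follow redirects unless asked to.
    response = http.head(url, timeout=timeout, allow_redirects=True)

    # 405 Method Not Allowed / 501 Not Implemented: the server rejects HEAD.
    if response.status_code in (405, 501):
        # stream=True defers the body download; close() releases the connection.
        response = http.get(url, timeout=timeout, allow_redirects=True, stream=True)
        response.close()

    return response.status_code == 200
```

With requests imported, a call would look like exists_with_fallback(requests, "https://httpbin.org/status/200"). Network errors are not caught here, so the call still belongs inside a try/except block like the one in the example above.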

Web Scraping Approach

For more detailed validation, you can fetch and parse HTML content to verify specific elements exist on the page.

Example

import requests
from bs4 import BeautifulSoup

def validate_page_content(url):
    try:
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, "html.parser")
            # Fall back to the first <h1> when the page has no <title> element
            title = soup.find("title") or soup.find("h1")
            if title:
                print(f"Page exists with title: {title.get_text().strip()}")
            else:
                print("Page exists but no title found")
        else:
            print(f"Page not found - Status: {response.status_code}")
    except requests.RequestException as e:
        print(f"Request failed: {e}")

# Test with a page that has HTML content
url = "https://httpbin.org/html"
validate_page_content(url)

Output

Page exists with title: Herman Melville - Moby-Dick
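Content checks are also useful because some servers return status 200 together with an error page (a so-called soft 404). The sketch below shows one way to flag such pages from the title text. It swaps BeautifulSoup for the standard library's html.parser so that it runs without extra packages. The names TitleParser, extract_title, and looks_like_soft_404, as well as the marker phrases, are illustrative assumptions and should be adapted to the site being checked.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_title(html):
    """Return the stripped title text, or None if there is no usable title."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.title_parts).strip()
    return title or None


def looks_like_soft_404(html, markers=("404", "not found")):
    """Heuristic: True if the title contains one of the marker phrases."""
    title = extract_title(html)
    if title is None:
        return False
    lowered = title.lower()
    return any(marker in lowered for marker in markers)
```

A page would then count as found only when the status is 200 and looks_like_soft_404(response.text) is False. This is a heuristic, so a legitimate page whose title happens to contain a marker phrase would be misclassified.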

Comparison of Methods

Method         Speed     Bandwidth Usage   Best For
GET Request    Slower    High              Content validation
HEAD Request   Fast      Low               Quick existence check
Web Scraping   Slowest   High              Detailed content analysis

Enhanced Error Handling

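A robust solution should handle various error scenarios, including network timeouts, DNS failures, and SSL certificate issues. In requests, DNS failures and SSL certificate errors are both raised as ConnectionError (SSLError is a subclass of it), so the example below reports them as connection failures.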

Example

import requests
from requests.exceptions import Timeout, ConnectionError, RequestException

def robust_page_check(url):
    try:
        # requests.head() does not follow redirects by default,
        # so 301 and 302 responses are reported as-is.
        response = requests.head(url, timeout=5)
        status_messages = {
            200: "Page exists and is accessible",
            301: "Page moved permanently",
            302: "Page moved temporarily",
            403: "Access forbidden",
            404: "Page not found",
            500: "Server error",
        }

        default = f"Unexpected status: {response.status_code}"
        message = status_messages.get(response.status_code, default)
        print(f"Status {response.status_code}: {message}")
        
        return response.status_code == 200
        
    except Timeout:
        print("Request timed out")
        return False
    except ConnectionError:
        print("Connection failed")  
        return False
    except RequestException as e:
        print(f"Request error: {e}")
        return False

# Test the robust function
urls = ["https://httpbin.org/status/200", "https://httpbin.org/status/404"]
for url in urls:
    print(f"Testing: {url}")
    robust_page_check(url)
    print()

Output

Testing: https://httpbin.org/status/200
Status 200: Page exists and is accessible

Testing: https://httpbin.org/status/404
Status 404: Page not found
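Timeouts and connection failures are often transient, so one possible extension is to retry a failed check a few times with an increasing delay. The sketch below is hypothetical: check_with_retries is an illustrative name, and it assumes a check callable that returns True or False and lets network exceptions propagate (unlike robust_page_check above, which catches them itself). It relies on the fact that requests.RequestException derives from OSError, as do Python's built-in TimeoutError and ConnectionError.

```python
import time


def check_with_retries(check, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call check(url), retrying on network errors with exponential backoff.

    Returns the result of check(url), or False if every attempt raised.
    """
    for attempt in range(attempts):
        try:
            # A definite answer (True or False) is returned immediately.
            return check(url)
        except OSError:
            # Wait 1x, 2x, 4x ... base_delay before the next attempt,
            # but do not sleep after the final one.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return False
```

The sleep parameter exists only so the delay can be replaced in tests. A 404 response is returned right away without retrying, since repeating the request is unlikely to change that answer.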

Conclusion

Use HEAD requests for quick page existence checks to save bandwidth. For detailed content validation, combine GET requests with HTML parsing. Always include proper error handling and timeouts for robust web applications.

Updated on: 2026-03-27T09:51:09+05:30
