Test whether the given Page is Found or not on the Server using Python
Testing whether a page exists on a server is crucial for web development and data validation. Python provides several efficient methods to check page availability using HTTP status codes and response analysis.
Using HTTP Status Codes
The most straightforward approach is to send an HTTP request and examine the response status code. A 200 status means the page was served successfully, 404 means the server could not find it, and other 4xx or 5xx codes indicate client-side or server-side errors respectively.
Example
import requests

def test_page_existence(url):
    try:
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            print(f"Page exists - Status: {response.status_code}")
        elif response.status_code == 404:
            print("Page not found - Status: 404")
        else:
            print(f"Page issue - Status: {response.status_code}")
    except requests.RequestException as e:
        print(f"Request failed: {e}")

# Test with a real URL
url = "https://httpbin.org/status/200"
test_page_existence(url)

Output
Page exists - Status: 200
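The status-code ranges described above can be grouped in a small helper. This is a minimal sketch that uses only the standard library's http.HTTPStatus; the function name describe_status and the category labels are illustrative assumptions, not part of requests.

```python
from http import HTTPStatus

def describe_status(code):
    """Return the code, its standard phrase, and a coarse category."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that are not registered in HTTPStatus have no phrase
        phrase = "Unknown"

    if 100 <= code < 200:
        category = "informational"
    elif 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirect"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "unrecognized"
    return f"{code} {phrase}: {category}"

for code in (200, 301, 404, 503):
    print(describe_status(code))
```

The value of response.status_code from any of the examples in this article could be passed to such a helper instead of comparing against individual codes.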
Using HEAD Requests for Efficiency
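HEAD requests retrieve only the response headers without downloading the page body, making them faster and more bandwidth-efficient than GET. Note that requests.head() does not follow redirects unless allow_redirects=True is passed, so a redirected page reports a 3xx status rather than 200.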
Example
import requests

def check_page_with_head(url):
    try:
        # requests.head() does not follow redirects by default; enable them
        # so a page that has moved is still checked at its final location
        response = requests.head(url, timeout=10, allow_redirects=True)
        if response.status_code == 200:
            print("Page exists (HEAD request)")
            print(f"Content-Type: {response.headers.get('content-type', 'N/A')}")
        else:
            print(f"Page not accessible - Status: {response.status_code}")
    except requests.RequestException as e:
        print(f"Request failed: {e}")

# Test HEAD request
url = "https://httpbin.org/status/200"
check_page_with_head(url)

Output
Page exists (HEAD request)
Content-Type: application/json
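Some servers do not implement HEAD and answer with 405 Method Not Allowed or 501 Not Implemented even though the page exists. One way to handle this is to retry with GET, sketched here with the standard library's urllib instead of requests so that it runs without extra packages. The helper names fetch_status and page_exists are hypothetical.

```python
import urllib.error
import urllib.request

def fetch_status(url, method="HEAD", timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        request = urllib.request.Request(url, method=method)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises 4xx/5xx responses as HTTPError; the code is still useful
        return error.code
    except (OSError, ValueError):
        # OSError covers URLError, timeouts and refused connections;
        # ValueError is raised for a malformed URL
        return None

def page_exists(url, timeout=10):
    """Try HEAD first and fall back to GET if the server rejects HEAD."""
    status = fetch_status(url, "HEAD", timeout)
    if status in (405, 501):
        status = fetch_status(url, "GET", timeout)
    return status == 200

# Example call (requires network access):
# print(page_exists("https://httpbin.org/status/200"))
```

The fallback costs one extra request, and only for servers that reject HEAD, so the bandwidth saving is kept in the common case.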
Web Scraping Approach
For more detailed validation, you can fetch and parse HTML content to verify specific elements exist on the page.
Example
import requests
from bs4 import BeautifulSoup

def validate_page_content(url):
    try:
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, "html.parser")
            title = soup.find("title")
            if title:
                print(f"Page exists with title: {title.get_text().strip()}")
            else:
                print("Page exists but no title found")
        else:
            print(f"Page not found - Status: {response.status_code}")
    except requests.RequestException as e:
        print(f"Request failed: {e}")

# Test with a page that has HTML content
url = "https://httpbin.org/html"
validate_page_content(url)

Output
Page exists with title: Herman Melville - Moby-Dick
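The idea of verifying that specific elements exist can also be tried offline on an HTML string. The sketch below swaps BeautifulSoup for the standard library's html.parser so it has no third-party dependency; the class TagCollector, the function missing_elements, and the default list of required tags are assumptions made for illustration.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Record the name of every start tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        self.tags.add(tag)

def missing_elements(html, required=("title", "h1")):
    """Return the required tag names that do not appear in html."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return [tag for tag in required if tag not in collector.tags]

sample = "<html><head><title>Demo</title></head><body><p>Hi</p></body></html>"
print(missing_elements(sample))  # prints ['h1']
```

An empty list means every required element was found; in practice the html argument would be the text of a response that already returned status 200.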
Comparison of Methods
| Method | Speed | Bandwidth Usage | Best For |
|---|---|---|---|
| GET Request | Slower | High | Content validation |
| HEAD Request | Fast | Low | Quick existence check |
| Web Scraping | Slowest | High | Detailed content analysis |
Enhanced Error Handling
A robust solution should handle various error scenarios including network timeouts, DNS failures, and SSL certificate issues.
Example
import requests
from requests.exceptions import Timeout, ConnectionError, RequestException

def robust_page_check(url):
    try:
        # requests.head() does not follow redirects by default,
        # so 301 and 302 responses are reported as-is
        response = requests.head(url, timeout=5)
        status_messages = {
            200: "Page exists and is accessible",
            301: "Page moved permanently",
            302: "Page moved temporarily",
            404: "Page not found",
            403: "Access forbidden",
            500: "Server error"
        }
        message = status_messages.get(response.status_code, f"Unexpected status: {response.status_code}")
        print(f"Status {response.status_code}: {message}")
        return response.status_code == 200
    except Timeout:
        print("Request timed out")
        return False
    except ConnectionError:
        # Includes DNS failures and SSL errors, which subclass ConnectionError
        print("Connection failed")
        return False
    except RequestException as e:
        print(f"Request error: {e}")
        return False

# Test the robust function
urls = ["https://httpbin.org/status/200", "https://httpbin.org/status/404"]
for url in urls:
    print(f"Testing: {url}")
    robust_page_check(url)
    print()

Output
Testing: https://httpbin.org/status/200
Status 200: Page exists and is accessible

Testing: https://httpbin.org/status/404
Status 404: Page not found
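To tell timeouts, DNS failures, and SSL certificate problems apart, the underlying cause of the exception can be inspected. The following is a rough sketch for the standard library's urllib, where URLError carries the cause in its reason attribute; the function name classify_failure and the message strings are assumptions. With requests, the comparable exception classes are requests.exceptions.SSLError, ConnectTimeout, and ReadTimeout.

```python
import socket
import ssl
import urllib.error

def classify_failure(error):
    """Turn an exception raised by urllib into a short description."""
    if isinstance(error, urllib.error.HTTPError):
        return f"HTTP error {error.code}"
    # URLError wraps the real cause in .reason; other errors are used directly
    reason = error.reason if isinstance(error, urllib.error.URLError) else error
    if isinstance(reason, ssl.SSLError):
        return "SSL/TLS problem (for example an invalid certificate)"
    if isinstance(reason, socket.gaierror):
        return "DNS lookup failed"
    if isinstance(reason, (socket.timeout, TimeoutError)):
        return "Timed out"
    if isinstance(reason, ConnectionError):
        return "Connection refused or reset"
    return f"Other failure: {reason}"

# Example with a hand-built error, so no network access is needed
dns_error = urllib.error.URLError(socket.gaierror(-2, "Name or service not known"))
print(classify_failure(dns_error))  # prints "DNS lookup failed"
```

Such a helper would typically be called from an except OSError block around urllib.request.urlopen.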
Conclusion
Use HEAD requests for quick page existence checks to save bandwidth. For detailed content validation, combine GET requests with HTML parsing. Always include proper error handling and timeouts for robust web applications.
