How do you validate a URL with a regular expression in Python?
Validating URLs in Python can be approached in several ways depending on your specific needs. You can check URL format with regular expressions, verify structure with urlparse, or test actual connectivity.
Using Regular Expression for URL Validation
A comprehensive regex pattern can validate most URL formats:
import re

def validate_url_regex(url):
    pattern = r'^https?://(?:[-\w.])+(?:[:\d]+)?(?:/(?:[\w/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:\w*))?)?$'
    return re.match(pattern, url) is not None

# Test URLs
urls = [
    "https://www.example.com",
    "http://subdomain.example.com:8080/path?query=value",
    "invalid-url",
    "https://example.com/page#section"
]

for url in urls:
    if validate_url_regex(url):
        print(f"'{url}' is valid")
    else:
        print(f"'{url}' is invalid")
Output:

'https://www.example.com' is valid
'http://subdomain.example.com:8080/path?query=value' is valid
'invalid-url' is invalid
'https://example.com/page#section' is valid
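Any fixed pattern has blind spots. For instance, the character class used for the path above does not include hyphens, so a URL such as https://example.com/my-page would be rejected. A slightly broadened variant is sketched below (the added "-" and "~" in the path and query classes are an assumption about the URLs you expect; adjust the classes to suit your data, and see RFC 3986 for the full URL grammar):

```python
import re

# Sketch: a broadened variant of the pattern above. The path and query
# character classes also accept "-" and "~", which the stricter pattern
# rejects. Compiled once so it can be reused cheaply in a loop.
URL_PATTERN = re.compile(
    r'^https?://(?:[-\w.])+(?:[:\d]+)?'
    r'(?:/(?:[-\w/_.~])*(?:\?(?:[-\w&=%.~])*)?(?:#(?:\w*))?)?$'
)

def validate_url_regex_loose(url):
    return URL_PATTERN.match(url) is not None
```

With this variant, hyphenated paths like /my-page pass while clearly malformed strings are still rejected.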
Using urllib.parse for Structure Validation
The urlparse() function from the urllib.parse module can verify URL structure without complex regex:
from urllib.parse import urlparse

def validate_url_parse(url):
    try:
        result = urlparse(url)
        return all([result.scheme, result.netloc])
    except ValueError:  # e.g. a malformed IPv6 literal
        return False

# Test URLs
urls = [
    "https://www.example.com",
    "http://example.com/path",
    "invalid-url",
    "ftp://files.example.com"
]

for url in urls:
    if validate_url_parse(url):
        print(f"'{url}' has valid structure")
    else:
        print(f"'{url}' has invalid structure")
Output:

'https://www.example.com' has valid structure
'http://example.com/path' has valid structure
'invalid-url' has invalid structure
'ftp://files.example.com' has valid structure
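As the output shows, urlparse() accepts any scheme, including ftp. If you only want to accept web URLs, you can additionally whitelist schemes. A minimal sketch (the http/https whitelist is an assumption; extend the set as needed):

```python
from urllib.parse import urlparse

# Assumed policy for this sketch: only web URLs are acceptable.
ALLOWED_SCHEMES = {"http", "https"}

def validate_web_url(url):
    try:
        result = urlparse(url)
    except ValueError:  # e.g. a malformed IPv6 literal
        return False
    # Require a whitelisted scheme and a non-empty host
    return result.scheme in ALLOWED_SCHEMES and bool(result.netloc)
```

With this check, ftp://files.example.com is rejected even though its structure is valid.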
Testing URL Accessibility
To verify that a URL is actually reachable, make an HTTP request (this example uses the third-party requests library):
import requests
from urllib.parse import urlparse

def validate_url_accessible(url):
    try:
        # First check structure
        parsed = urlparse(url)
        if not all([parsed.scheme, parsed.netloc]):
            return False
        # Then test connectivity with a lightweight HEAD request
        response = requests.head(url, timeout=5)
        return response.status_code < 400
    except (ValueError, requests.RequestException):
        return False

# Example usage
test_urls = [
    "https://www.google.com",
    "https://nonexistent-site-12345.com",
    "invalid-url"
]

for url in test_urls:
    if validate_url_accessible(url):
        print(f"'{url}' is accessible")
    else:
        print(f"'{url}' is not accessible")
Comparison
| Method | Checks Format | Checks Accessibility | Performance |
|---|---|---|---|
| Regular Expression | Yes | No | Fast |
| urlparse | Basic | No | Fast |
| HTTP Request | Yes | Yes | Slow |
Conclusion
Use regular expressions for strict format validation, urlparse() for quick structural checks, and HTTP requests when you need to verify that a URL is actually reachable. Format checks are fast and safe to run on every input, while connectivity checks involve network I/O, so choose the method that matches your performance and validation requirements.