How do you validate a URL with a regular expression in Python?

Validating URLs in Python can be approached in several ways depending on your specific needs. You can check URL format with regular expressions, verify structure with urlparse, or test actual connectivity.

Using Regular Expression for URL Validation

A comprehensive regex pattern can validate most URL formats ?

import re

def validate_url_regex(url):
    pattern = r'^https?://(?:[-\w.])+(?:[:\d]+)?(?:/(?:[\w/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:\w*))?)?$'
    return re.match(pattern, url) is not None

# Test URLs
urls = [
    "https://www.example.com",
    "http://subdomain.example.com:8080/path?query=value",
    "invalid-url",
    "https://example.com/page#section"
]

for url in urls:
    if validate_url_regex(url):
        print(f"'{url}' is valid")
    else:
        print(f"'{url}' is invalid")
'https://www.example.com' is valid
'http://subdomain.example.com:8080/path?query=value' is valid
'invalid-url' is invalid
'https://example.com/page#section' is valid

Using urllib.parse for Structure Validation

The urlparse module can verify URL structure without complex regex ?

from urllib.parse import urlparse

def validate_url_parse(url):
    try:
        result = urlparse(url)
        return all([result.scheme, result.netloc])
    except:
        return False

# Test URLs
urls = [
    "https://www.example.com",
    "http://example.com/path",
    "invalid-url",
    "ftp://files.example.com"
]

for url in urls:
    if validate_url_parse(url):
        print(f"'{url}' has valid structure")
    else:
        print(f"'{url}' has invalid structure")
'https://www.example.com' has valid structure
'http://example.com/path' has valid structure
'invalid-url' has invalid structure
'ftp://files.example.com' has valid structure

Testing URL Accessibility

To verify if a URL is actually reachable, make an HTTP request ?

import requests
from urllib.parse import urlparse

def validate_url_accessible(url):
    try:
        # First check structure
        parsed = urlparse(url)
        if not all([parsed.scheme, parsed.netloc]):
            return False
        
        # Then test connectivity
        response = requests.head(url, timeout=5)
        return response.status_code < 400
    except:
        return False

# Example usage
test_urls = [
    "https://www.google.com",
    "https://nonexistent-site-12345.com",
    "invalid-url"
]

for url in test_urls:
    if validate_url_accessible(url):
        print(f"'{url}' is accessible")
    else:
        print(f"'{url}' is not accessible")

Comparison

Method Checks Format Checks Accessibility Performance
Regular Expression Yes No Fast
urlparse Basic No Fast
HTTP Request Yes Yes Slow

Conclusion

Use regular expressions for strict format validation, urlparse for basic structure checking, and HTTP requests when you need to verify actual URL accessibility. Choose the method based on your performance and validation requirements.

Updated on: 2026-03-24T19:36:32+05:30

707 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements