urllib.parse — Parse URLs into components in Python
The urllib.parse module provides a standard interface to break Uniform Resource Locator (URL) strings into components or to combine the components back into a URL string. It also has functions to convert a "relative URL" to an absolute URL given a "base URL."
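The relative-to-absolute conversion mentioned above is handled by urljoin(). The following is a minimal sketch; the example.com base URL and the file names are placeholders chosen for illustration, not values from this article.

```python
from urllib.parse import urljoin

# Hypothetical base URL, used only for illustration.
base = "https://example.com/docs/tutorial/index.html"

# A bare file name replaces the last path segment of the base.
sibling = urljoin(base, "intro.html")

# ".." climbs one directory before resolving.
parent = urljoin(base, "../faq.html")

# A leading slash resolves against the site root.
rooted = urljoin(base, "/about")

print(sibling)  # https://example.com/docs/tutorial/intro.html
print(parent)   # https://example.com/docs/faq.html
print(rooted)   # https://example.com/about
```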
This module supports the following URL schemes:
- file
- ftp
- gopher
- hdl
- http
- https
- imap
- mailto
- mms
- news
- nntp
- prospero
- rsync
- rtsp
- rtspu
- sftp
- shttp
- sip
- sips
- snews
- svn
- svn+ssh
- telnet
- wais
- ws
- wss
urlparse() Function
The urlparse() function parses a URL into six components and returns them as a 6-tuple. Each tuple item is a string, and % escapes are not expanded. The return value is an instance of a subclass of tuple, so the six components can be read by index or by name; the last four attributes in the table are available by name only:
| Attribute | Index | Value | Value if not present |
|---|---|---|---|
| scheme | 0 | URL scheme specifier | scheme parameter |
| netloc | 1 | Network location part | empty string |
| path | 2 | Hierarchical path | empty string |
| params | 3 | Parameters for last path element | empty string |
| query | 4 | Query component | empty string |
| fragment | 5 | Fragment identifier | empty string |
| username | n/a | User name | None |
| password | n/a | Password | None |
| hostname | n/a | Host name (lower case) | None |
| port | n/a | Port number as integer, if present | None |
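The name-only attributes are easiest to see on a URL that carries credentials and an explicit port. The sketch below uses a made-up URL (the user name, password, host, and port are all assumptions for illustration) and shows that netloc keeps the original case while hostname is lower-cased.

```python
from urllib.parse import urlparse

# Hypothetical URL with credentials, a mixed-case host, and an explicit port.
result = urlparse("https://alice:s3cret@Example.COM:8443/inbox;v=1?tab=rm#top")

print(result.netloc)    # alice:s3cret@Example.COM:8443
print(result.username)  # alice
print(result.password)  # s3cret
print(result.hostname)  # example.com
print(result.port)      # 8443
print(result.path)      # /inbox
print(result.params)    # v=1

# The first six components are also reachable by index.
same_scheme = result[0] == result.scheme
print(same_scheme)      # True
```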
Example
from urllib.parse import urlparse
url = 'https://mail.google.com/mail/u/0/?tab=rm#inbox'
result = urlparse(url)
print(result)
print(f"Scheme: {result.scheme}")
print(f"Network Location: {result.netloc}")
print(f"Path: {result.path}")
print(f"Query: {result.query}")
print(f"Fragment: {result.fragment}")
ParseResult(scheme='https', netloc='mail.google.com', path='/mail/u/0/', params='', query='tab=rm', fragment='inbox')
Scheme: https
Network Location: mail.google.com
Path: /mail/u/0/
Query: tab=rm
Fragment: inbox
urlunparse() Function
The urlunparse() function constructs a URL from a tuple as returned by urlparse(). The parts argument can be any six-item iterable:
from urllib.parse import urlparse, urlunparse
url = 'https://mail.google.com/mail/u/0/?tab=rm#inbox'
parsed = urlparse(url)
reconstructed = urlunparse(parsed)
print(reconstructed)
https://mail.google.com/mail/u/0/?tab=rm#inbox
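Because any six-item iterable is accepted, a URL can also be assembled from a plain tuple, or rebuilt after changing one component. This is a small sketch; the example.com host and query values are illustrative assumptions, and _replace() is the standard named-tuple method available on parse results.

```python
from urllib.parse import urlparse, urlunparse

# Order: scheme, netloc, path, params, query, fragment (values are illustrative).
parts = ("https", "example.com", "/search", "", "q=python", "results")
built = urlunparse(parts)
print(built)    # https://example.com/search?q=python#results

# Swap a single component on a parsed result, then reassemble.
parsed = urlparse(built)
changed = urlunparse(parsed._replace(query="q=urllib"))
print(changed)  # https://example.com/search?q=urllib#results
```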
urlsplit() Function
The urlsplit() function is similar to urlparse(), but does not split the params from the URL. This function returns a 5-tuple: (scheme, netloc, path, query, fragment):
from urllib.parse import urlsplit
url = 'https://mail.google.com/mail/u/0/?tab=rm#inbox'
result = urlsplit(url)
print(result)
SplitResult(scheme='https', netloc='mail.google.com', path='/mail/u/0/', query='tab=rm', fragment='inbox')
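The difference between the two functions only shows up when the path contains a ';' parameter section. The sketch below compares them on a made-up URL (an assumption for illustration) and round-trips the split result through urlunsplit().

```python
from urllib.parse import urlparse, urlsplit, urlunsplit

# Hypothetical URL whose last path segment carries a ";v=2" parameter.
url = "https://example.com/report;v=2?x=1"

parsed = urlparse(url)
split = urlsplit(url)

# urlparse() separates the parameter from the path.
print(parsed.path, parsed.params)  # /report v=2

# urlsplit() leaves it inside the path.
print(split.path)                  # /report;v=2

# urlunsplit() is the inverse of urlsplit().
rebuilt = urlunsplit(split)
print(rebuilt == url)              # True
```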
URL Quoting Functions
The URL quoting functions focus on taking program data and making it safe for use as URL components by quoting special characters and appropriately encoding non-ASCII text.
quote() Function
The quote() function replaces special characters in a string using the %xx escape. Letters, digits, and the characters '_.-~' are never quoted. By default the '/' character is also treated as safe, which is why the slashes in the output below are left unchanged:
from urllib.parse import quote
url = 'https://mail.google.com/mail/u/0/?tab=rm#inbox'
quoted = quote(url)
print(quoted)
https%3A//mail.google.com/mail/u/0/%3Ftab%3Drm%23inbox
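Which characters survive unescaped is controlled by the safe argument, and the related quote_plus() function encodes spaces as '+' for use in query strings. A brief sketch on an arbitrary sample string (the string itself is only an illustration):

```python
from urllib.parse import quote, quote_plus

text = "a b/c&d"  # arbitrary sample input

# Default: '/' is considered safe and is left alone.
default_quoted = quote(text)
print(default_quoted)  # a%20b/c%26d

# safe="" escapes '/' as well, which suits a single path segment or value.
strict_quoted = quote(text, safe="")
print(strict_quoted)   # a%20b%2Fc%26d

# quote_plus() turns spaces into '+' and escapes '/' by default.
plus_quoted = quote_plus(text)
print(plus_quoted)     # a+b%2Fc%26d
```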
unquote() Function
The unquote() function replaces %xx escapes by their single-character equivalent:
from urllib.parse import quote, unquote
url = 'https://mail.google.com/mail/u/0/?tab=rm#inbox'
quoted = quote(url)
unquoted = unquote(quoted)
print(f"Original: {url}")
print(f"Quoted: {quoted}")
print(f"Unquoted: {unquoted}")
Original: https://mail.google.com/mail/u/0/?tab=rm#inbox
Quoted: https%3A//mail.google.com/mail/u/0/%3Ftab%3Drm%23inbox
Unquoted: https://mail.google.com/mail/u/0/?tab=rm#inbox
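Two details are worth a quick illustration: multi-byte escapes are decoded as UTF-8 by default, and unquote() leaves '+' untouched, whereas unquote_plus() converts it back to a space. The sample strings below are arbitrary.

```python
from urllib.parse import unquote, unquote_plus

# %C3%A9 is the UTF-8 encoding of "é"; it decodes to a single character.
word = unquote("caf%C3%A9")
print(word)       # café

# unquote() does not treat '+' specially.
kept_plus = unquote("hello+world%21")
print(kept_plus)  # hello+world!

# unquote_plus() also maps '+' back to a space, as used in form data.
spaced = unquote_plus("hello+world%21")
print(spaced)     # hello world!
```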
urlencode() Function
The urlencode() function converts a mapping object or a sequence of two-element tuples to a percent-encoded ASCII text string. The resulting string is a series of key=value pairs separated by '&' characters:
from urllib.parse import urlencode
query_params = {"name": "Rajeev", "salary": 20000, "dept": "IT"}
encoded = urlencode(query_params)
print(encoded)
name=Rajeev&salary=20000&dept=IT
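Values that contain spaces or that are lists need a little more care. The sketch below (with made-up parameter names) passes doseq=True so that each list item becomes its own key=value pair, then uses parse_qs() to turn the string back into a dictionary.

```python
from urllib.parse import urlencode, parse_qs

# Illustrative parameters: one value with a space, one with several items.
params = {"q": "hello world", "tag": ["a", "b"]}

# doseq=True emits one pair per list item; spaces become '+'.
encoded = urlencode(params, doseq=True)
print(encoded)  # q=hello+world&tag=a&tag=b

# parse_qs() reverses the encoding; every value comes back as a list.
decoded = parse_qs(encoded)
print(decoded)  # {'q': ['hello world'], 'tag': ['a', 'b']}
```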
Conclusion
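The urllib.parse module provides essential tools for URL manipulation in Python. Use urlparse() or urlsplit() to break a URL into components, urlunparse() to reassemble it, quote() and unquote() to escape and restore individual components, and urlencode() to build query strings.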
