urllib.parse — Parse URLs into components in Python


This module provides a standard interface for breaking Uniform Resource Locator (URL) strings into components and for combining the components back into a URL string. It also has functions to convert a "relative URL" to an absolute URL given a "base URL."
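As a minimal sketch of the relative-to-absolute conversion mentioned above, the module's urljoin() function resolves a relative reference against a base URL. The example.com addresses are hypothetical and only serve as an illustration.

```python
from urllib.parse import urljoin

# Hypothetical base URL, used only for illustration.
base = "https://example.com/docs/tutorial/index.html"

# A bare file name is resolved against the directory of the base URL.
absolute = urljoin(base, "intro.html")
print(absolute)  # https://example.com/docs/tutorial/intro.html

# "../" climbs one directory level before the rest is resolved.
parent = urljoin(base, "../api/reference.html")
print(parent)  # https://example.com/docs/api/reference.html
```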

This module supports the following URL schemes:

file
ftp
gopher
hdl
http
https
imap
mailto
mms
news
nntp
prospero
rsync
rtsp
rtspu
sftp
shttp
sip
sips
snews
svn
svn+ssh
telnet
wais
ws
wss

urlparse()

This function parses a URL into six components and returns a 6-tuple that corresponds to the general structure of a URL. Each tuple item is a string. The components are not broken up into smaller parts (for example, the network location is a single string), and % escapes are not expanded. The return value is an instance of a subclass of tuple with the following attributes:

Attribute	Index	Value	Value if not present
scheme	0	URL scheme specifier	scheme parameter
netloc	1	Network location part	empty string
path	2	Hierarchical path	empty string
params	3	Parameters for last path element	empty string
query	4	Query component	empty string
fragment	5	Fragment identifier	empty string
username		User name	None
password		Password	None
hostname		Host name (lower case)	None
port		Port number as integer, if present	None

Example

>>> from urllib.parse import urlparse
>>> url = 'https://mail.google.com/mail/u/0/?tab=rm#inbox'
>>> t = urlparse(url)
>>> t
ParseResult(scheme='https', netloc='mail.google.com', path='/mail/u/0/', params='', query='tab=rm', fragment='inbox')
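The attributes listed in the table can be read by name or by index. The following sketch uses a hypothetical URL that carries credentials, a port, and path parameters so that every attribute has a value; the host and credentials are made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical URL with credentials, port and params, for illustration only.
result = urlparse("https://user:secret@Example.COM:8443/path;type=a?q=1#top")

print(result.scheme)    # https
print(result.netloc)    # user:secret@Example.COM:8443
print(result.path)      # /path
print(result.params)    # type=a
print(result.query)     # q=1
print(result.fragment)  # top

# These four are derived from netloc and have no tuple index.
print(result.username)  # user
print(result.password)  # secret
print(result.hostname)  # example.com (always lower case)
print(result.port)      # 8443 (an int, not a string)

# Index access returns the same value as the named attribute.
print(result[1] == result.netloc)  # True
```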

urlunparse(parts)

This function constructs a URL from a tuple as returned by urlparse(). The parts argument can be any six-item iterable.

>>> from urllib.parse import urlunparse
>>> urlunparse(t)
'https://mail.google.com/mail/u/0/?tab=rm#inbox'
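Because parts can be any six-item iterable, a URL can also be assembled from a plain list. This is a small sketch with hypothetical values; the items follow the order scheme, netloc, path, params, query, fragment.

```python
from urllib.parse import urlunparse

# Order: scheme, netloc, path, params, query, fragment (hypothetical values).
parts = ["https", "example.com", "/search", "", "q=python", "results"]

rebuilt = urlunparse(parts)
print(rebuilt)  # https://example.com/search?q=python#results
```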

urlsplit(urlstring, scheme='', allow_fragments=True)

This is similar to urlparse(), but does not split the params from the URL. This function returns a 5-tuple: (addressing scheme, network location, path, query, fragment identifier).

>>> from urllib.parse import urlsplit
>>> urlsplit(url)
SplitResult(scheme='https', netloc='mail.google.com', path='/mail/u/0/', query='tab=rm', fragment='inbox')

urlunsplit(parts)

This function combines the elements of a tuple as returned by urlsplit() into a complete URL as a string.
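A short sketch of the round trip: the result of urlsplit() is a named tuple, so one way to change a single component before recombining is the standard _replace() method. Dropping the fragment here is only an illustrative choice.

```python
from urllib.parse import urlsplit, urlunsplit

url = "https://mail.google.com/mail/u/0/?tab=rm#inbox"
parts = urlsplit(url)

# Recombining the untouched parts gives back the original string.
print(urlunsplit(parts))  # https://mail.google.com/mail/u/0/?tab=rm#inbox

# SplitResult is a named tuple, so _replace() returns a modified copy.
changed = parts._replace(fragment="")
print(urlunsplit(changed))  # https://mail.google.com/mail/u/0/?tab=rm
```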

The URL quoting functions focus on taking program data and making it safe for use as URL components by quoting special characters and appropriately encoding non-ASCII text.

quote()

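This function replaces special characters in a string using %xx escapes. Letters, digits, and the characters '_.-~' are never quoted. By default, '/' is also left unquoted, because the function is intended for quoting the path section of a URL.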

>>> from urllib.parse import quote
>>> q = quote(url)
>>> q
'https%3A//mail.google.com/mail/u/0/%3Ftab%3Drm%23inbox'

quote_plus()

Like quote(), but also replaces spaces with plus signs, as required for quoting HTML form values when building a query string to go into a URL. Unlike quote(), it does not leave '/' unquoted by default.
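The difference between the two functions is easiest to see side by side. The sketch below quotes a hypothetical form value containing spaces, a slash, and an ampersand, using the default arguments of both functions.

```python
from urllib.parse import quote, quote_plus

text = "red shoes/size 9 & up"  # hypothetical form value

# quote(): spaces become %20 and '/' is kept by default.
print(quote(text))       # red%20shoes/size%209%20%26%20up

# quote_plus(): spaces become '+' and '/' is escaped as %2F.
print(quote_plus(text))  # red+shoes%2Fsize+9+%26+up
```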

unquote()

This function replaces %xx escapes with their single-character equivalents.

>>> from urllib.parse import unquote
>>> unquote(q)
'https://mail.google.com/mail/u/0/?tab=rm#inbox'

urlencode()

This function converts a mapping object or a sequence of two-element tuples to a percent-encoded ASCII text string. The resulting string is a series of key=value pairs separated by '&' characters.

>>> from urllib.parse import urlencode
>>> qry = {"name":"Rajeev", "salary":20000}
>>> urlencode(qry)
'name=Rajeev&salary=20000'
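urlencode() also accepts a sequence of two-element tuples, which allows a key to be repeated, and it quotes keys and values with quote_plus() by default. The sketch below uses hypothetical field names and values; the doseq=True call shows one way to expand a list value into repeated keys.

```python
from urllib.parse import urlencode

# A sequence of pairs keeps its order and may repeat a key (hypothetical data).
pairs = [("name", "Rajeev Kumar"), ("tag", "a&b"), ("tag", "c/d")]
query = urlencode(pairs)
print(query)  # name=Rajeev+Kumar&tag=a%26b&tag=c%2Fd

# With doseq=True, each item of a list value becomes its own key=value pair.
multi = urlencode({"tag": ["a", "b"]}, doseq=True)
print(multi)  # tag=a&tag=b
```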

Nitya Raut

Updated on: 2019-07-30T22:30:25+05:30
