urllib.parse — Parse URLs into components in Python


This module provides a standard interface for breaking Uniform Resource Locator (URL) strings into components and for combining the components back into a URL string. It also has functions to convert a "relative URL" to an absolute URL given a "base URL."
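The relative-to-absolute conversion mentioned above is handled by urljoin(). The following is a minimal sketch; the base URL and the relative references are hypothetical values chosen for illustration.

```python
from urllib.parse import urljoin

# Hypothetical base URL, used only for illustration.
base = "https://example.com/docs/guide/index.html"

# A bare file name is resolved against the directory of the base URL.
sibling = urljoin(base, "intro.html")

# "../" climbs one directory level before resolving.
parent = urljoin(base, "../api/parse.html")

# A reference starting with "/" replaces the whole path of the base URL.
rooted = urljoin(base, "/about")

print(sibling)  # https://example.com/docs/guide/intro.html
print(parent)   # https://example.com/docs/api/parse.html
print(rooted)   # https://example.com/about
```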

This module supports the following URL schemes:

  • file
  • ftp
  • gopher
  • hdl
  • http
  • https
  • imap
  • mailto
  • mms
  • news
  • nntp
  • prospero
  • rsync
  • rtsp
  • rtspu
  • sftp
  • shttp
  • sip
  • sips
  • snews
  • svn
  • svn+ssh
  • telnet
  • wais
  • ws
  • wss

urlparse()

This function parses a URL into six components and returns a 6-tuple. This corresponds to the general structure of a URL: scheme://netloc/path;parameters?query#fragment. Each tuple item is a string. The components are not broken up into smaller parts (for example, the network location is a single string), and % escapes are not expanded. The return value is a named tuple, so its items can be accessed by index or through the following attributes:

Attribute   Index   Value                                 Value if not present
scheme      0       URL scheme specifier                  scheme parameter
netloc      1       Network location part                 empty string
path        2       Hierarchical path                     empty string
params      3       Parameters for last path element      empty string
query       4       Query component                       empty string
fragment    5       Fragment identifier                   empty string
username            User name                             None
password            Password                              None
hostname            Host name (lower case)                None
port                Port number as integer, if present    None
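The attributes without an index (username, password, hostname, port) are derived from the network location. The sketch below shows how they might be read; the URL, including the credentials and port in it, is a made-up value for illustration.

```python
from urllib.parse import urlparse

# Hypothetical URL with credentials, a mixed-case host and an explicit port.
parts = urlparse("https://alice:s3cret@Example.COM:8443/a%20b;v=1?q=x#top")

print(parts.netloc)    # alice:s3cret@Example.COM:8443 (kept as a single string)
print(parts.username)  # alice
print(parts.password)  # s3cret
print(parts.hostname)  # example.com (always lower case)
print(parts.port)      # 8443 (an int, not a string)
print(parts.path)      # /a%20b (the % escape is not expanded)
print(parts.params)    # v=1

# Index access and attribute access return the same item.
print(parts[2] == parts.path)  # True
```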

Example

>>> from urllib.parse import urlparse
>>> url = 'https://mail.google.com/mail/u/0/?tab=rm#inbox'
>>> t = urlparse(url)
>>> t
ParseResult(scheme='https', netloc='mail.google.com', path='/mail/u/0/', params='', query='tab=rm', fragment='inbox')

urlunparse(parts)

This function constructs a URL from a tuple as returned by urlparse(). The parts argument can be any six-item iterable.

>>> from urllib.parse import urlunparse
>>> urlunparse(t)
'https://mail.google.com/mail/u/0/?tab=rm#inbox'
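Since the parts argument can be any six-item iterable, a plain list works as well as a ParseResult. A small sketch, with hypothetical component values:

```python
from urllib.parse import urlunparse

# Order: scheme, netloc, path, params, query, fragment (hypothetical values).
components = ["https", "example.com", "/search", "", "q=python", "results"]

rebuilt = urlunparse(components)
print(rebuilt)  # https://example.com/search?q=python#results
```

Empty components, such as params here, are simply left out of the result together with their delimiter.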

urlsplit(urlstring, scheme='', allow_fragments=True)

This is similar to urlparse(), but does not split the params from the URL. This function returns a 5-tuple: (addressing scheme, network location, path, query, fragment identifier).

>>> from urllib.parse import urlsplit
>>> urlsplit(url)
SplitResult(scheme='https', netloc='mail.google.com', path='/mail/u/0/', query='tab=rm', fragment='inbox')
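The difference from urlparse() only shows up when the URL carries path parameters (the part after ";"). The sketch below compares the two functions on a hypothetical URL.

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL whose last path segment carries a ";version=2" parameter.
address = "http://example.com/files/report;version=2?lang=en"

parsed = urlparse(address)
split = urlsplit(address)

# urlparse() separates the parameters from the path.
print(parsed.path)    # /files/report
print(parsed.params)  # version=2

# urlsplit() leaves them inside the path.
print(split.path)     # /files/report;version=2

print(len(parsed), len(split))  # 6 5
```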

urlunsplit(parts)

This function combines the elements of a tuple as returned by urlsplit() into a complete URL as a string.
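A common use is a round trip: split a URL, change one component, and join it again. This is a minimal sketch; the URL is hypothetical, and _replace() is the standard named-tuple method for producing a modified copy.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical URL, used only for illustration.
original = "https://example.com/mail/u/0/?tab=rm#inbox"
pieces = urlsplit(original)

# Splitting and re-joining gives back the same string.
roundtrip = urlunsplit(pieces)
print(roundtrip == original)  # True

# Named tuples are immutable, so _replace() returns a modified copy.
updated = urlunsplit(pieces._replace(query="tab=wm", fragment=""))
print(updated)  # https://example.com/mail/u/0/?tab=wm
```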

The URL quoting functions focus on taking program data and making it safe for use as URL components by quoting special characters and appropriately encoding non-ASCII text.

quote()

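This function replaces special characters in a string using the %xx escape. Letters, digits, and the characters '_.-~' are never quoted. By default '/' is also left unquoted; the optional safe parameter controls which additional characters are left as they are.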

>>> from urllib.parse import quote
>>> q = quote(url)
>>> q
'https%3A//mail.google.com/mail/u/0/%3Ftab%3Drm%23inbox'
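The output above keeps every '/' because quote() treats it as safe by default. The sketch below shows how the safe argument changes that, and how non-ASCII text is encoded as UTF-8 bytes; the sample strings are hypothetical.

```python
from urllib.parse import quote

# Hypothetical path segment containing a slash and a space.
segment = "reports/2019 Q1.pdf"

default_quoted = quote(segment)          # '/' is safe by default
strict_quoted = quote(segment, safe="")  # nothing extra is safe, so '/' is escaped

print(default_quoted)  # reports/2019%20Q1.pdf
print(strict_quoted)   # reports%2F2019%20Q1.pdf

# Non-ASCII characters are encoded as UTF-8 and then percent-escaped.
accented = quote("café")
print(accented)  # caf%C3%A9
```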

quote_plus()

Like quote(), but also replaces spaces with plus signs, as required for quoting HTML form values when building up a query string to go into a URL. Plus signs in the original string are escaped, and '/' is not treated as safe by default.
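A side-by-side comparison makes the difference from quote() visible. The form value below is a hypothetical example.

```python
from urllib.parse import quote, quote_plus

# Hypothetical form value containing spaces, '&' and '/'.
value = "red shoes & socks/size 9"

with_quote = quote(value)
with_quote_plus = quote_plus(value)

print(with_quote)       # red%20shoes%20%26%20socks/size%209
print(with_quote_plus)  # red+shoes+%26+socks%2Fsize+9
```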

unquote()

This function replaces %xx escapes by their single-character equivalent.

>>> from urllib.parse import unquote
>>> unquote(q)
'https://mail.google.com/mail/u/0/?tab=rm#inbox'

urlencode()

This function converts a mapping object or a sequence of two-element tuples to a percent-encoded ASCII text string. The resulting string is a series of key=value pairs separated by '&' characters.

>>> from urllib.parse import urlencode
>>> qry = {"name": "Rajeev", "salary": 20000}
>>> urlencode(qry)
'name=Rajeev&salary=20000'
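The same function accepts a sequence of two-element tuples, which keeps the pairs in a fixed order. In this sketch the field values and the target URL are hypothetical; keys and values are quoted with quote_plus() by default, so spaces become '+' and '&' inside a value is escaped.

```python
from urllib.parse import urlencode

# Hypothetical form fields given as a list of (key, value) tuples.
pairs = [("name", "Rajeev Kumar"), ("dept", "R&D"), ("salary", 20000)]

query = urlencode(pairs)
print(query)  # name=Rajeev+Kumar&dept=R%26D&salary=20000

# The encoded string can then be appended to a URL (hypothetical address).
full_url = "https://example.com/staff?" + query
print(full_url)
```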
raja
Published on 16-Apr-2019 12:10:34