Beautiful Soup - decode() Method



Method Description

The decode() method in Beautiful Soup returns a Unicode string (str) representation of the parse tree, rendered as an HTML or XML document. It is the counterpart of the encode() method: you call encode() to get a bytestring, and decode() to get Unicode. Do not confuse it with Python's built-in bytes.decode(), which is what runs when decode() is called on the bytestring returned by encode(); that method decodes the bytes using the codec registered for the given encoding. Let us study the decode() method with some examples.
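
As a minimal sketch of the relationship described above, the snippet below calls decode() and encode() on the same soup object and checks the type of each result. The markup and the variable names text and data are illustrative assumptions, not part of the Beautiful Soup API.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello “World!”</p>", "html.parser")

text = soup.decode()         # Beautiful Soup's decode(): parse tree -> str
data = soup.encode("utf-8")  # Beautiful Soup's encode(): parse tree -> bytes

print(type(text).__name__)
print(type(data).__name__)

# Decoding the bytes with Python's built-in bytes.decode() gives back the same text
print(data.decode("utf-8") == text)
```

The last comparison is the round trip that the examples in this chapter perform in two steps.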

Syntax

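decode(pretty_print, eventual_encoding, formatter)

Parameters

  • pretty_print − If this is True, indentation will be used to make the document more readable. On Tag objects, and on the BeautifulSoup object in recent bs4 releases, the first argument is named indent_level instead; prettify() is the usual way to get indented output.

  • eventual_encoding − The encoding of the final document ('utf-8' by default). decode() returns a Unicode string whatever this value is; the value is only used where the document names its own encoding, such as an XML declaration or a <meta> charset tag.

  • formatter − A Formatter object, or a string naming one of the standard formatters, such as 'minimal' or 'html'.

Beautiful Soup's decode() does not take an errors argument. An error handling scheme is chosen through the errors argument of Python's bytes.decode(), which the examples below call on the bytestring returned by encode(). Common values are 'strict', 'ignore' and 'replace'.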
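
To illustrate the formatter parameter, here is a small sketch that renders the same tag with two of the standard formatter names. The sample markup and the variable names are assumptions chosen for the illustration; formatter is passed as a keyword argument so the call does not depend on the position of the other parameters.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>", "html.parser")

# 'minimal' (the default) escapes only what is needed to keep the markup valid
minimal = soup.p.decode(formatter="minimal")

# 'html' also converts characters to named HTML entities where one exists
as_entities = soup.p.decode(formatter="html")

print(minimal)
print(as_entities)
```

With the "html" formatter the accented character should come back as an entity, while "minimal" leaves it as a Unicode character.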

Return Value

The decode() method returns a Unicode string (a Python str object).

Example - Decoding a UTF-8 encoded String

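from bs4 import BeautifulSoup

soup = BeautifulSoup("Hello “World!”", 'html.parser')

# encode() renders the parse tree as a UTF-8 bytestring
enc = soup.encode('utf-8')
print(enc)

# enc is a bytes object, so this is Python's bytes.decode() (UTF-8 by default)
dec = enc.decode()
print(dec)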

Output

b'Hello \xe2\x80\x9cWorld!\xe2\x80\x9d'
Hello “World!”

Example - Decoding a latin-1 encoded String

from bs4 import BeautifulSoup

markup = '''
<html>
   <head>
      <meta content="text/html; charset=ISO-Latin-1" http-equiv="Content-type" />
   </head>
   <body>
      <p>Sacré bleu!</p>
   </body>
</html>
'''

soup = BeautifulSoup(markup, 'lxml')

# Render only the <p> tag as a latin-1 bytestring
enc = soup.p.encode("latin-1")

# Decode the bytes with the same codec to get the text back
dec = enc.decode("latin-1")
print(dec)

Output

<p>Sacré bleu!</p>
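
The latin-1 example above decodes the bytes with the same codec that produced them. As a hedged sketch of what happens otherwise, the snippet below decodes latin-1 bytes as UTF-8 and tries the 'strict', 'ignore' and 'replace' error handling schemes. It uses only Python's built-in str.encode() and bytes.decode(); the sample text and variable names are assumptions for the illustration.

```python
# Bytes produced with latin-1; the accented character becomes the single byte 0xE9
data = "Sacré bleu!".encode("latin-1")

# 'strict' is the default scheme: an undecodable byte raises UnicodeDecodeError
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict failed:", exc.reason)

# 'ignore' silently drops the undecodable byte
ignored = data.decode("utf-8", errors="ignore")

# 'replace' substitutes U+FFFD, the Unicode replacement character
replaced = data.decode("utf-8", errors="replace")

print(ignored)
print(replaced)
```

Both lenient schemes lose the original character, so decoding with the codec that was actually used for encoding remains the safer choice.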