Beautiful Soup - Convert Object to String



The Beautiful Soup API has three main types of objects: the soup (BeautifulSoup) object, the Tag object, and the NavigableString object. Let us find out how we can convert each of these objects to a string. In Python, a string is a str object.

Assume that we have the following HTML document −

html = '''
<p>Hello <b>World</b></p>
'''

Let us pass this string as an argument to the BeautifulSoup constructor. The soup object is then converted to a string with Python's built-in str() function.

The parsed tree of this HTML string is constructed depending on which parser you use. Python's built-in html.parser doesn't add the <html> and <body> tags.

Example

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
print (str(soup))

Output

<p>Hello <b>World</b></p>

On the other hand, the html5lib parser constructs the tree after inserting the structural tags such as <html>, <head> and <body>.

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html5lib')
print (str(soup))

Output

<html><head></head><body><p>Hello <b>World</b></p>
</body></html>
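Besides str(), the soup object offers other ways to obtain its markup as a str. The following is a minimal sketch, assuming the bs4 package is installed and using html.parser; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = "<p>Hello <b>World</b></p>"
soup = BeautifulSoup(html, "html.parser")

as_str = str(soup)          # markup of the whole tree as a plain str
as_decoded = soup.decode()  # decode() also returns the markup as a str
pretty = soup.prettify()    # indented markup, one node per line

print(type(as_str), as_str)
print(pretty)
```

All three results are ordinary str objects, so they can be written to a file or passed to any function that expects text.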

The Tag object has a string property that returns a NavigableString object when the tag contains a single string.

tag = soup.find('b')
obj = tag.string
print(type(obj), obj)

Output

<class 'bs4.element.NavigableString'> World
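A NavigableString is a subclass of str that stays linked to the parse tree. To get a plain str, or to get the markup of a Tag, pass the object to str(). The following is a minimal sketch under the same assumptions (bs4 installed, html.parser); the names nav, plain and markup are illustrative.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

soup = BeautifulSoup("<p>Hello <b>World</b></p>", "html.parser")
tag = soup.find("b")

nav = tag.string     # NavigableString, still attached to the tree
plain = str(nav)     # plain str holding only the text
markup = str(tag)    # str() on a Tag gives its markup, tags included

print(type(nav), nav)
print(type(plain), plain)
print(type(markup), markup)
```

Converting to a plain str is useful when the text has to outlive the soup object, since a NavigableString keeps a reference to the tree it came from.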

There is also a text property defined for the Tag object. It returns the text contained in the tag, stripping off all the inner tags and attributes.

If the HTML string is −

html = '''
   <p>Hello <div id='id'>World</div></p>
'''

We parse this string again with html.parser and obtain the text property of the <p> tag −

soup = BeautifulSoup(html, 'html.parser')
tag = soup.find('p')
obj = tag.text
print(type(obj), obj)

Output

<class 'str'> Hello World
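The difference between the string and text properties shows up when a tag has more than one child. The following is a minimal sketch, assuming bs4 is installed and html.parser is used; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello <b>World</b></p>", "html.parser")
p_tag = soup.find("p")
b_tag = soup.find("b")

# <p> has two children (a string and a <b> tag), so .string is None
p_string = p_tag.string
# .text joins every string found inside the tag
p_text = p_tag.text
# <b> has exactly one string child, so .string returns it
b_string = b_tag.string

print(p_string)   # None
print(p_text)     # Hello World
print(b_string)   # World
```

When a tag may contain nested tags, text is therefore the safer choice for extracting its content as a str.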

You can also use the get_text() method, which returns a string representing the text inside the tag. The text property is in fact a shorthand for calling get_text() with no arguments, so both strip the inner tags and attributes and return a str.

obj = tag.get_text()
print (type(obj),obj)

Output

<class 'str'> Hello World
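Unlike the text property, get_text() accepts optional arguments. The following is a minimal sketch, assuming bs4 is installed, of its separator and strip parameters; the sample markup with extra spaces is made up for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p> Hello <b> World </b></p>", "html.parser")
tag = soup.find("p")

# No arguments: the strings are concatenated exactly as they appear
raw = tag.get_text()
# strip=True trims each string; separator is placed between them
joined = tag.get_text(separator="|", strip=True)

print(repr(raw))      # ' Hello  World '
print(repr(joined))   # 'Hello|World'
```

Here the separator makes the boundary between the text of <p> and the text of <b> visible in the resulting string.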