Beautiful Soup - Kinds of objects


When we passed a html document or string to a beautifulsoup constructor, beautifulsoup basically converts a complex html page into different python objects. Below we are going to discuss four major kinds of objects:

  • Tag

  • NavigableString

  • BeautifulSoup

  • Comments

Tag Objects

A HTML tag is used to define various types of content. A tag object in BeautifulSoup corresponds to an HTML or XML tag in the actual page or document.

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<b class="boldest">TutorialsPoint</b>')
>>> tag = soup.html
>>> type(tag)
<class 'bs4.element.Tag'>

Tags contain lot of attributes and methods and two important features of a tag are its name and attributes.

Name (

Every tag contains a name and can be accessed through ‘.name’ as suffix. will return the type of tag it is.


However, if we change the tag name, same will be reflected in the HTML markup generated by the BeautifulSoup.

>>> = "Strong"
>>> tag
<Strong><body><b class="boldest">TutorialsPoint</b></body></Strong>

Attributes (tag.attrs)

A tag object can have any number of attributes. The tag <b class=”boldest”> has an attribute ‘class’ whose value is “boldest”. Anything that is NOT tag, is basically an attribute and must contain a value. You can access the attributes either through accessing the keys (like accessing “class” in above example) or directly accessing through “.attrs”

>>> tutorialsP = BeautifulSoup("<div class='tutorialsP'></div>",'lxml')
>>> tag2 = tutorialsP.div
>>> tag2['class']

We can do all kind of modifications to our tag’s attributes (add/remove/modify).

>>> tag2['class'] = 'Online-Learning'
>>> tag2['style'] = '2007'
>>> tag2
<div class="Online-Learning" style="2007"></div>
>>> del tag2['style']
>>> tag2
<div class="Online-Learning"></div>
>>> del tag['class']
>>> tag
<b SecondAttribute="2">TutorialsPoint</b>
>>> del tag['SecondAttribute']
>>> tag
>>> tag2['class']
>>> tag2['style']
KeyError: 'style'

Multi-valued attributes

Some of the HTML5 attributes can have multiple values. Most commonly used is the class-attribute which can have multiple CSS-values. Others include ‘rel’, ‘rev’, ‘headers’, ‘accesskey’ and ‘accept-charset’. The multi-valued attributes in beautiful soup are shown as list.

>>> from bs4 import BeautifulSoup
>>> css_soup = BeautifulSoup('<p class="body"></p>')
>>> css_soup.p['class']
>>> css_soup = BeautifulSoup('<p class="body bold"></p>')
>>> css_soup.p['class']
['body', 'bold']

However, if any attribute contains more than one value but it is not multi-valued attributes by any-version of HTML standard, beautiful soup will leave the attribute alone −

>>> id_soup = BeautifulSoup('<p id="body bold"></p>')
>>> id_soup.p['id']
'body bold'
>>> type(id_soup.p['id'])
<class 'str'>

You can consolidate multiple attribute values if you turn a tag to a string.

>>> rel_soup = BeautifulSoup("<p> tutorialspoint Main <a rel='Index'> Page</a></p>")
>>> rel_soup.a['rel']
>>> rel_soup.a['rel'] = ['Index', ' Online Library, Its all Free']
>>> print(rel_soup.p)
<p> tutorialspoint Main <a rel="Index Online Library, Its all Free"> Page</a></p>

By using ‘get_attribute_list’, you get a value that is always a list, string, irrespective of whether it is a multi-valued or not.


However, if you parse the document as ‘xml’, there are no multi-valued attributes −

>>> xml_soup = BeautifulSoup('<p class="body bold"></p>', 'xml')
>>> xml_soup.p['class']
'body bold'


The navigablestring object is used to represent the contents of a tag. To access the contents, use “.string” with tag.

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("<h2 id='message'>Hello, Tutorialspoint!</h2>")
>>> soup.string
'Hello, Tutorialspoint!'
>>> type(soup.string)

You can replace the string with another string but you can’t edit the existing string.

>>> soup = BeautifulSoup("<h2 id='message'>Hello, Tutorialspoint!</h2>")
>>> soup.string.replace_with("Online Learning!")
'Hello, Tutorialspoint!'
>>> soup.string
'Online Learning!'
>>> soup
<html><body><h2 id="message">Online Learning!</h2></body></html>


BeautifulSoup is the object created when we try to scrape a web resource. So, it is the complete document which we are trying to scrape. Most of the time, it is treated tag object.

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("<h2 id='message'>Hello, Tutorialspoint!</h2>")
>>> type(soup)
<class 'bs4.BeautifulSoup'>


The comment object illustrates the comment part of the web document. It is just a special type of NavigableString.

>>> soup = BeautifulSoup('<p><!-- Everything inside it is COMMENTS --></p>')
>>> comment = soup.p.string
>>> type(comment)
<class 'bs4.element.Comment'>
>>> type(comment)
<class 'bs4.element.Comment'>
>>> print(soup.p.prettify())
<!-- Everything inside it is COMMENTS -->

NavigableString Objects

The navigablestring objects are used to represent text within tags, rather than the tags themselves.