Beautiful Soup - Extract Title Tag



The <title> tag gives the page a text caption that appears in the browser's title bar or tab. It is not part of the main content of the web page. In a valid HTML document, the title tag is placed inside the <head> tag.

We can extract the contents of the title tag with Beautiful Soup. We parse the HTML tree and obtain the title tag object.

Example

html = '''
<html>
   <head>
      <title>Python Libraries</title>
   </head>
   <body>
      <p>Hello World</p>
   </body>
</html>
'''
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html5lib")

title = soup.title
print(title)

Output

<title>Python Libraries</title>
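
The output above is the whole tag object, markup included. If only the caption text is needed, the tag's string attribute can be read instead. The following is a minimal sketch rather than a definitive approach: the helper name extract_title is hypothetical, and it assumes the built-in "html.parser" so that no extra parser has to be installed (the "html5lib" parser used above should behave the same way for this input).

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the text of the <title> tag, or None if the page has no title.

    extract_title is a hypothetical helper name used only for this sketch.
    """
    # Assumption: the built-in parser is sufficient for this markup.
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.title  # None when the document has no <title> tag
    if title_tag is None:
        return None
    # get_text() returns the text inside the tag without the markup.
    return title_tag.get_text(strip=True)


html = """
<html>
   <head>
      <title>Python Libraries</title>
   </head>
   <body>
      <p>Hello World</p>
   </body>
</html>
"""

print(extract_title(html))
```

Checking for None before reading the text avoids an AttributeError on pages that have no title tag.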

In HTML, the title attribute can be used with any tag. It gives additional information about an element, and browsers show it as tooltip text when the mouse hovers over the element.

We can extract the text of the title attribute of each tag with the following code snippet:

Example

html = '''
<html>
   <body>
      <p title='parsing HTML and XML'>Beautiful Soup</p>
      <p title='HTTP library'>requests</p>
      <p title='URL handling'>urllib</p>
   </body>
</html>
'''
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html5lib")
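# find_all() with no arguments returns every tag in the document
tags = soup.find_all()
for tag in tags:
    # Print the title attribute only for tags that define one
    if tag.has_attr('title'):
        print(tag.attrs['title'])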

Output

parsing HTML and XML
HTTP library
URL handling
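
The same result can be obtained by letting find_all() do the filtering: passing True as the value of an attribute matches every tag that has that attribute, whatever its value. The following is a sketch under the same assumptions as before (the built-in "html.parser" is used instead of "html5lib"; the variable name titles is illustrative only).

```python
from bs4 import BeautifulSoup

html = """
<html>
   <body>
      <p title='parsing HTML and XML'>Beautiful Soup</p>
      <p title='HTTP library'>requests</p>
      <p title='URL handling'>urllib</p>
      <p>no tooltip here</p>
   </body>
</html>
"""

# Assumption: the built-in parser is sufficient for this markup.
soup = BeautifulSoup(html, "html.parser")

# attrs={"title": True} selects only the tags that carry a title attribute,
# so no explicit has_attr() check is needed inside the loop.
titles = [tag["title"] for tag in soup.find_all(attrs={"title": True})]

for text in titles:
    print(text)
```

The last paragraph has no title attribute, so it is skipped and the three tooltip texts are printed in document order.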