Find the title tags from a given html document using BeautifulSoup in Python


Web scraping is a common technique for extracting data from web pages, and the Python package BeautifulSoup makes it straightforward. In this post, we'll use BeautifulSoup to extract the title tags from a given HTML document.

Installation and Syntax

Make sure BeautifulSoup is installed on your machine before you start writing any code. You can install it with pip, the Python package installer, by running the following command in your terminal.

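pip install beautifulsoup4

Examples 2 and 3 below also import the requests library, which can be installed the same way with pip install requests.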

To extract the title tags from an HTML page, we first build a BeautifulSoup object by passing the HTML content (a string or an open file) to the BeautifulSoup constructor. The 'title' elements can then be located with the find() or find_all() methods. Finally, the text inside a 'title' tag is available through the text attribute of the tag object.
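The difference between the two lookup methods can be seen in a minimal sketch. The HTML string below is a made-up sample used only for illustration; find() returns the first matching tag (or None when nothing matches), while find_all() returns a list-like collection of every match.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML, used only for illustration
html = "<html><head><title>Sample Page</title></head><body><p>Hello</p></body></html>"

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if there is no match
first_title = soup.find("title")

# find_all() returns a list-like collection, which is empty if there is no match
all_titles = soup.find_all("title")

# the text attribute holds the text inside the tag
title_text = first_title.text

print(title_text)        # Sample Page
print(len(all_titles))   # 1
```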

Algorithm

Here's an algorithm for finding the title tags from a given HTML document using BeautifulSoup in Python −

  • Import the necessary modules - requests and BeautifulSoup

  • Send a GET request to the web page and get its content

  • Create a BeautifulSoup object from the HTML content

  • Find all the 'title' tags in the HTML document

  • Get the text inside each 'title' tag
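The steps above can be sketched as two small functions. The names extract_title_texts and fetch_title_texts are hypothetical helpers chosen for this illustration, and the timeout value and the raise_for_status() call are added assumptions that are not part of the algorithm as listed.

```python
import requests
from bs4 import BeautifulSoup


def extract_title_texts(html):
    """Return the text of every <title> tag found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.find_all("title")]


def fetch_title_texts(url, timeout=10):
    """Download a page and return the text of its <title> tags."""
    # timeout and raise_for_status() are assumptions added for robustness
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises an exception for 4xx/5xx responses
    return extract_title_texts(response.content)
```

Keeping the parsing step separate from the download step means the same extraction code can be reused for local files and for pages fetched over the network.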

Example 1: Finding the Title Tag From a Local HTML File

index.html

<!DOCTYPE html>
<html>
<head>
   <title>My Webpage Title</title>
</head>
<body>
   <h1>Welcome to my webpage</h1>
   <p>This is some sample text for demonstration purposes.</p>
</body>
</html>

sample_scraper.py

from bs4 import BeautifulSoup

# read the contents of the HTML file
with open('index.html', 'r') as file:
   html_content = file.read()

# create a BeautifulSoup object
soup = BeautifulSoup(html_content, 'html.parser')

# find the title tag and get its text
title_tag = soup.find('title')
title_text = title_tag.text

print(title_text)

Output

My Webpage Title

Here we read the contents of a local HTML file using the 'with open()' statement. We then create a BeautifulSoup object from the HTML content and use the find() method to locate the 'title' tag in the document. Finally, we use the text attribute of the tag object to get the text inside the 'title' tag and print it to the console.
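One caveat is that find() returns None when a document has no 'title' tag, so accessing the text attribute directly would raise an AttributeError. A minimal sketch of a guard is shown below; get_title_text and its default parameter are hypothetical names introduced for this illustration.

```python
from bs4 import BeautifulSoup


def get_title_text(html, default=None):
    """Return the stripped text of the first <title> tag, or `default` if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find("title")
    if title_tag is None:
        # no <title> tag in the document, so fall back to the default
        return default
    return title_tag.get_text(strip=True)


print(get_title_text("<html><head><title>My Webpage Title</title></head></html>"))
# My Webpage Title
print(get_title_text("<html><body><p>No title here</p></body></html>", default="(untitled)"))
# (untitled)
```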

Example 2: Finding the Title Tag From a Web Page Using Requests

from bs4 import BeautifulSoup
import requests

# send a GET request to the web page and get its content
url = 'https://www.example.com'
response = requests.get(url)
content = response.content

# create a BeautifulSoup object
soup = BeautifulSoup(content, 'html.parser')

# find the title tag and get its text
title_tag = soup.find('title')
title_text = title_tag.text

print(title_text)

Output

Example Domain

Here we send a GET request to the web page with the requests library and take the content of the response. We then create a BeautifulSoup object from that content and use the find() method to locate the 'title' tag in the HTML document. Finally, we read the text attribute of the tag object to get the text inside the 'title' tag and print it to the console.

Example 3: Finding the Title Tags From a Web Page

The find_all() method returns every tag with a given name, so it can also be used to locate the title tags on a web page. Because an HTML page typically contains only one title tag, however, find(), which returns just the first match, is usually sufficient.

import requests
from bs4 import BeautifulSoup

# Make a request to the webpage
page = requests.get("https://en.wikipedia.org/wiki/Web_scraping")

# Create a BeautifulSoup object
soup = BeautifulSoup(page.content, 'html.parser')

# Find the title tag
title_tag = soup.find('title')

# Output text contained within <title></title>
print(title_tag.text)

Output

Web scraping - Wikipedia

First, we make a request to the Wikipedia page on web scraping using the requests library. We then create a BeautifulSoup object from the HTML content of the page using the 'html.parser' parser. Next, we use the find() method to get the first title tag in the HTML document, and we print the text inside the tag using the text attribute of the title_tag object.
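A page can occasionally contain more than one 'title' element, for example when an inline SVG image carries its own title. The sketch below uses a made-up HTML string (not the real Wikipedia markup) to compare find_all() with two ways of getting only the page title: the soup.title shortcut and a search limited to the head element.

```python
from bs4 import BeautifulSoup

# Hypothetical document: the page title plus an SVG icon with its own <title>
html = """
<html>
<head><title>Web scraping - Wikipedia</title></head>
<body>
  <svg><title>Search icon</title></svg>
</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() collects every <title> tag, including the one inside the SVG
all_titles = [tag.get_text(strip=True) for tag in soup.find_all("title")]

# soup.title is shorthand for the first <title> tag in the document
page_title = soup.title.string

# searching from soup.head limits the lookup to the <head> element
head_title = soup.head.find("title").get_text()

print(all_titles)   # ['Web scraping - Wikipedia', 'Search icon']
print(page_title)   # Web scraping - Wikipedia
print(head_title)   # Web scraping - Wikipedia
```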

Applications

  • Web scraping − Extract each page's title while gathering data from several pages for a project.

  • SEO analysis − Examining web page titles helps to identify patterns or trends among the titles of websites with high search engine rankings.

  • Data analysis − Title tags can be used to identify the titles of news articles, blog posts, and other types of content that analysts want to examine.

  • On-page SEO − Title tags provide a concise, easy-to-understand description of a page's content, so giving each page a distinctive and relevant title can improve the user experience.

Conclusion

In this tutorial, we covered the find() and find_all() methods and used find() to extract title tags from an HTML document with BeautifulSoup in Python. The examples worked with both a local HTML file and live web pages, and we also looked at some applications of title tag extraction. BeautifulSoup is a capable library for web scraping and data extraction, and it is worth investigating further for anyone interested in these topics.

Updated on: 21-Aug-2023
