Find the text of the given tag using BeautifulSoup

BeautifulSoup is a Python library that makes it easy to extract information from HTML and XML documents, and it is widely used for web scraping and web data extraction. One of its most useful features is the ability to find specific tags within a document. In this blog, we will explore how to use BeautifulSoup to find the text of a given tag, along with a few examples.

Installation and Syntax

BeautifulSoup must be installed before it can be used. Run the following command in your terminal to install it with pip, the Python package manager.

pip install beautifulsoup4

Once we have installed BeautifulSoup, we can import it in our Python code using the following statement −

from bs4 import BeautifulSoup

The syntax for finding the text of a tag using BeautifulSoup is as follows −

soup.find('tag_name').text
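
One caveat with this syntax: find() returns None when no matching tag exists, so reading .text directly on the result raises an AttributeError. The sketch below shows one possible way to guard against that. The helper name tag_text and its default parameter are hypothetical, introduced only for illustration; they are not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

html = "<html><body><h1>Hello, World!</h1></body></html>"
soup = BeautifulSoup(html, "html.parser")


def tag_text(soup, tag_name, default=""):
    """Return the text of the first matching tag, or `default` if none exists.

    Hypothetical helper for illustration; not part of the BeautifulSoup API.
    """
    tag = soup.find(tag_name)
    # find() returns None when there is no match, so check before reading .text
    return tag.text if tag is not None else default


h1_text = tag_text(soup, "h1")
h2_text = tag_text(soup, "h2", default="(no h2 found)")
print(h1_text)
print(h2_text)
```

Here the h1 lookup succeeds, while the h2 lookup falls back to the default string instead of raising an error.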

Algorithm

Pass the HTML file or string to the BeautifulSoup constructor to create a BeautifulSoup object.
Use the find() method to locate the tag you are looking for.
Read the text property of the returned tag object to get the text inside the tag. This yields a string that contains only the tag's text, with all HTML or XML markup stripped out.
To get the text of every matching tag, call find_all() and loop over the returned list, reading the text property of each tag.
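
The steps above can be sketched as follows. The sample markup is made up for illustration, and the variable names are arbitrary.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration
html = """
<ul>
  <li>Apples</li>
  <li>Bananas</li>
  <li>Cherries</li>
</ul>
"""

# Step 1: build the BeautifulSoup object from the markup
soup = BeautifulSoup(html, "html.parser")

# Steps 2 and 3: locate the first matching tag and read its text
first_item = soup.find("li").text

# Step 4: loop over every matching tag returned by find_all()
all_items = [li.text for li in soup.find_all("li")]

print(first_item)
print(all_items)
```

Note that find() only ever returns the first match, while find_all() returns every match in document order.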

Example 1

from bs4 import BeautifulSoup
html = '<html><body><h1>Hello, World!</h1></body></html>'
soup = BeautifulSoup(html, 'html.parser')
heading = soup.find('h1')
print(heading.text)

Output

Hello, World!

Create an HTML string and pass it to the BeautifulSoup constructor along with the parser to use. Use the find() method to find the h1 tag and store it in the heading variable. Finally, use the text attribute of the heading object to get the text inside the tag.
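
When the tag contains other tags, the text property also includes the text of those nested tags. The short sketch below, which uses a made-up variation of the heading above, illustrates this and contrasts it with the related string property. It is meant as an illustration rather than a complete account of how the two differ.

```python
from bs4 import BeautifulSoup

# Variation of the heading above with a nested <em> tag (illustrative markup)
html = "<h1>Hello, <em>World</em>!</h1>"
soup = BeautifulSoup(html, "html.parser")
heading = soup.find("h1")

# .text joins the text of the tag and all of its descendants
nested_text = heading.text

# get_text() with no arguments gives the same result as .text
same_text = heading.get_text()

# .string is None here because the tag has more than one child
single_string = heading.string

print(nested_text)
print(single_string)
```

For pulling out readable text, the text property or get_text() is usually the safer choice, since string only works when the tag has a single child.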

Example 2

from bs4 import BeautifulSoup
html = '<html><body><p>TutorialsPoint Web Scraping Example Text</p></body></html>'
soup = BeautifulSoup(html, 'html.parser')
paragraph = soup.find('p')
print(paragraph.text)

Output

TutorialsPoint Web Scraping Example Text

Start with a string of HTML that contains a paragraph tag with some text. Use the find() method to locate the paragraph tag and store it in the paragraph variable. Then read the text attribute of the paragraph object to get the text inside the tag.
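
Real pages often contain line breaks and indentation inside a tag, and the text property returns them unchanged. As a minimal sketch, assuming markup similar to the example above but spread across several lines, get_text() with a separator and strip=True can produce a cleaner string.

```python
from bs4 import BeautifulSoup

# Illustrative markup with extra whitespace and a nested <b> tag
html = "<p>\n   TutorialsPoint <b>Web Scraping</b>\n   Example Text\n</p>"
soup = BeautifulSoup(html, "html.parser")
paragraph = soup.find("p")

# .text keeps the original newlines and indentation
raw = paragraph.text

# Strip each piece of text and join the pieces with a single space
clean = paragraph.get_text(" ", strip=True)

print(repr(raw))
print(clean)
```

The first argument to get_text() is the separator placed between the pieces of text, and strip=True removes the leading and trailing whitespace of each piece.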

Let's look at a bigger example to see how we can use BeautifulSoup to find the text of multiple tags −

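import requests
from bs4 import BeautifulSoup

url = 'https://www.pythonforbeginners.com/'
# Fetch the page; the timeout keeps the script from hanging on a slow server
response = requests.get(url, timeout=10)
# Stop early on an HTTP error instead of parsing an error page
response.raise_for_status()
# Parse the raw HTML with Python's built-in parser
soup = BeautifulSoup(response.content, 'html.parser')
# Collect every h3 tag on the page
headings = soup.find_all('h3')
for heading in headings:
    # Print the text of each heading
    print(heading.text)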

Output

Popular Python Tutorials
Categories
Loops
Regular Expressions
Python Games
Basics
Functions
Code Examples
Strings
Dictionaries
Python on the Web
Lists
Modules
Python Comments
Latest Content
Convert INI Files to JSON Format in Python
Convert XML to INI Format in Python
Pandas Insert Row into a DataFrame
Convert INI to XML Format in Python

Here, the requests library sends a GET request to the website given in the url variable, and BeautifulSoup parses the HTML of the response. Then find_all() locates all of the h3 tags on the page and stores them in the headings variable. Finally, a loop goes through each heading and prints its content using the text property. Because the page is live, the output shown above may differ when you run the code.
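
find_all() also accepts a list of tag names, which is handy when the headings of interest sit at more than one level. The sketch below uses a small made-up HTML string instead of a live page so that it runs without a network connection. The markup and variable names are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page
html = """
<h2>Tutorials</h2>
<h3>Loops</h3>
<h3>  Functions  </h3>
<h2>Latest Content</h2>
"""
soup = BeautifulSoup(html, "html.parser")

# Passing a list matches any of the named tags, in document order
headings = [
    (tag.name, tag.get_text(strip=True))
    for tag in soup.find_all(["h2", "h3"])
]

for name, text in headings:
    print(name, text)
```

Keeping the tag name next to its text makes it easy to tell which level each heading came from.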

Applications

Web scraping, data extraction, and data analysis are among BeautifulSoup's application areas. It can be used to scrape news stories, social media data, and various other sources to collect data from websites, and web automation and testing are also typical use cases. It is also a useful tool for developers because it supports a variety of parsers, including Python's built-in html.parser, lxml, and html5lib. Its user-friendly syntax and extensive documentation make it simple for novices to get started with web scraping and data extraction.

Conclusion

BeautifulSoup is a powerful library that makes web scraping and data extraction simple. Thanks to its straightforward syntax, the text of a tag in an HTML or XML document can be accessed quickly. BeautifulSoup is a great tool to have in your toolbox whether you are interested in scraping data from websites or analyzing it. In addition, it makes it simple to move through the HTML tree structure and extract specific data from multiple pages. Its user-friendly interface and extensive documentation make it a valuable tool for any data scientist or web developer.

Atharva Shah

Updated on: 21-Aug-2023
