Find the text of the given tag using BeautifulSoup
BeautifulSoup is a powerful Python library that makes it easy to extract information from HTML and XML documents, and it is widely used for web scraping and web data extraction. One of its most useful features is the ability to find specific tags within a document. In this blog, we will explore how to use BeautifulSoup to find the text of a given tag, along with a few examples.
Installation and Syntax
BeautifulSoup must be installed before it can be used. Install it with pip, the Python package manager, by running the following command in your terminal.
pip install beautifulsoup4
Once we have installed BeautifulSoup, we can import it in our Python code using
from bs4 import BeautifulSoup
The syntax for finding the text of a tag using BeautifulSoup is as follows −
soup.find('tag_name').text
Algorithm
Pass the HTML file or content to the BeautifulSoup constructor, along with the name of a parser, to create a BeautifulSoup object.
Use the find() method to locate the tag you are looking for, then read the text inside it through the text property of the tag object.
This produces a string that contains only the text inside the tag, with any nested HTML or XML markup stripped out.
To get the text of every matching tag, call find_all(), which returns a list of tags, and loop over that list, reading the text property of each tag.
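The steps above can be sketched as follows. This is a minimal illustration rather than a definitive implementation; the HTML string and the variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = "<html><body><h1>Title</h1><p>First</p><p>Second</p></body></html>"

# Step 1: build the BeautifulSoup object from the markup and a parser name.
soup = BeautifulSoup(html, "html.parser")

# Step 2: find() returns the first matching tag; .text gives its text.
first_heading = soup.find("h1").text

# Step 3: find_all() returns every match; loop over it to collect each text.
paragraph_texts = [p.text for p in soup.find_all("p")]

print(first_heading)
print(paragraph_texts)
```

Here find() is used when only the first match matters, and find_all() when every occurrence of the tag is needed.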
Example 1
from bs4 import BeautifulSoup

html = '<html><body><h1>Hello, World!</h1></body></html>'
soup = BeautifulSoup(html, 'html.parser')
heading = soup.find('h1')
print(heading.text)
Output
Hello, World!
Create an HTML string and pass it to the BeautifulSoup constructor along with the parser to use. Use the find() method to find the h1 tag and store it in the heading variable. Finally, use the text attribute of the heading object to get the text inside the tag.
Example 2
from bs4 import BeautifulSoup

html = '<html><body><p>TutorialsPoint Web Scraping Example Text</p></body></html>'
soup = BeautifulSoup(html, 'html.parser')
paragraph = soup.find('p')
print(paragraph.text)
Output
TutorialsPoint Web Scraping Example Text
Start with a string of HTML that contains a paragraph tag with some text. Use the find() method to locate the paragraph tag and store it in the paragraph variable. Then read the text attribute of the paragraph object to get the text inside the tag.
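Both examples assume the tag exists. When find() does not find a match it returns None, so reading .text on the result raises an AttributeError. The sketch below shows one way to guard against that and to tidy whitespace with get_text(). The helper name tag_text and the sample HTML are hypothetical, introduced only for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with nested markup and extra whitespace.
html = "<div><p>  Hello <b>bold</b> world  </p></div>"
soup = BeautifulSoup(html, "html.parser")


def tag_text(soup, tag_name, default=""):
    """Return the cleaned text of the first matching tag, or a default."""
    tag = soup.find(tag_name)
    if tag is None:
        # find() returns None when no tag matches, so avoid calling .text on it.
        return default
    # Strip each text fragment and join the fragments with a single space.
    return tag.get_text(" ", strip=True)


print(repr(soup.find("p").text))  # raw text, surrounding whitespace kept
print(tag_text(soup, "p"))        # cleaned text
print(repr(tag_text(soup, "h2"))) # no <h2> in the markup, so the default
```

Passing a separator to get_text() matters when strip=True is used, because otherwise the stripped fragments from nested tags are joined with nothing between them.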
Let's look at a bigger example to see how we can use BeautifulSoup to find the text of multiple tags −
import requests
from bs4 import BeautifulSoup

url = 'https://www.pythonforbeginners.com/'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
headings = soup.find_all('h3')
for heading in headings:
    print(heading.text)
Output
Popular Python Tutorials Categories Loops Regular Expressions Python Games Basics Functions Code Examples Strings Dictionaries Python on the Web Lists Modules Python Comments Latest Content Convert INI Files to JSON Format in Python Convert XML to INI Format in Python Pandas Insert Row into a DataFrame Convert INI to XML Format in Python
In this example, the requests library (installed separately with pip install requests) sends a GET request to the website given in the url variable, and BeautifulSoup parses the HTML of the response with html.parser. Then find_all() locates all of the h3 tags on the page and stores them in the headings variable. Finally, a loop goes through each heading and prints its content using the text property. Because the page is live, the exact headings printed depend on the site's content at the time the script runs.
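A slightly more defensive variant of this example is sketched below. It separates parsing from fetching so the parsing step can be tried on a local string without network access. The function names heading_texts and fetch_heading_texts, the 10-second timeout, and the sample markup are assumptions made for illustration, not part of the original example.

```python
from bs4 import BeautifulSoup


def heading_texts(html, tag_name="h3"):
    """Return the stripped text of every tag_name tag in the markup."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.find_all(tag_name)]


def fetch_heading_texts(url, tag_name="h3"):
    """Download a page and return the text of its tag_name tags."""
    # Imported here so heading_texts() works even without requests installed.
    import requests

    # The timeout value is an arbitrary choice for this sketch.
    response = requests.get(url, timeout=10)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return heading_texts(response.content, tag_name)


# Hypothetical local markup, so this part runs without network access.
sample = "<h3>Loops</h3><p>skip</p><h3> Strings </h3>"
print(heading_texts(sample))
```

To run it against the site from the example above, call fetch_heading_texts('https://www.pythonforbeginners.com/'); the result will vary with the page's current content.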
Applications
Web scraping, data extraction, and data analysis are among BeautifulSoup's application areas. It can be used to scrape news stories, social media data, and many other sources to get data from websites, and it is also commonly used in web automation and testing. It is a useful tool for developers because it works with several parsers, including Python's built-in html.parser as well as lxml and html5lib, and can handle both HTML and XML. Its user-friendly syntax and extensive documentation make it simple for novices to get started with web scraping and data extraction.
Conclusion
BeautifulSoup is a powerful library that makes web scraping and data extraction simple. Its straightforward syntax lets you quickly get the text of a tag in an HTML or XML document. It is a valuable tool to have in your toolbox whether you are interested in scraping data from websites or analyzing it. It also makes it simple to move through the HTML tree structure and extract specific data, and the same code can be reused across multiple pages. With its user-friendly interface and extensive documentation, BeautifulSoup is a useful tool for data scientists and web developers.