Find the tag with a given attribute value in an HTML document using BeautifulSoup

Extracting data from HTML pages is a common task in web scraping. The tags and attributes in an HTML page provide the hooks for locating and extracting the relevant data. BeautifulSoup is a widely used Python library for parsing HTML documents and pulling information out of them. In this tutorial, we'll focus on using BeautifulSoup to locate a tag that has a specific attribute value.

Installation and Setup

To get started, install BeautifulSoup with pip, Python's package installer. Enter the following command in a command prompt or terminal

pip install beautifulsoup4

After installation, we can import BeautifulSoup in our Python code using the following statement

from bs4 import BeautifulSoup

Syntax

The syntax to find a tag with a given attribute value using BeautifulSoup is as follows

soup.find(tag_name, attrs={attribute_name: attribute_value})

Here, soup refers to the BeautifulSoup object that contains the parsed HTML content, tag_name is the HTML tag we're looking for, attribute_name is the attribute we want to match, and attribute_value is the specific value we're searching for.
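As a minimal sketch of this syntax with concrete values, the snippet below looks up a link by its title attribute. The HTML string and attribute values are made up for illustration; any attribute name can be used as the dictionary key.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this illustration
html = '<a href="/home" title="Home">Home</a><a href="/about" title="About">About</a>'
soup = BeautifulSoup(html, "html.parser")

# tag_name = "a", attribute_name = "title", attribute_value = "About"
link = soup.find("a", attrs={"title": "About"})

# A found tag supports dictionary-style access to its attributes
href = link["href"]
print(href)
```

The returned object is a Tag, so its attributes and text can be read directly from it.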

Alternative syntax options include

# Direct attribute syntax
soup.find(tag_name, class_='value')  # Note the underscore in class_
soup.find(tag_name, id='value')

# Multiple attributes
soup.find(tag_name, attrs={'class': 'value', 'id': 'another_value'})

# Find all matching tags
soup.find_all(tag_name, attrs={attribute_name: attribute_value})
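The keyword form and the attrs form are interchangeable for ordinary attributes. The following sketch, using made-up markup, suggests that both return the same tag, and that a class filter matches any single class of a multi-class element. The underscore in class_ is needed because class is a reserved word in Python.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second paragraph carries two classes
html = '<p id="intro" class="lead">Hello</p><p class="lead note">World</p>'
soup = BeautifulSoup(html, "html.parser")

by_attrs = soup.find("p", attrs={"class": "lead"})
by_keyword = soup.find("p", class_="lead")

# Both calls return the same Tag object from the parsed tree
same = by_attrs is by_keyword

# "lead" matches both paragraphs, including the one with class="lead note"
lead_count = len(soup.find_all("p", class_="lead"))
print(same, lead_count)
```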

Algorithm

  • Parse the HTML document using BeautifulSoup

  • Use the find() method to locate the first tag with the given attribute value

  • Extract the required data from the found tag

  • Use find_all() if multiple matching tags are needed
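The steps above can be sketched as a small helper. The function name extract_text_by_attribute and the sample markup are assumptions made for this illustration, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

def extract_text_by_attribute(html, tag_name, attr_name, attr_value):
    """Hypothetical helper: return (first match text, list of all match texts)."""
    # Step 1: parse the HTML document
    soup = BeautifulSoup(html, "html.parser")
    criteria = {attr_name: attr_value}

    # Step 2: locate the first tag with the given attribute value
    first = soup.find(tag_name, attrs=criteria)

    # Step 3: extract the data, guarding against "no match" (find() returns None)
    first_text = first.get_text(strip=True) if first is not None else None

    # Step 4: collect every matching tag when more than one is needed
    all_texts = [t.get_text(strip=True) for t in soup.find_all(tag_name, attrs=criteria)]
    return first_text, all_texts

sample = (
    '<ul><li data-kind="fruit">Apple</li>'
    '<li data-kind="veg">Kale</li>'
    '<li data-kind="fruit">Pear</li></ul>'
)
first_text, all_texts = extract_text_by_attribute(sample, "li", "data-kind", "fruit")
print(first_text, all_texts)
```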

Finding Tag by Class Attribute

Example

To find a paragraph tag with class "important", we can use the following code

from bs4 import BeautifulSoup

html_doc = """<html>
   <body>
      <p class="important">Fancy content here, just a test</p>
      <p>This is a normal paragraph</p>
      <p class="important">Another important paragraph</p>
   </body>
</html>"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Find first paragraph with class 'important'
tag = soup.find('p', attrs={'class': 'important'})
print("First match:", tag)

# Find all paragraphs with class 'important'
all_tags = soup.find_all('p', class_='important')
print("All matches:", len(all_tags))
for i, match in enumerate(all_tags, start=1):
    print(f"Match {i}: {match.text}")

The output of the above code is

First match: <p class="important">Fancy content here, just a test</p>
All matches: 2
Match 1: Fancy content here, just a test
Match 2: Another important paragraph

Here, soup is the BeautifulSoup object containing the parsed HTML document. The find() method returns the first tag that matches the given criteria, while find_all() returns a list of all matching tags.
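One practical difference between the two methods is what they return when nothing matches: find() returns None, while find_all() returns an empty result. A short sketch with made-up markup shows one way to guard against this before reading from the result.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with no "important" paragraph
soup = BeautifulSoup('<p class="note">Only a note</p>', "html.parser")

# No match: find() returns None, so check before touching .text
tag = soup.find("p", class_="important")
text = tag.text if tag is not None else ""

# No match: find_all() returns an empty, list-like result
all_tags = soup.find_all("p", class_="important")

print(repr(text), len(all_tags))
```

Skipping the check and calling tag.text on None raises an AttributeError, which is a common source of errors in scraping scripts.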

Finding Tag by ID Attribute

Example

To find a div tag with a specific ID and then locate a paragraph inside it, we can use the following code

from bs4 import BeautifulSoup

html_doc = """<html>
<body>
   <div id="header">
      <h1>Welcome to my website</h1>
      <p>All the help text needed will be in this paragraph</p>
   </div>
   <div id="content">
      <h2>Section 1</h2>
      <p>Content of section 1 goes here</p>
      <h2>Section 2</h2>
      <p>Content of section 2 goes here</p>
   </div>
</body>
</html>"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Find div with id 'content'
div_tag = soup.find('div', attrs={'id': 'content'})
print("Found div:", div_tag.get('id'))

# Find first paragraph inside this div
tag = div_tag.find('p')
print("First paragraph in content div:", tag.text)

# Find all paragraphs in this div
all_paragraphs = div_tag.find_all('p')
print("Total paragraphs in content div:", len(all_paragraphs))

The output of the above code is

Found div: content
First paragraph in content div: Content of section 1 goes here
Total paragraphs in content div: 2

This example demonstrates how to first find a container element by its ID attribute, then search within that specific container for nested elements.
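To illustrate why scoping matters, the sketch below uses a cut-down, made-up version of such a page and compares a search over the whole document with a search inside one container.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a paragraph in each container
html = """
<div id="header"><p>Help text</p></div>
<div id="content"><p>Section text</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Searching the whole document returns the first <p> anywhere
global_first = soup.find("p").text

# Searching from the container returns only paragraphs nested inside it
content = soup.find("div", id="content")
scoped_first = content.find("p").text

print(global_first)
print(scoped_first)
```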

Finding Tags by Text Content

Example

Sometimes we need to find tags based on their text content rather than attributes

from bs4 import BeautifulSoup

html_doc = """<html>
<body>
   <h1>List of Books</h1>
   <table>
      <tr>
         <th>Title</th>
         <th>Author</th>
         <th>Price</th>
      </tr>
      <tr>
         <td><a href="book1.html">Book 1</a></td>
         <td>Author 1</td>
         <td>$10</td>
      </tr>
      <tr>
         <td><a href="book2.html">Book 2</a></td>
         <td>Author 2</td>
         <td>$15</td>
      </tr>
      <tr>
         <td><a href="book3.html">Book 3</a></td>
         <td>Author 3</td>
         <td>$20</td>
      </tr>
   </table>
</body>
</html>"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Find the td tag whose text is exactly "$15"
# (string= replaces the older, deprecated text= argument)
price_tag = soup.find('td', string='$15')
print("Found price tag:", price_tag.text)

# Navigate to the row containing this price
row = price_tag.find_parent('tr')
cells = row.find_all('td')

title = cells[0].find('a').text
author = cells[1].text
price = cells[2].text

print(f"Book: {title}")
print(f"Author: {author}")
print(f"Price: {price}")

The output of the above code is

Found price tag: $15
Book: Book 2
Author: Author 2
Price: $15

This example shows how to find a tag by its text content, move up to the enclosing row with find_parent(), and then read the other cells of that row with find_all().
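Two details of text-based lookups are easy to miss: a plain string filter must equal the tag's text exactly, and neighbouring cells can also be reached through sibling methods. The following is a sketch under the assumption of a made-up table whose price cells are padded with spaces.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical table: note the spaces around the prices
html = """
<table>
  <tr><td>Book 1</td><td>Author 1</td><td> $10 </td></tr>
  <tr><td>Book 2</td><td>Author 2</td><td> $15 </td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# A plain string must match exactly, so the padded cell is not found
exact = soup.find("td", string="$15")

# A compiled regular expression can tolerate the surrounding whitespace
price_cell = soup.find("td", string=re.compile(r"^\s*\$15\s*$"))

# Walk backwards through the sibling cells of the same row
author_cell = price_cell.find_previous_sibling("td")
title_cell = author_cell.find_previous_sibling("td")

print(exact, title_cell.text, author_cell.text, price_cell.get_text(strip=True))
```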

Finding Tags with Multiple Attributes

Example

We can search for tags that match multiple attribute conditions simultaneously

from bs4 import BeautifulSoup

html_doc = """<html>
<body>
   <div class="container" id="main">Main content</div>
   <div class="container" id="sidebar">Sidebar content</div>
   <div class="footer" id="main">Footer content</div>
   <span class="container" id="main">Span content</span>
</body>
</html>"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Find div with both class='container' and id='main'
tag = soup.find('div', attrs={'class': 'container', 'id': 'main'})
print("Found tag:", tag)
print("Tag name:", tag.name)
print("Text content:", tag.text)

# Find all tags with class='container', regardless of tag name or other attributes
all_containers = soup.find_all(attrs={'class': 'container'})
print(f"Total containers found: {len(all_containers)}")

The output of the above code is

Found tag: <div class="container" id="main">Main content</div>
Tag name: div
Text content: Main content
Total containers found: 3
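The attrs dictionary is also the way to match attribute names that are not valid Python keyword arguments, such as data-* attributes. A minimal sketch with made-up markup follows; passing True as the value matches any tag that has the attribute at all.

```python
from bs4 import BeautifulSoup

# Hypothetical markup using a data-* attribute
html = (
    '<div data-role="card" class="box">A</div>'
    '<div data-role="banner" class="box">B</div>'
    '<div class="box">C</div>'
)
soup = BeautifulSoup(html, "html.parser")

# "data-role" contains a hyphen, so it cannot be written as a keyword argument
card = soup.find("div", attrs={"data-role": "card", "class": "box"})

# True matches every tag that defines the attribute, whatever its value
with_role = soup.find_all("div", attrs={"data-role": True})

print(card.text, len(with_role))
```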

Common Methods and Properties

When working with found tags, these methods and properties are frequently used

Method/Property             | Description                           | Example Usage
----------------------------|---------------------------------------|-----------------------------------
tag.text                    | Extract text content from the tag     | tag.text
tag.get('attr')             | Get the value of a specific attribute | tag.get('href')
tag.name                    | Get the tag name                      | tag.name returns 'div', 'p', etc.
tag.find_parent()           | Find the parent element               | tag.find_parent('div')
tag.find_next_sibling()     | Find the next sibling element         | tag.find_next_sibling('p')
tag.find_previous_sibling() | Find the previous sibling element     | tag.find_previous_sibling()
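The snippet below exercises each of these methods and properties on one small, made-up fragment. It is only a sketch; the markup is written without whitespace between tags so that sibling lookups land on elements rather than on text nodes.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with no whitespace between sibling tags
html = '<div id="box"><h2>Title</h2><p>First</p><a href="/more">More</a></div>'
soup = BeautifulSoup(html, "html.parser")

p = soup.find("p")
link = soup.find("a")

text = p.text                                   # text content of the tag
name = p.name                                   # tag name as a string
href = link.get("href")                         # value of an existing attribute
target = link.get("target")                     # missing attribute -> None
parent_id = p.find_parent("div")["id"]          # nearest enclosing <div>
next_text = p.find_next_sibling("a").text       # next sibling that is an <a>
prev_name = p.find_previous_sibling().name      # closest previous sibling

print(text, name, href, target, parent_id, next_text, prev_name)
```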

Applications

Finding tags with specific attribute values is a common web scraping task that can be used in various applications

  • Data Analysis: Extracting structured data from websites for machine learning models or statistical analysis

  • E-commerce Price Monitoring: Scraping product information and prices for comparison shopping applications

  • Job Market Analysis: Collecting job postings from career websites to analyze market trends and salary data

  • News Aggregation: Gathering news articles from multiple sources based on specific categories or topics

  • Social Media Monitoring: Tracking mentions, hashtags, or specific content across social platforms

Before engaging in web scraping, always read the website's terms of service and robots.txt file, as some sites have restrictions or rate limits to prevent automated access.
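The robots.txt check can be done with Python's standard urllib.robotparser module. The sketch below parses an inline, made-up robots.txt so that it runs without network access. The user agent name my-scraper and the example.com URLs are placeholders, and a real script would load the site's actual file instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real site publishes its own rules
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Ask whether a given (placeholder) user agent may fetch each URL
blocked = parser.can_fetch("my-scraper", "https://example.com/private/data.html")
allowed = parser.can_fetch("my-scraper", "https://example.com/books.html")

# Crawl-delay, if declared, suggests how many seconds to wait between requests
delay = parser.crawl_delay("my-scraper")

print(blocked, allowed, delay)
```

Passing robots.txt still does not override a site's terms of service, so both should be checked.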

Conclusion

BeautifulSoup provides powerful methods like find() and find_all() to locate HTML tags based on attributes, text content, or combinations of criteria. The find() method returns the first match, while find_all() returns all matching elements. These tools make it easy to extract specific data from HTML documents for web scraping and data analysis projects.

Updated on: 2026-03-16T21:38:54+05:30
