Extracting an Attribute Value with Beautiful Soup in Python

To extract an attribute value with Beautiful Soup, you parse the HTML document, locate the element, and read the attribute from it. Beautiful Soup is a Python library for parsing HTML and XML documents. It provides several methods for searching and navigating the parse tree, which makes it easy to pull data out of a document. In this article, we extract attribute values with Beautiful Soup in Python.

Algorithm

Follow these steps to extract an attribute value with Beautiful Soup in Python:

Parse the HTML document using the BeautifulSoup class from the bs4 library.
Find the HTML element that contains the attribute you want to extract using the appropriate BeautifulSoup method, such as find() or find_all().
Check if the attribute exists on the element using a conditional statement or the has_attr() method.
If the attribute exists, extract its value using square brackets ([]) and the attribute name as the key.
If the attribute does not exist, handle the missing value explicitly, for example by falling back to a default or skipping the element.
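
The steps above can be sketched as a single function. This is a minimal illustration, not part of Beautiful Soup itself: the helper name extract_attribute, its default parameter, and the sample markup are assumptions made for the example.

```python
from bs4 import BeautifulSoup


def extract_attribute(html, tag_name, attr_name, default=None):
    """Return attr_name from the first tag_name element, or default.

    Hypothetical helper written for this article; not a bs4 API.
    """
    # Step 1: parse the document.
    soup = BeautifulSoup(html, "html.parser")

    # Step 2: locate the element. find() returns None when nothing matches.
    tag = soup.find(tag_name)
    if tag is None:
        return default

    # Steps 3-5: check for the attribute, then read it or fall back.
    if tag.has_attr(attr_name):
        return tag[attr_name]
    return default


# Sample markup invented for the demonstration.
sample = '<p><img src="logo.png" alt="Logo"></p>'

print(extract_attribute(sample, "img", "src"))
print(extract_attribute(sample, "img", "width", "unknown"))
print(extract_attribute(sample, "video", "src", "no such tag"))
```

The three calls cover the three outcomes of the steps: the attribute is present, the attribute is missing, and the element itself is missing.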

Installing Beautiful Soup

Before using Beautiful Soup, install it with pip, Python's package manager. Run the following command in your terminal or command prompt:

pip install beautifulsoup4

Using find() Method with Square Brackets

To extract an attribute value from an HTML tag, we first parse the HTML document with BeautifulSoup. In the example below, we define an HTML document as a string and pass it to the BeautifulSoup constructor along with the parser name html.parser. Next, we call the find() method of the soup object, which returns the first 'a' tag in the document. Finally, we read the value of the href attribute from the tag using square bracket notation:

from bs4 import BeautifulSoup

# Parse the HTML document
html_doc = """
<html>
<body>
   <a href="https://www.google.com">Google</a>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Find the 'a' tag
a_tag = soup.find('a')

# Extract the value of the 'href' attribute
href_value = a_tag['href']

print(href_value)

https://www.google.com
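
Square brackets assume that both the tag and the attribute exist. The short sketch below, using markup invented for the illustration, shows the two ways this can fail: indexing a missing attribute raises KeyError, and find() returns None when no tag matches.

```python
from bs4 import BeautifulSoup

# Sample markup for the demonstration: an 'a' tag without an href.
soup = BeautifulSoup('<a id="home">Home</a>', "html.parser")

a_tag = soup.find("a")
try:
    a_tag["href"]
except KeyError:
    # Square brackets behave like a dictionary lookup on the tag's attributes.
    missing_attr_message = "the <a> tag has no href attribute"
    print(missing_attr_message)

# No 'img' tag exists, so find() returns None rather than a tag.
img_tag = soup.find("img")
print(img_tag)
```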

Using attrs to Find Elements with Specific Attributes

In the example below, we use the find_all() method to find all 'a' tags that have an href attribute. The attrs parameter takes a dictionary of attribute filters. Here, {'href': True} matches elements that have an href attribute with any value:

from bs4 import BeautifulSoup

# Parse the HTML document
html_doc = """
<html>
<body>
   <a href="https://www.google.com">Google</a>
   <a href="https://www.python.org">Python</a>
   <a>No Href</a>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Find all 'a' tags with an 'href' attribute
a_tags_with_href = soup.find_all('a', attrs={'href': True})
for tag in a_tags_with_href:
    print(tag['href'])

https://www.google.com
https://www.python.org
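
The attrs dictionary can also filter on the attribute's value instead of just its presence. The following sketch, with sample links made up for the illustration, matches one href exactly and then matches several with a compiled regular expression.

```python
import re

from bs4 import BeautifulSoup

# Sample markup for the demonstration.
html_doc = """
<a href="https://www.google.com">Google</a>
<a href="https://www.python.org">Python</a>
<a href="/about">About</a>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Exact match: only tags whose href equals "/about".
exact = soup.find_all("a", attrs={"href": "/about"})
print([tag.text for tag in exact])

# Pattern match: tags whose href starts with "https://".
absolute = soup.find_all("a", attrs={"href": re.compile(r"^https://")})
absolute_hrefs = [tag["href"] for tag in absolute]
print(absolute_hrefs)

# Every tag also exposes all of its attributes as a dictionary.
print(exact[0].attrs)
```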

Using find_all() to Find All Occurrences

Sometimes you want every occurrence of an HTML element on a web page. The find_all() method returns all matches. In the example below, we use find_all() to find all div tags with the class container. We then loop through each div tag and find the h1 and p tags inside it:

from bs4 import BeautifulSoup

# Parse the HTML document
html_doc = """
<html>
<body>
   <div class="container">
      <h1>Heading 1</h1>
      <p>Paragraph 1</p>
   </div>
   <div class="container">
      <h1>Heading 2</h1>
      <p>Paragraph 2</p>
   </div>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Find all 'div' tags with class='container'
div_tags = soup.find_all('div', class_='container')
for div in div_tags:
    h1 = div.find('h1')
    p = div.find('p')
    print(h1.text, p.text)

Heading 1 Paragraph 1
Heading 2 Paragraph 2
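
The same loop can read attributes from each matched element. The sketch below assumes hypothetical id and data-order attributes on the div tags, added only for this illustration. Note that class is a multi-valued attribute, so Beautiful Soup returns it as a list.

```python
from bs4 import BeautifulSoup

# Sample markup for the demonstration; the id and data-order
# attributes are invented for this example.
html_doc = """
<div class="container" id="intro" data-order="1"><h1>Heading 1</h1></div>
<div class="container" id="details"><h1>Heading 2</h1></div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

rows = [
    # id is present on both divs; data-order is missing on the second,
    # so get() supplies a fallback value.
    (div["id"], div.get("data-order", "n/a"), div["class"])
    for div in soup.find_all("div", class_="container")
]

for row in rows:
    print(row)
```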

Using select() with CSS Selectors

In the example below, we use the select() method to find all h1 tags inside a div tag with the class container, using the CSS selector 'div.container h1'. The . denotes a class name, and the space is the descendant combinator:

from bs4 import BeautifulSoup

# Parse the HTML document
html_doc = """
<html>
<body>
   <div class="container">
      <h1>Heading 1</h1>
      <p>Paragraph 1</p>
   </div>
   <div class="container">
      <h1>Heading 2</h1>
      <p>Paragraph 2</p>
   </div>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Find all 'h1' tags inside a 'div' tag with class='container'
h1_tags = soup.select('div.container h1')
for h1 in h1_tags:
    print(h1.text)

Heading 1
Heading 2
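
CSS selectors can also test attributes directly, which ties select() back to attribute extraction. A brief sketch, using sample links invented for the illustration: [href] matches tags that have the attribute, and [href^="..."] matches values that start with a given prefix.

```python
from bs4 import BeautifulSoup

# Sample markup for the demonstration.
html_doc = """
<div class="container">
   <a href="https://www.python.org">Python</a>
   <a href="/docs">Docs</a>
   <a>No Href</a>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# 'a' tags inside div.container that have an href attribute.
all_hrefs = [a["href"] for a in soup.select("div.container a[href]")]
print(all_hrefs)

# 'a' tags whose href starts with "https://".
https_hrefs = [a["href"] for a in soup.select('a[href^="https://"]')]
print(https_hrefs)

# select_one() returns only the first match (or None if nothing matches).
first_link = soup.select_one("a[href]")
print(first_link["href"])
```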

Handling Missing Attributes Safely

When extracting attributes, handle the case where the attribute does not exist. Square brackets raise a KeyError for a missing attribute, so use the get() method or has_attr() when the attribute is optional:

from bs4 import BeautifulSoup

html_doc = """
<html>
<body>
   <a href="https://www.google.com">Google</a>
   <a>No Href</a>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
a_tags = soup.find_all('a')

for tag in a_tags:
    # Safe method using get()
    href = tag.get('href', 'No href attribute')
    print(f"Link: {href}")
    
    # Alternative using has_attr()
    if tag.has_attr('href'):
        print(f"Has href: {tag['href']}")
    else:
        print("No href attribute found")

Link: https://www.google.com
Has href: https://www.google.com
Link: No href attribute
No href attribute found

Conclusion

Beautiful Soup provides multiple methods to extract attribute values from HTML documents. Use find() for single elements, find_all() for multiple elements, and select() for CSS-based selection. Always handle missing attributes safely using get() or has_attr() methods to avoid errors.

Rohan Singh

Updated on: 2026-03-27T07:16:20+05:30
