How to remove empty tags using BeautifulSoup in Python?

BeautifulSoup is a Python library for parsing HTML and XML documents and extracting data from them. With it, we can find and remove empty tags, which leaves cleaner, more readable markup.

First, install the BeautifulSoup library using: pip install beautifulsoup4

Basic Example − Removing Empty Tags

Here's how to identify and remove empty tags from an HTML document:

from bs4 import BeautifulSoup

# HTML document with empty tags
html_document = """
<html>
<body>
    <p>Python is an interpreted, high-level programming language.</p>
    <div></div>
    <span>   </span>
    <p>Python emphasizes code readability.</p>
    <strong></strong>
</body>
</html>
"""

# Create BeautifulSoup object
soup = BeautifulSoup(html_document, "html.parser")

# Remove empty tags
for tag in soup.find_all():
    if len(tag.get_text(strip=True)) == 0:
        tag.extract()

print(soup.prettify())

Output:

<html>
 <body>
  <p>
   Python is an interpreted, high-level programming language.
  </p>
  <p>
   Python emphasizes code readability.
  </p>
 </body>
</html>
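
One caveat with the text-only check above: void elements such as <br> and <img> never contain text, so they (and any tag that merely wraps them) would be removed as well. The following is a minimal sketch of one way to keep them. The VOID_TAGS list is an illustrative assumption, not a complete list of HTML void elements, and the sample markup and file names are made up for the example.

```python
from bs4 import BeautifulSoup

# Assumed, partial list of tags that are meaningful without any text
VOID_TAGS = ["br", "hr", "img", "input", "meta", "link"]

html_document = """
<div>
    <p>Line one<br>Line two</p>
    <img src="logo.png">
    <p><img src="chart.png"></p>
    <span>   </span>
</div>
"""

soup = BeautifulSoup(html_document, "html.parser")

for tag in soup.find_all():
    # Keep void elements themselves
    if tag.name in VOID_TAGS:
        continue
    # Keep tags that wrap a void element, e.g. <p><img></p>
    if tag.find(VOID_TAGS) is not None:
        continue
    # Remove everything else that has no visible text
    if not tag.get_text(strip=True):
        tag.extract()

print(soup.prettify())
```

Here only the whitespace-only <span> is removed; the <br>, both <img> tags, and the paragraph that wraps an image are kept.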

Handling Tags with Only Whitespace

Passing strip=True to get_text() strips whitespace from the text before the emptiness check, so tags that contain only whitespace are removed as well:

from bs4 import BeautifulSoup

html_content = """
<div>
    <p>Valid content here</p>
    <span>     </span>
    <em></em>
    <strong>Bold text</strong>
</div>
"""

soup = BeautifulSoup(html_content, "html.parser")

print("Before removing empty tags:")
print(soup.prettify())

# Remove empty tags including whitespace-only tags
for tag in soup.find_all():
    if not tag.get_text(strip=True):
        tag.extract()

print("\nAfter removing empty tags:")
print(soup.prettify())

Output:

Before removing empty tags:
<div>
 <p>
  Valid content here
 </p>
 <span>
 </span>
 <em>
 </em>
 <strong>
  Bold text
 </strong>
</div>

After removing empty tags:
<div>
 <p>
  Valid content here
 </p>
 <strong>
  Bold text
 </strong>
</div>

Removing Specific Empty Tags

You can target specific tag types instead of all tags:

from bs4 import BeautifulSoup

html_data = """
<html>
<body>
    <p>Content paragraph</p>
    <p></p>
    <div>Valid div</div>
    <div></div>
    <span>Text content</span>
    <span></span>
</body>
</html>
"""

soup = BeautifulSoup(html_data, "html.parser")

# Remove only empty div and p tags
for tag in soup.find_all(['div', 'p']):
    if not tag.get_text(strip=True):
        tag.extract()

print(soup.prettify())

Output:

<html>
 <body>
  <p>
   Content paragraph
  </p>
  <div>
   Valid div
  </div>
  <span>
   Text content
  </span>
  <span>
  </span>
 </body>
</html>
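
If the same cleanup is needed in several places, the loop can be wrapped in a small helper. The function below is a hypothetical convenience wrapper (remove_empty_tags is not part of BeautifulSoup), sketched under the assumption that "empty" means "no visible text".

```python
from bs4 import BeautifulSoup


def remove_empty_tags(soup, names=None):
    """Remove tags that have no visible text.

    names: optional list of tag names to check; None checks every tag.
    Returns how many empty tags were found (nested empty tags are
    counted individually, even when their parent was already removed).
    """
    targets = soup.find_all(names) if names else soup.find_all()
    removed = 0
    for tag in targets:
        if not tag.get_text(strip=True):
            tag.extract()
            removed += 1
    return removed


html_data = """
<body>
    <p>Content paragraph</p>
    <p></p>
    <div>Valid div</div>
    <div></div>
    <span>Text content</span>
    <span></span>
</body>
"""

soup = BeautifulSoup(html_data, "html.parser")
count = remove_empty_tags(soup, ["div", "p"])
print(count)  # 2
```

Calling remove_empty_tags(soup) with no names would also remove the empty <span>.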

Key Points

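  • get_text(strip=True) strips leading and trailing whitespace from each piece of text, so a tag containing only whitespace is treated as empty
  • extract() removes the tag and its contents from the tree and returns the removed tag; decompose() removes the tag and destroys it instead
  • Call find_all() with no arguments to iterate over every tag, or pass a tag name or a list of names to limit the search
  • The parser ("html.parser", "lxml") determines how the markup is turned into a tree; different parsers can build different trees from invalid HTML, and lxml must be installed separately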
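
The difference between removing a tag with extract() and with decompose() can be seen in a short sketch. The markup and variable names here are made up for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<div><em></em><strong></strong><p>Text</p></div>", "html.parser"
)

# extract() detaches the tag from the tree and hands it back
removed_em = soup.find("em").extract()

# decompose() detaches the tag and destroys it; it returns None
result = soup.find("strong").decompose()

print(removed_em)  # <em></em>
print(result)      # None
print(soup)        # <div><p>Text</p></div>
```

extract() is the better fit when the removed tag will be inspected or reinserted elsewhere; decompose() is enough when the tag is simply being discarded.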

Conclusion

BeautifulSoup makes it easy to remove empty tags using get_text(strip=True) to identify empty content and extract() to remove unwanted tags. This helps clean up HTML documents by removing unnecessary markup.

Updated on: 2026-03-25T16:55:35+05:30
