How to remove empty tags using BeautifulSoup in Python?

BeautifulSoup is a Python library for parsing HTML and XML documents and extracting data from them. It can also be used to remove empty tags, meaning tags that contain no text, which leaves cleaner and more readable markup.

First, install the BeautifulSoup library using: pip install beautifulsoup4

Basic Example − Removing Empty Tags

Here's how to identify and remove empty tags from an HTML document:

from bs4 import BeautifulSoup

# HTML document with empty tags
html_document = """
<html>
<body>
    <p>Python is an interpreted, high-level programming language.</p>
    <div></div>
    <span>   </span>
    <p>Python emphasizes code readability.</p>
    <strong></strong>
</body>
</html>
"""

# Create BeautifulSoup object
soup = BeautifulSoup(html_document, "html.parser")

# Remove empty tags
for tag in soup.find_all():
    if len(tag.get_text(strip=True)) == 0:
        tag.extract()

print(soup.prettify())

The above code produces the following output:

<html>
 <body>
  <p>
   Python is an interpreted, high-level programming language.
  </p>
  <p>
   Python emphasizes code readability.
  </p>
 </body>
</html>
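
One caveat with the check above: it looks only at text, so elements such as <img> or <br>, which never contain text, are removed too, along with any parent that holds nothing but such elements. The following is a minimal sketch of one way to guard against that. The function name remove_empty_tags and the VOID_TAGS set are hypothetical names chosen for this illustration, not part of BeautifulSoup, and the set is not a complete list of void elements.

```python
from bs4 import BeautifulSoup

# Illustrative, incomplete set of elements that are empty by design (assumption).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


def remove_empty_tags(html, keep=VOID_TAGS):
    """Remove tags without text, unless they are or contain a tag named in keep."""
    soup = BeautifulSoup(html, "html.parser")
    keep_names = sorted(keep)  # find() accepts a list of tag names
    for tag in soup.find_all():
        if tag.name in keep:
            continue  # e.g. <img> has no text but is still meaningful
        if tag.find(keep_names) is not None:
            continue  # a wrapper around a kept element is not really empty
        if not tag.get_text(strip=True):
            tag.extract()
    return str(soup)


sample = '<div><p>Text</p><p></p><p><img src="logo.png"></p><br></div>'
result = remove_empty_tags(sample)
print(result)
```

Here the empty <p></p> is dropped, while the paragraph wrapping the image and the <br> element survive. Whether such elements should be kept depends on the document being cleaned, so treat the set as a starting point.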

Handling Tags with Only Whitespace

Passing strip=True to get_text() strips leading and trailing whitespace from the extracted text, so tags that contain only whitespace are treated as empty and removed:

from bs4 import BeautifulSoup

html_content = """
<div>
    <p>Valid content here</p>
    <span>     </span>
    <em></em>
    <strong>Bold text</strong>
</div>
"""

soup = BeautifulSoup(html_content, "html.parser")

print("Before removing empty tags:")
print(soup.prettify())

# Remove empty tags including whitespace-only tags
for tag in soup.find_all():
    if not tag.get_text(strip=True):
        tag.extract()

print("\nAfter removing empty tags:")
print(soup.prettify())

The above code produces the following output:

Before removing empty tags:
<div>
 <p>
  Valid content here
 </p>
 <span>
 </span>
 <em>
 </em>
 <strong>
  Bold text
 </strong>
</div>

After removing empty tags:
<div>
 <p>
  Valid content here
 </p>
 <strong>
  Bold text
 </strong>
</div>

Removing Specific Empty Tags

You can target specific tag types instead of all tags:

from bs4 import BeautifulSoup

html_data = """
<html>
<body>
    <p>Content paragraph</p>
    <p></p>
    <div>Valid div</div>
    <div></div>
    <span>Text content</span>
    <span></span>
</body>
</html>
"""

soup = BeautifulSoup(html_data, "html.parser")

# Remove only empty div and p tags
for tag in soup.find_all(['div', 'p']):
    if not tag.get_text(strip=True):
        tag.extract()

print(soup.prettify())

The above code produces the following output:

<html>
 <body>
  <p>
   Content paragraph
  </p>
  <div>
   Valid div
  </div>
  <span>
   Text content
  </span>
  <span>
  </span>
 </body>
</html>
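
As an alternative sketch, find_all() also accepts a function that receives each tag and returns True for a match, so the name test and the emptiness test can be combined into a single filter. The helper name is_empty_div_or_p and the sample markup are hypothetical, chosen only for this example.

```python
from bs4 import BeautifulSoup


def is_empty_div_or_p(tag):
    """Match <div> and <p> tags that contain no text after stripping."""
    return tag.name in ("div", "p") and not tag.get_text(strip=True)


html_data = "<body><p>Content</p><p></p><div></div><span></span></body>"
soup = BeautifulSoup(html_data, "html.parser")

# find_all() returns a list, so removing tags while looping over it is safe
for tag in soup.find_all(is_empty_div_or_p):
    tag.extract()

print(soup)
```

The empty <span> is left in place because the filter only matches div and p tags.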

Key Points

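get_text(strip=True) strips leading and trailing whitespace from the text, so a whitespace-only tag yields an empty string; note that a tag holding only non-text children such as <img> also counts as empty under this check
extract() removes the tag and its contents from the tree and returns the removed element
Call find_all() with no arguments to iterate over every tag, or pass a list of tag names to limit the search
The parser ("html.parser", "lxml") affects how BeautifulSoup builds the tree, especially for malformed markup; lxml must be installed separately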
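
The behavior of extract() noted above can be checked with a small sketch. It also shows decompose(), another BeautifulSoup method that removes a tag but destroys it instead of returning it. The markup is invented for this illustration.

```python
from bs4 import BeautifulSoup

markup = "<div><p>Keep</p><span></span></div>"

# extract() detaches the tag from the tree and hands it back
soup = BeautifulSoup(markup, "html.parser")
removed = soup.span.extract()
print(removed.name, removed.parent)
print(soup)

# decompose() removes the tag and discards it; nothing is returned
soup2 = BeautifulSoup(markup, "html.parser")
soup2.span.decompose()
print(soup2)
```

Either method works for cleaning a document. extract() is the better fit when the removed tag is still needed, for example for logging or for moving it elsewhere.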

Conclusion

BeautifulSoup makes it easy to remove empty tags using get_text(strip=True) to identify empty content and extract() to remove unwanted tags. This helps clean up HTML documents by removing unnecessary markup.

Dev Prakash Sharma

Updated on: 2026-03-25T16:55:35+05:30
