How to parse XML and count instances of a particular node attribute in Python?

XML is a widely used format for storing and exchanging structured data, and a common task is counting how many elements of a given type carry a particular attribute. Python offers several libraries for this, including the built-in xml.etree.ElementTree module (usually imported as ET) and the third-party lxml library.

In this article, we will explore different approaches to parse XML and count instances of a particular node attribute using available XML parsing libraries with practical examples.

Using ElementTree

ElementTree is part of Python's standard library and provides a straightforward method for parsing and manipulating XML documents. It offers a lightweight API for parsing XML data into a tree structure.
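
As a quick illustration of that tree structure, the minimal sketch below parses a small document and inspects each element's tag and attributes. The sample XML and the variable name `doc` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item document, used only to show the parsed structure
doc = ET.fromstring('<root><item name="Product A" id="1" /><item id="2" /></root>')

print(doc.tag)  # tag name of the root element

# Iterating over an element yields its direct children
for child in doc:
    # .attrib is a plain dict mapping attribute names to string values
    print(child.tag, child.attrib)
```

Because `attrib` behaves like a dictionary, checking for an attribute is an ordinary membership test, which is what the counting function in the next section relies on.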

Syntax

import xml.etree.ElementTree as ET

def count_node_attribute(xml_data, node_name, attr_name):
    root = ET.fromstring(xml_data)
    count = 0
    for element in root.iter(node_name):
        if attr_name in element.attrib:
            count += 1
    return count

Example

The following example counts all item elements that have a name attribute:

import xml.etree.ElementTree as ET

def count_node_attribute(xml_data, node_name, attr_name):
    root = ET.fromstring(xml_data)
    count = 0
    for element in root.iter(node_name):
        if attr_name in element.attrib:
            count += 1
    return count

# Sample XML data
xml_data = """
<root>
  <item name="Product A" id="1" />
  <item name="Product B" id="2" />
  <item id="3" />
  <item name="Product C" id="4" />
</root>
"""

count = count_node_attribute(xml_data, "item", "name")
print(f"Items with 'name' attribute: {count}")

Output

Items with 'name' attribute: 3
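
The function above expects the whole document as a string. For XML stored in a file, a similar count can be sketched with `ET.iterparse()`, which streams elements instead of building the full tree first. The helper name `count_node_attribute_stream` and the use of `io.BytesIO` as a stand-in for a real file are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

def count_node_attribute_stream(source, node_name, attr_name):
    """Count node_name elements carrying attr_name while streaming the input."""
    count = 0
    # iterparse yields (event, element) pairs; "end" fires once an element is complete
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == node_name and attr_name in element.attrib:
            count += 1
        element.clear()  # drop the element's contents to keep memory use low
    return count

# io.BytesIO stands in for an open file; a file path is accepted as well
sample = io.BytesIO(b"""<root>
  <item name="Product A" id="1" />
  <item id="3" />
  <item name="Product C" id="4" />
</root>""")

stream_count = count_node_attribute_stream(sample, "item", "name")
print(f"Items with 'name' attribute: {stream_count}")
```

For small documents the string-based version is simpler. Streaming is mainly worth considering when the file is too large to hold comfortably in memory.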

Using lxml Library

The lxml library is a third-party library that provides more extensive functionality than ElementTree, including support for XPath, XSLT, and XML Schema validation.

Example

This example performs the same kind of count with lxml, which must be installed separately (for example with pip install lxml):

from lxml import etree

def count_with_lxml_style(xml_data, node_name, attr_name):
    root = etree.fromstring(xml_data)
    # XPath selects every node_name element that carries attr_name
    matches = root.xpath(f".//{node_name}[@{attr_name}]")
    return len(matches)

# Sample XML data
xml_data = """
<catalog>
  <book isbn="123" title="Python Guide" />
  <book title="XML Processing" />
  <book isbn="456" title="Data Analysis" />
  <book isbn="789" />
</catalog>
"""

count = count_with_lxml_style(xml_data, "book", "isbn")
print(f"Books with ISBN: {count}")

Output

Books with ISBN: 3

Using XPath-Style Selection

ElementTree itself supports a limited subset of XPath. Its findall() method accepts path expressions with attribute predicates such as [@category], so simple XPath-style queries work without installing lxml:

import xml.etree.ElementTree as ET

def count_with_xpath_style(xml_data, xpath_expression):
    root = ET.fromstring(xml_data)
    elements = root.findall(xpath_expression)
    return len(elements)

# Sample XML data
xml_data = """
<store>
  <product category="electronics" name="Laptop" />
  <product category="books" name="Novel" />
  <product name="Phone" />
  <product category="electronics" name="Tablet" />
</store>
"""

# Count products with category attribute
count = count_with_xpath_style(xml_data, ".//product[@category]")
print(f"Products with category: {count}")

Output

Products with category: 3
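
The same predicate syntax can also match on an attribute's value. The sketch below reuses the store data from the example above. The variable names are illustrative, and the "missing attribute" count falls back to a plain Python filter rather than a path expression.

```python
import xml.etree.ElementTree as ET

store_xml = """
<store>
  <product category="electronics" name="Laptop" />
  <product category="books" name="Novel" />
  <product name="Phone" />
  <product category="electronics" name="Tablet" />
</store>
"""

store_root = ET.fromstring(store_xml)

# [@attr='value'] keeps only elements whose attribute equals the given value
electronics = store_root.findall(".//product[@category='electronics']")

# Elements lacking the attribute are filtered in Python instead
missing_category = [p for p in store_root.iter("product") if "category" not in p.attrib]

print(f"Electronics products: {len(electronics)}")
print(f"Products without category: {len(missing_category)}")
```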

Comparison

Method                  Library Required   Performance                Best For
ElementTree iter()      Built-in           Good                       Simple attribute counting
ElementTree findall()   Built-in           Good                       XPath-like queries
lxml XPath              Third-party        Usually fastest (C-based)  Complex XML queries

Practical Use Case

Here's a complete example that handles different scenarios:

import xml.etree.ElementTree as ET

def analyze_xml_attributes(xml_data):
    root = ET.fromstring(xml_data)
    
    # Count all elements with any attributes
    total_with_attrs = sum(1 for elem in root.iter() if elem.attrib)
    
    # Count specific node with specific attribute
    items_with_id = sum(1 for item in root.iter('item') if 'id' in item.attrib)
    
    # Count by attribute value
    active_items = sum(1 for item in root.iter('item') 
                      if item.get('status') == 'active')
    
    return {
        'total_with_attributes': total_with_attrs,
        'items_with_id': items_with_id,
        'active_items': active_items
    }

# Complex XML example
xml_data = """
<inventory>
  <item id="1" status="active" name="Widget A" />
  <item id="2" status="inactive" name="Widget B" />
  <item id="3" status="active" />
  <item name="Widget D" />
  <category name="Electronics" />
</inventory>
"""

results = analyze_xml_attributes(xml_data)
for key, value in results.items():
    print(f"{key}: {value}")

Output

total_with_attributes: 5
items_with_id: 3
active_items: 2
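
When every value of an attribute needs its own count, rather than a single value such as active, `collections.Counter` offers a compact alternative. This is a minimal sketch that reuses the inventory data above. The helper name `tally_attribute_values` is hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET

def tally_attribute_values(xml_data, node_name, attr_name):
    """Return a Counter mapping each attr_name value to its number of occurrences."""
    root = ET.fromstring(xml_data)
    return Counter(
        elem.get(attr_name)
        for elem in root.iter(node_name)
        if attr_name in elem.attrib  # skip elements without the attribute
    )

inventory_xml = """
<inventory>
  <item id="1" status="active" name="Widget A" />
  <item id="2" status="inactive" name="Widget B" />
  <item id="3" status="active" />
  <item name="Widget D" />
  <category name="Electronics" />
</inventory>
"""

status_counts = tally_attribute_values(inventory_xml, "item", "status")
for status, number in status_counts.items():
    print(f"{status}: {number}")
```

Summing the Counter's values gives the number of item elements that have the attribute at all, so one pass covers both the per-value and the overall count.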

Conclusion

Python offers multiple approaches for parsing XML and counting node attributes. ElementTree provides a simple, built-in solution for basic tasks, while lxml offers advanced features for complex XML processing. Choose the method that best fits your project's requirements and complexity.

Updated on: 2026-03-27T14:10:29+05:30
