How to get specific nodes in an XML file in Python?

XML stands for Extensible Markup Language, a format for representing structured data. It is useful for exchanging information between systems. In Python, the xml.etree.ElementTree module lets us read and work with XML data. In this article, we will explore how to extract specific nodes from an XML file using this module.

Introduction to XML and ElementTree

When working with XML files in Python, the xml.etree.ElementTree module is used for parsing and navigating XML structures. It reads an XML document and builds a tree of elements, allowing us to easily access and manipulate individual parts of the XML document.

Each element in the XML is represented as a node in the tree, and we can navigate through the hierarchy much like browsing folders in a directory. With ElementTree we can extract specific values, update nodes, and search for elements by tag or attribute.

Creating Sample XML File

First, let's create a sample XML file named sample.xml containing information about books:

<library>
  <book id="1">
    <title>Python Programming</title>
    <author>John Doe</author>
    <genre>Computer Science</genre>
  </book>
  <book id="2">
    <title>Data Science Handbook</title>
    <author>Jane Smith</author>
    <genre>Data Science</genre>
  </book>
</library>

Parsing an XML File

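Parsing is the process of reading structured data and converting it into a format that Python can work with. To parse an XML file, we call parse() and then getroot() on the resulting tree. To parse XML held in a string, as in the example below, we call fromstring(), which returns the root element directly: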

import xml.etree.ElementTree as ET

# Create sample XML content
xml_content = """<library>
  <book id="1">
    <title>Python Programming</title>
    <author>John Doe</author>
    <genre>Computer Science</genre>
  </book>
  <book id="2">
    <title>Data Science Handbook</title>
    <author>Jane Smith</author>
    <genre>Data Science</genre>
  </book>
</library>"""

# Parse the XML
root = ET.fromstring(xml_content)
print("Root element:", root.tag)

Output:

Root element: library
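
The example above parses a string. When the XML lives in a file such as sample.xml, parse() and getroot() are used instead. The following is a minimal sketch: it writes a shortened copy of the sample to a temporary directory (an assumption made only so the snippet runs on its own) and then reads it back.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Shortened version of the sample document, used only for illustration
xml_content = """<library>
  <book id="1">
    <title>Python Programming</title>
  </book>
</library>"""

# Write the XML to a temporary file so the example is self-contained
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(xml_content)

    tree = ET.parse(path)   # parse() returns an ElementTree object
    root = tree.getroot()   # getroot() returns the root Element

print("Root element:", root.tag)
print("Number of books:", len(root.findall("book")))
```

If sample.xml already exists on disk, the temporary-file steps can be dropped and the path passed straight to ET.parse().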

Accessing Specific Elements

Once we have parsed the XML and obtained the root element, we can start accessing specific parts of the XML structure. The find() method returns the first matching child element, or None if nothing matches:

import xml.etree.ElementTree as ET

xml_content = """<library>
  <book id="1">
    <title>Python Programming</title>
    <author>John Doe</author>
    <genre>Computer Science</genre>
  </book>
  <book id="2">
    <title>Data Science Handbook</title>
    <author>Jane Smith</author>
    <genre>Data Science</genre>
  </book>
</library>"""

root = ET.fromstring(xml_content)

# Get the first book element
first_book = root.find('book')
print("First book title:", first_book.find('title').text)
print("First book author:", first_book.find('author').text)

Output:

First book title: Python Programming
First book author: John Doe
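
Because find() returns None when no element matches, calling .text on the result can raise an AttributeError. The sketch below shows two ways to guard against that: an explicit None check, and findtext() with a default value. The publisher tag is a hypothetical element that is deliberately absent from the sample.

```python
import xml.etree.ElementTree as ET

xml_content = """<library>
  <book id="1">
    <title>Python Programming</title>
    <author>John Doe</author>
  </book>
</library>"""

root = ET.fromstring(xml_content)
first_book = root.find("book")

# find() returns None when nothing matches, so check before using .text
publisher = first_book.find("publisher")
if publisher is None:
    print("No publisher element found")

# findtext() returns the text of the first match, or the default if none
title = first_book.findtext("title", default="Unknown")
publisher_name = first_book.findtext("publisher", default="Unknown")
print("Title:", title)
print("Publisher:", publisher_name)
```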

Filtering Nodes with Specific Attributes

XML elements often contain additional information stored as attributes. We can select elements by attribute using find() and findall() with XPath-style expressions (ElementTree supports a limited subset of XPath):

import xml.etree.ElementTree as ET

xml_content = """<library>
  <book id="1">
    <title>Python Programming</title>
    <author>John Doe</author>
    <genre>Computer Science</genre>
  </book>
  <book id="2">
    <title>Data Science Handbook</title>
    <author>Jane Smith</author>
    <genre>Data Science</genre>
  </book>
</library>"""

root = ET.fromstring(xml_content)

# Find book with specific id
book_with_id2 = root.find(".//book[@id='2']")
if book_with_id2 is not None:
    print("Book with id='2':", book_with_id2.find('title').text)

# Find all books with any id attribute
books_with_id = root.findall(".//book[@id]")
print("Number of books with id attribute:", len(books_with_id))

Output:

Book with id='2': Data Science Handbook
Number of books with id attribute: 2
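
Two related operations often come up alongside attribute filtering: reading an attribute's value with get(), and selecting elements by the text of a child element. The following sketch illustrates both on a shortened copy of the sample. The [genre='Data Science'] predicate is part of the XPath subset that ElementTree supports.

```python
import xml.etree.ElementTree as ET

xml_content = """<library>
  <book id="1">
    <title>Python Programming</title>
    <genre>Computer Science</genre>
  </book>
  <book id="2">
    <title>Data Science Handbook</title>
    <genre>Data Science</genre>
  </book>
</library>"""

root = ET.fromstring(xml_content)

# Read the id attribute of each book; get() returns None if it is missing
book_ids = [book.get("id") for book in root.findall("book")]
print("Book ids:", book_ids)

# Select books whose <genre> child has exactly this text
matches = root.findall("./book[genre='Data Science']")
match_titles = [book.findtext("title") for book in matches]
print("Data Science titles:", match_titles)
```

Attribute values are always returned as strings, so convert them with int() if numeric comparison is needed.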

Selecting Nodes by Tag Name

When processing XML files, we often need to extract all elements with the same tag name. The iter() method loops over every matching element in the tree, at any depth, in document order:

import xml.etree.ElementTree as ET

xml_content = """<library>
  <book id="1">
    <title>Python Programming</title>
    <author>John Doe</author>
    <genre>Computer Science</genre>
  </book>
  <book id="2">
    <title>Data Science Handbook</title>
    <author>Jane Smith</author>
    <genre>Data Science</genre>
  </book>
</library>"""

root = ET.fromstring(xml_content)

# Iterate through all title elements
print("All book titles:")
for title in root.iter('title'):
    print("- " + title.text)

print("\nAll authors:")
for author in root.iter('author'):
    print("- " + author.text)

Output:

All book titles:
- Python Programming
- Data Science Handbook

All authors:
- John Doe
- Jane Smith
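
One point to keep in mind is that iter() searches the whole subtree, while findall() with a plain tag name looks only at direct children. The sketch below makes the difference visible by adding a hypothetical title element directly under library. That element is not part of the original sample.

```python
import xml.etree.ElementTree as ET

# The top-level <title> is a hypothetical addition for illustration
xml_content = """<library>
  <title>City Library Catalogue</title>
  <book id="1">
    <title>Python Programming</title>
  </book>
  <book id="2">
    <title>Data Science Handbook</title>
  </book>
</library>"""

root = ET.fromstring(xml_content)

# iter() walks every descendant, so it finds titles at all depths
all_titles = [t.text for t in root.iter("title")]

# findall() with a bare tag matches direct children of root only
direct_titles = [t.text for t in root.findall("title")]

# A path restricts the match to titles that sit inside a <book>
book_titles = [t.text for t in root.findall("book/title")]

print("iter('title'):        ", all_titles)
print("findall('title'):     ", direct_titles)
print("findall('book/title'):", book_titles)
```

When the same tag name can appear at several levels, a path such as book/title is usually the safer choice.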

Common Methods Summary

Method    | Purpose                               | Returns
find()    | Find first matching element           | Single element or None
findall() | Find all matching elements            | List of elements
iter()    | Iterate through all matching elements | Iterator
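
These methods can be combined to turn the matched nodes into ordinary Python data. As a rough sketch, the hypothetical helper book_to_dict() below (not part of ElementTree) converts each book element from the sample into a dictionary.

```python
import xml.etree.ElementTree as ET

xml_content = """<library>
  <book id="1">
    <title>Python Programming</title>
    <author>John Doe</author>
    <genre>Computer Science</genre>
  </book>
  <book id="2">
    <title>Data Science Handbook</title>
    <author>Jane Smith</author>
    <genre>Data Science</genre>
  </book>
</library>"""


def book_to_dict(book):
    """Convert one <book> element to a plain dict (hypothetical helper)."""
    return {
        "id": book.get("id"),                # attribute value
        "title": book.findtext("title"),     # text of child elements
        "author": book.findtext("author"),
        "genre": book.findtext("genre"),
    }


root = ET.fromstring(xml_content)

# findall() returns a list, so a comprehension can process every book
books = [book_to_dict(book) for book in root.findall("book")]
for record in books:
    print(record)
```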

Conclusion

The xml.etree.ElementTree module provides several methods for extracting specific nodes from XML files. Use find() for the first matching element, findall() for every match, XPath-style predicates with either method for attribute filtering, and iter() for collecting all elements with a given tag name at any depth.

Updated on: 2026-03-24T18:33:01+05:30
