How to get specific nodes in an XML file in Python?



XML stands for Extensible Markup Language, a format used to represent structured data. It is useful when we are exchanging information between two systems. In Python, the xml.etree.ElementTree module helps us read and work with XML data. In this article, we will explore how to extract specific parts of an XML file using this module.

Introduction to XML and ElementTree

When we are working with XML files in Python, the xml.etree.ElementTree module is used for parsing and navigating XML structures. It reads an XML document and builds a tree of elements from it, allowing us to easily access and manipulate individual parts of the document.

Each element in the XML is represented as a node in the tree, and we can move through the hierarchy much like navigating folders in a directory. ElementTree can be used for tasks such as extracting specific values, updating nodes, or searching for elements by tag or attribute.

Parsing an XML File

Parsing is the process of reading structured data such as XML, JSON or HTML and converting it into a format that a programming language can work with. In other words, when we say that we parse an XML file, we mean that we are telling Python to open the file, examine its structure and turn it into objects, such as elements and attributes, that our code can access and manipulate.

To work with an XML file, let's first create a sample file named sample.xml that contains information about books.

<library>
  <book id="1">
    <title>Python Programming</title>
    <author>John Doe</author>
    <genre>Computer Science</genre>
  </book>
  <book id="2">
    <title>Data Science Handbook</title>
    <author>Jane Smith</author>
    <genre>Data Science</genre>
  </book>
</library>

Following is the program that loads the above XML file with the parse() method and gets its root element with the getroot() method -

import xml.etree.ElementTree as ET

tree = ET.parse('sample.xml')
root = tree.getroot()
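
As a quick check that parsing worked, we can inspect the root element directly. The following is a minimal sketch and not part of the original program: it assumes the sample data is held in a string (the hypothetical variable xml_data) and parsed with ET.fromstring(), so that it runs without a file on disk. With a real file, ET.parse('sample.xml').getroot() gives the same kind of root element.

```python
import xml.etree.ElementTree as ET

# Assumption: the same content as sample.xml, embedded as a string
xml_data = """<library>
  <book id="1">
    <title>Python Programming</title>
    <author>John Doe</author>
    <genre>Computer Science</genre>
  </book>
  <book id="2">
    <title>Data Science Handbook</title>
    <author>Jane Smith</author>
    <genre>Data Science</genre>
  </book>
</library>"""

# fromstring() parses a string and returns the root element directly
root = ET.fromstring(xml_data)

print(root.tag)    # tag name of the root element
print(len(root))   # number of direct children of the root

# Iterating over an element yields its direct children
child_tags = [child.tag for child in root]
print(child_tags)
```

Here the root tag is library and it has two direct children, both with the tag book.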

Accessing Specific Elements

Once we have loaded the XML file and accessed the root element, we can start accessing specific parts of the XML structure. The find() method is used when we want to get the first matching child element. For example, if we want to extract the title of the first book in our sample XML file, we can do so with the following program -

first_book = root.find('book')
print(first_book.find('title').text)

Following is the output of the above program -

Python Programming
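
The find() method returns None when no child matches, so calling .text on the result fails for a missing element. The following is a small sketch of one way to guard against that. It assumes a shortened copy of the sample data embedded in a string, and the publisher and year tags are hypothetical names chosen only because they are absent from the data.

```python
import xml.etree.ElementTree as ET

# Assumption: a shortened version of the sample data, embedded as a string
xml_data = """<library>
  <book id="1">
    <title>Python Programming</title>
    <author>John Doe</author>
  </book>
</library>"""

root = ET.fromstring(xml_data)
first_book = root.find('book')

# find() returns None when there is no matching child element
publisher = first_book.find('publisher')
if publisher is None:
    print('No publisher element found')

# findtext() returns the text of the first match, or the given default
title = first_book.findtext('title', default='Unknown')
year = first_book.findtext('year', default='Unknown')
print(title)
print(year)
```

The comparison is written as "is None" on purpose, because an element without children may not evaluate as true even when it exists.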

Filtering Nodes with Specific Attributes

In some XML files, elements carry additional information stored as attributes. These attributes can be used to identify and filter the elements we want by passing a path expression with an attribute predicate, such as [@id='2'], to the findall() method.

Following is an example in which we get a book by its id from the sample.xml file -

import xml.etree.ElementTree as ET

tree = ET.parse('sample.xml')
root = tree.getroot()

# Find all book elements with id="2"
books = root.findall(".//book[@id='2']")

for book in books:
    print(book.find('title').text)

Here is the output of the above program -

Data Science Handbook
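
Besides filtering on an attribute value, we can also read attribute values from the matched elements. The following is a minimal sketch, assuming a shortened copy of the sample data embedded in a string. The variable titles_by_id and the lang attribute are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: a shortened version of the sample data, embedded as a string
xml_data = """<library>
  <book id="1">
    <title>Python Programming</title>
  </book>
  <book id="2">
    <title>Data Science Handbook</title>
  </book>
</library>"""

root = ET.fromstring(xml_data)

# [@id] matches every book that has an id attribute, whatever its value
titles_by_id = {}
for book in root.findall("./book[@id]"):
    # get() returns the attribute value as a string
    titles_by_id[book.get('id')] = book.findtext('title')
print(titles_by_id)

# get() returns the supplied default when the attribute is missing
lang = root.find('book').get('lang', 'not set')
print(lang)

# attrib holds all attributes of an element as a dictionary
print(root.find('book').attrib)
```

Attribute values are always strings, so the id of the first book is '1' and not the integer 1.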

Selecting Nodes by Tag Name

When we are processing an XML file, we sometimes need to extract all elements that have the same tag name, such as all <title> tags in a file. This can be achieved with the iter() method, which loops through all matching elements at any depth of the XML tree.

Here is an example in which we select and print all the book titles from the sample.xml file using the tag name title -

import xml.etree.ElementTree as ET

tree = ET.parse('sample.xml')
root = tree.getroot()

# Iterate through all title elements
for title in root.iter('title'):
    print(title.text)

Below is the output of the above program -

Python Programming
Data Science Handbook
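
The difference between iter() and findall() lies in how deep they search. The following is a short sketch of that difference, assuming a shortened copy of the sample data embedded in a string. The variable names direct, nested and via_iter are illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumption: a shortened version of the sample data, embedded as a string
xml_data = """<library>
  <book id="1">
    <title>Python Programming</title>
  </book>
  <book id="2">
    <title>Data Science Handbook</title>
  </book>
</library>"""

root = ET.fromstring(xml_data)

# A bare tag name in findall() matches direct children only;
# the titles sit one level deeper, so nothing is found
direct = root.findall('title')
print(len(direct))

# The .// prefix searches all descendants
nested = [t.text for t in root.findall('.//title')]
print(nested)

# iter() walks the whole subtree and yields every matching element
via_iter = [t.text for t in root.iter('title')]
print(via_iter)
```

For this data, the last two approaches return the same titles, while the first returns an empty list.
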

Updated on: 2025-09-01T14:55:03+05:30