Parsing XML with DOM APIs in Python

The Document Object Model (DOM) is a cross-language API from the World Wide Web Consortium (W3C) for accessing and modifying XML documents. Unlike SAX parsers, which process XML sequentially as a stream of events, a DOM parser loads the entire document into memory as a tree structure, allowing random access to any element.

The DOM is well suited to random-access applications. SAX exposes only one piece of the document at a time, whereas the DOM keeps the whole document structure available at once, at the cost of holding the entire document in memory.
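As a rough illustration of what random access means here, the sketch below parses a small inline document and then moves around the tree in arbitrary order. The XML string and its element names (library, book) are made up for this example; parseString() is the xml.dom.minidom function for parsing a string rather than a file.

```python
from xml.dom.minidom import parseString

# Hypothetical inline document, used only for this illustration.
XML = "<library><book id='a'>First</book><book id='b'>Second</book></library>"

doc = parseString(XML)
books = doc.getElementsByTagName("book")

# Random access: jump straight to the last element, then move up to its
# parent and across to the first child, without re-reading the input.
last = books[-1]
parent = last.parentNode
first = parent.firstChild

print(last.getAttribute("id"))   # b
print(parent.tagName)            # library
print(first.firstChild.data)     # First
```

A SAX handler would see these elements only once, in document order, and would have to store anything it needed to revisit later.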

Basic DOM Parsing

The easiest way to load an XML document is the xml.dom.minidom module, a minimal DOM implementation in the Python standard library. Its parse() function reads an XML file and returns a DOM tree.

Sample XML File

First, let's create a sample XML file called movies.xml −

<collection shelf="New Arrivals">
    <movie title="Enemy Behind">
        <type>War, Thriller</type>
        <format>DVD</format>
        <rating>PG</rating>
        <description>Talk about a US-Japan war</description>
    </movie>
    <movie title="Transformers">
        <type>Anime, Science Fiction</type>
        <format>DVD</format>
        <rating>R</rating>
        <description>A scientific fiction</description>
    </movie>
</collection>

Parsing the XML Document

The sample code calls the parse(file [,parser]) function of the xml.dom.minidom module to parse the XML file into a DOM Document object −

#!/usr/bin/env python3
import xml.dom.minidom

# Open XML document using minidom parser
DOMTree = xml.dom.minidom.parse("movies.xml")
collection = DOMTree.documentElement

if collection.hasAttribute("shelf"):
    print("Root element : %s" % collection.getAttribute("shelf"))

# Get all the movies in the collection
movies = collection.getElementsByTagName("movie")

# Print detail of each movie.
for movie in movies:
    print("*****Movie*****")
    if movie.hasAttribute("title"):
        print("Title: %s" % movie.getAttribute("title"))
    
    movie_type = movie.getElementsByTagName('type')[0]
    print("Type: %s" % movie_type.childNodes[0].data)
    
    format_elem = movie.getElementsByTagName('format')[0]
    print("Format: %s" % format_elem.childNodes[0].data)
    
    rating = movie.getElementsByTagName('rating')[0]
    print("Rating: %s" % rating.childNodes[0].data)
    
    description = movie.getElementsByTagName('description')[0]
    print("Description: %s" % description.childNodes[0].data)

This would produce the following result −

Root element : New Arrivals
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Rating: PG
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Rating: R
Description: A scientific fiction
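The script above indexes [0] into both getElementsByTagName() and childNodes, so it raises IndexError if a movie lacks one of the tags or if a tag is empty. One possible way to guard against that is a small helper, sketched below. The name child_text and its default parameter are hypothetical and not part of minidom; the inline XML is a trimmed variant of movies.xml with an empty rating and no format element.

```python
from xml.dom.minidom import parseString


def child_text(parent, tag, default=""):
    """Return the text of the first <tag> descendant of parent, or default."""
    matches = parent.getElementsByTagName(tag)
    if not matches:
        return default  # the tag is absent
    # Join every text/CDATA child; an empty element has no child nodes.
    parts = [
        node.data
        for node in matches[0].childNodes
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
    ]
    return "".join(parts).strip() or default


# Illustrative data only: <rating/> is empty and <format> is missing.
doc = parseString(
    "<collection>"
    "<movie title='Enemy Behind'><type>War, Thriller</type><rating/></movie>"
    "</collection>"
)
movie = doc.getElementsByTagName("movie")[0]

print(child_text(movie, "type"))           # War, Thriller
print(child_text(movie, "rating", "n/a"))  # n/a
print(child_text(movie, "format", "n/a"))  # n/a
```

Whether a missing field should fall back to a default or be reported as an error depends on the application; the helper simply makes that choice explicit.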

Key DOM Methods

The most commonly used DOM methods and attributes for XML parsing include −

  • parse(file) − Parse an XML file (a filename or an open file object) and return a Document
  • documentElement − Attribute of the Document that holds the root element
  • getElementsByTagName(name) − Return all descendant elements with the specified tag name
  • getAttribute(name) − Return the attribute value as a string (empty if the attribute is missing)
  • hasAttribute(name) − Check whether the attribute exists
  • childNodes[index].data − Read the text held by a text node among an element's children
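The items in this list can be tried together in one short, self-contained sketch. To avoid depending on a file on disk, it passes an in-memory io.BytesIO object to parse(), which accepts either a filename or an open file object. The XML is a shortened stand-in for movies.xml, and the attribute name "owner" is invented to show the missing-attribute case.

```python
import io
from xml.dom.minidom import parse

# In-memory stand-in for a file on disk (illustrative data only).
source = io.BytesIO(
    b'<collection shelf="New Arrivals">'
    b'<movie title="Enemy Behind"><format>DVD</format></movie>'
    b'<movie title="Transformers"><format>DVD</format></movie>'
    b"</collection>"
)

tree = parse(source)
root = tree.documentElement  # an attribute, not a method call

summary = {
    "root": root.tagName,
    "has_shelf": root.hasAttribute("shelf"),
    "shelf": root.getAttribute("shelf"),
    # minidom returns an empty string for an attribute that is not present
    "owner": root.getAttribute("owner"),
    "titles": [m.getAttribute("title") for m in root.getElementsByTagName("movie")],
    "first_format": root.getElementsByTagName("format")[0].childNodes[0].data,
}
print(summary)
```

Because getAttribute() returns an empty string rather than raising, hasAttribute() is the reliable way to tell a missing attribute apart from an empty one.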

Conclusion

DOM parsing loads the entire XML document into memory as a tree structure, making it ideal for applications requiring random access to XML elements. Use the xml.dom.minidom module for simple XML parsing tasks in Python.

For complete DOM API documentation, please refer to the standard Python XML Processing documentation.

Updated on: 2026-03-25T07:54:07+05:30
