Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Parsing XML with DOM APIs in Python
The Document Object Model (DOM) is a cross-language API from the World Wide Web Consortium (W3C) for accessing and modifying XML documents. Unlike SAX parsers that process XML sequentially, DOM loads the entire document into memory as a tree structure, allowing random access to any element.
The DOM is extremely useful for random-access applications. SAX only allows you a view of one bit of the document at a time, while DOM provides complete access to the entire document structure simultaneously.
Basic DOM Parsing
Here is the easiest way to quickly load an XML document and create a minidom object using the xml.dom module. The minidom object provides a simple parser method that creates a DOM tree from the XML file ?
Sample XML File
First, let's create a sample XML file called movies.xml ?
<collection shelf="New Arrivals">
<movie title="Enemy Behind">
<type>War, Thriller</type>
<format>DVD</format>
<rating>PG</rating>
<description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
<type>Anime, Science Fiction</type>
<format>DVD</format>
<rating>R</rating>
<description>A scientific fiction</description>
</movie>
</collection>
Parsing the XML Document
The sample code calls the parse(file [,parser]) function of the minidom object to parse the XML file into a DOM tree object ?
#!/usr/bin/python
from xml.dom.minidom import parse
import xml.dom.minidom
# Open XML document using minidom parser
DOMTree = xml.dom.minidom.parse("movies.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("shelf"):
print("Root element : %s" % collection.getAttribute("shelf"))
# Get all the movies in the collection
movies = collection.getElementsByTagName("movie")
# Print detail of each movie.
for movie in movies:
print("*****Movie*****")
if movie.hasAttribute("title"):
print("Title: %s" % movie.getAttribute("title"))
movie_type = movie.getElementsByTagName('type')[0]
print("Type: %s" % movie_type.childNodes[0].data)
format_elem = movie.getElementsByTagName('format')[0]
print("Format: %s" % format_elem.childNodes[0].data)
rating = movie.getElementsByTagName('rating')[0]
print("Rating: %s" % rating.childNodes[0].data)
description = movie.getElementsByTagName('description')[0]
print("Description: %s" % description.childNodes[0].data)
This would produce the following result −
Root element : New Arrivals *****Movie***** Title: Enemy Behind Type: War, Thriller Format: DVD Rating: PG Description: Talk about a US-Japan war *****Movie***** Title: Transformers Type: Anime, Science Fiction Format: DVD Rating: R Description: A scientific fiction
Key DOM Methods
The most commonly used DOM methods for XML parsing include ?
-
parse(file)− Parse an XML file and return DOM tree -
documentElement− Get the root element of the document -
getElementsByTagName(name)− Get all elements with specified tag name -
getAttribute(name)− Get attribute value by name -
hasAttribute(name)− Check if attribute exists -
childNodes[index].data− Access text content of element
Conclusion
DOM parsing loads the entire XML document into memory as a tree structure, making it ideal for applications requiring random access to XML elements. Use the xml.dom.minidom module for simple XML parsing tasks in Python.
For complete DOM API documentation, please refer to the standard Python XML Processing documentation.
