Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Pretty Printing XML in Python
When dealing with XML data in Python, ensuring its readability and structure can greatly enhance code comprehension and maintainability. Pretty printing XML, or formatting it with proper indentation and line breaks, is a valuable technique for achieving these goals.
In this article, we explore two different methods to pretty print XML using Python: xml.dom.minidom and xml.etree.ElementTree. By understanding these approaches, developers can effectively present XML data in an organized and visually appealing manner, facilitating easier analysis and manipulation.
Method 1: Using xml.dom.minidom
The xml.dom.minidom module provides a lightweight DOM implementation that makes pretty printing straightforward with its built-in toprettyxml() method.
import xml.dom.minidom
def pretty_print_xml_minidom(xml_string):
# Parse the XML string
dom = xml.dom.minidom.parseString(xml_string)
# Pretty print the XML with 2-space indentation
pretty_xml = dom.toprettyxml(indent=" ")
# Remove empty lines that toprettyxml() adds
pretty_xml = "\n".join(line for line in pretty_xml.split("\n") if line.strip())
return pretty_xml
# Example XML string
xml_string = '<root><person id="1"><name>John</name><age>30</age></person><person id="2"><name>Jane</name><age>25</age></person></root>'
result = pretty_print_xml_minidom(xml_string)
print(result)
<?xml version="1.0" ?>
<root>
<person id="1">
<name>John</name>
<age>30</age>
</person>
<person id="2">
<name>Jane</name>
<age>25</age>
</person>
</root>
Method 2: Using xml.etree.ElementTree
The xml.etree.ElementTree module requires a custom indentation function but offers more control over the formatting process.
import xml.etree.ElementTree as ET
def indent(elem, level=0):
"""Recursively add indentation to XML elements"""
indent_size = " "
i = "\n" + level * indent_size
if len(elem):
if not elem.text or not elem.text.strip():
elem.text = i + indent_size
if not elem.tail or not elem.tail.strip():
elem.tail = i
for child in elem:
indent(child, level + 1)
if not child.tail or not child.tail.strip():
child.tail = i
else:
if level and (not elem.tail or not elem.tail.strip()):
elem.tail = i
def pretty_print_xml_elementtree(xml_string):
# Parse the XML string
root = ET.fromstring(xml_string)
# Add indentation
indent(root)
# Convert back to string
pretty_xml = ET.tostring(root, encoding="unicode")
return pretty_xml
# Example XML string
xml_string = '<root><person id="1"><name>John</name><age>30</age></person><person id="2"><name>Jane</name><age>25</age></person></root>'
result = pretty_print_xml_elementtree(xml_string)
print(result)
<root>
<person id="1">
<name>John</name>
<age>30</age>
</person>
<person id="2">
<name>Jane</name>
<age>25</age>
</person>
</root>
Comparison
| Method | Pros | Cons | Best For |
|---|---|---|---|
xml.dom.minidom |
Built-in pretty printing, XML declaration included | Adds empty lines, higher memory usage | Quick formatting with minimal code |
xml.etree.ElementTree |
More control, memory efficient | Requires custom indentation function | Large XML files, custom formatting needs |
Conclusion
Pretty printing XML in Python is essential for improving readability and debugging. Use xml.dom.minidom for simple cases with its built-in toprettyxml() method, or choose xml.etree.ElementTree when you need more control over the formatting process.
