Pretty Printing XML in Python


When dealing with XML data in Python, ensuring its readability and structure can greatly enhance code comprehension and maintainability. Pretty printing XML, or formatting it with proper indentation and line breaks, is a valuable technique for achieving these goals.

In this article, we explore two different methods to pretty print XML using Python: xml.dom.minidom and xml.etree.ElementTree. By understanding these approaches, developers can effectively present XML data in an organized and visually appealing manner, facilitating easier analysis and manipulation.

How to Pretty print XML in Python?

Below are the two methods using which we can perform pretty printing in Python −

Method 1: Using xml.dom.minidom

Below are the steps that we will follow to perform pretty printing using xml.dom.minidom −

  • Import the required module: We start by importing the `xml.dom.minidom` module, which provides a lightweight implementation of the Document Object Model (DOM) API for XML parsing.

  • Define the `pretty_print_xml_minidom` function: This function takes an XML string as input and is responsible for parsing and pretty printing the XML using `xml.dom.minidom`.

  • Parse the XML string: Inside the `pretty_print_xml_minidom` function, we use the `xml.dom.minidom.parseString()` method to parse the XML string and create a DOM object.

  • Pretty print the XML: Next, we use the `toprettyxml()` method on the DOM object to generate a pretty-printed XML string. We pass the `indent` parameter with a value of `" "` to specify the desired indentation level (two spaces in this case).

  • Remove empty lines: By default, `toprettyxml()` adds empty lines in the output. To remove these empty lines, we split the pretty XML string by newlines (`\n`), remove any leading or trailing whitespace from each line, and then join the non-empty lines back together.

  • Print the pretty XML: Finally, we print the resulting pretty XML string.

Program 2: Using xml.etree.ElementTree

Below are the steps that we will be following to perform pretty printing using xml.etree.ElementTree −

  • Import the required module: We start by importing the `xml.etree.ElementTree` module, which provides a fast and efficient API for parsing and manipulating XML.

  • Define the `indent` function: This is a custom function that recursively adds indentation to the XML elements. It takes an `elem` parameter (XML element) and an optional `level` parameter to specify the current indentation level (defaulted to 0).

  • Indent the XML: Inside the `indent` function, we add indentation by modifying the `text` and `tail` attributes of the XML elements. The `text` attribute represents the text immediately after the opening tag, and the `tail` attribute represents the text immediately before the closing tag. By adding indentation to these attributes, we achieve pretty printing.

  • Define the `pretty_print_xml_elementtree` function: This function takes an XML string as input and is responsible for parsing and pretty printing the XML using `xml.etree.ElementTree`.

  • Parse the XML string: Inside the `pretty_print_xml_elementtree` function, we use the `ET.fromstring()` method to parse the XML string and create an ElementTree object.

  • Indent the XML: We call the `indent()` function on the root element of the XML to add the indentation recursively to all elements.

  • Convert the XML element back to a string: We use the `ET.tostring()` method to convert the XML element back to a string representation. We pass the `encoding` parameter with a value of `"unicode"` to ensure proper encoding of the resulting string.

  • Print the pretty XML: Finally, we print the resulting pretty XML string.

Both programs provide different approaches to achieve pretty printing of XML. The first program utilizes the DOM API provided by `xml.dom.minidom` to parse and pretty print XML, while the second program uses the `xml.etree.ElementTree` module and defines a custom function to add indentation recursively to XML elements.

Below are the program examples using the above two methods −

Program 1: Using xml.dom.minidom

import xml.dom.minidom

def pretty_print_xml_minidom(xml_string):
   # Parse the XML string
   dom = xml.dom.minidom.parseString(xml_string)

   # Pretty print the XML
   pretty_xml = dom.toprettyxml(indent="  ")

   # Remove empty lines
   pretty_xml = "\n".join(line for line in pretty_xml.split("\n") if line.strip())

   # Print the pretty XML
   print(pretty_xml)

# Example usage
xml_string = '''
<root>
  <element attribute="value">
   <subelement>Text</subelement>
  </element>
</root>
'''

pretty_print_xml_minidom(xml_string)

Output

<?xml version="1.0" ?>
<root>
  <element attribute="value">
   <subelement>Text</subelement>
  </element>
</root>

Program 2: Using xml.etree.ElementTree

import xml.etree.ElementTree as ET

def indent(elem, level=0):
   # Add indentation
   indent_size = "  "
   i = "\n" + level * indent_size
   if len(elem):
      if not elem.text or not elem.text.strip():
         elem.text = i + indent_size
      if not elem.tail or not elem.tail.strip():
         elem.tail = i
      for elem in elem:
         indent(elem, level + 1)
      if not elem.tail or not elem.tail.strip():
         elem.tail = i
   else:
      if level and (not elem.tail or not elem.tail.strip()):
         elem.tail = i

def pretty_print_xml_elementtree(xml_string):
   # Parse the XML string
   root = ET.fromstring(xml_string)

   # Indent the XML
   indent(root)

   # Convert the XML element back to a string
   pretty_xml = ET.tostring(root, encoding="unicode")

   # Print the pretty XML
   print(pretty_xml)

# Example usage
xml_string = '''
<root>
  <element attribute="value">
   <subelement>Text</subelement>
  </element>
</root>
'''

pretty_print_xml_elementtree(xml_string)

Output

<root>
  <element attribute="value">
   <subelement>Text</subelement>
  </element>
</root>

Conclusion

In cocnclusion, Pretty printing XML in Python is essential for improving the readability and structure of XML data. Whether using the xml.dom.minidom or xml.etree.ElementTree library, developers can easily format XML with proper indentation. By adopting these techniques, programmers can enhance code comprehension, simplify debugging, and promote better collaboration when working with XML data in Python projects.

Updated on: 25-Jul-2023

3K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements