The ElementTree XML API in Python


The Extensible Markup Language (XML) is a markup language much like HTML. It is portable and is useful for handling small to medium amounts of data without using a SQL database.

Python's standard library contains the xml package, which includes the ElementTree module (xml.etree.ElementTree). It is a simple and lightweight API for processing XML.

XML is a tree-like hierarchical data format. The 'ElementTree' class in this module treats the whole XML document as a tree. The 'Element' class represents a single node in this tree. Reading and writing operations on XML files are done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.

To create an XML file

The tree is a hierarchical structure of elements, starting with the root and followed by other elements. Each element is created by using the Element() function of this module.

import xml.etree.ElementTree as et
e=et.Element('name')

Each element is characterized by a tag and an attrib attribute, which is a dict object. Since no attributes are passed when the tree's starting element is created here, its attrib is an empty dictionary

>>> root=et.Element('employees')
>>> root.tag
'employees'
>>> root.attrib
{}
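Attributes can also be supplied when the element is created. The following is a minimal sketch; the 'dept' attribute and its value are made up for illustration and are not part of the example data used in this article.

```python
import xml.etree.ElementTree as et

# Attributes may be passed as a dict (or as keyword arguments).
# 'dept' is a hypothetical attribute used only for this sketch.
root = et.Element('employees', {'dept': 'sales'})
print(root.tag)      # employees
print(root.attrib)   # {'dept': 'sales'}

# Without attributes, attrib is an empty dictionary.
plain = et.Element('employees')
print(plain.attrib)  # {}
```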

You may now set up one or more child elements to be added under the root element. Each child may have one or more sub-elements. Add them using the SubElement() function and set the text attribute of each. Here, employee is a dictionary holding the data of one employee.

child=et.Element("employee")
nm = et.SubElement(child, "name")
nm.text = employee.get('name')
sal = et.SubElement(child, "salary")
sal.text = str(employee.get('salary'))

Each child is added to the root with the append() method

root.append(child)
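Put together, the steps so far might look like the sketch below. The sample dictionary is assumed for illustration, and et.tostring() is used only to inspect the result without writing a file.

```python
import xml.etree.ElementTree as et

# Sample record, assumed for illustration.
employee = {'name': 'aaa', 'salary': 5000}

root = et.Element('employees')
child = et.Element('employee')
nm = et.SubElement(child, 'name')
nm.text = employee['name']
sal = et.SubElement(child, 'salary')
sal.text = str(employee['salary'])   # element text must be a string
root.append(child)

# Serialize to a str to check the structure that was built.
xml_text = et.tostring(root, encoding='unicode')
print(xml_text)
# <employees><employee><name>aaa</name><salary>5000</salary></employee></employees>
```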

After adding the required number of child elements, construct a tree object with the ElementTree() constructor

tree = et.ElementTree(root)

The entire tree structure is written to a file opened in binary mode by the tree object's write() method

with open('employees.xml', "wb") as f:
    tree.write(f)
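write() also accepts a file name, plus optional encoding and xml_declaration arguments. The sketch below writes into a temporary directory purely to stay self-contained; the path and the tiny tree are assumptions for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as et

root = et.Element('employees')
et.SubElement(root, 'employee').text = 'aaa'
tree = et.ElementTree(root)

# Temporary location, used here only so the sketch does not
# touch the current working directory.
path = os.path.join(tempfile.mkdtemp(), 'employees.xml')

# Passing a path lets write() open and close the file itself.
tree.write(path, encoding='utf-8', xml_declaration=True)

with open(path, 'rb') as fh:
    data = fh.read()
print(data)
```

With xml_declaration=True the file starts with an XML declaration line, which some consumers expect.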

In the following example, the tree is constructed out of a list of dictionary items. Each dictionary holds key-value pairs describing one employee. The tree so constructed is written to 'employees.xml'

import xml.etree.ElementTree as et

employees = [{'name': 'aaa', 'age': 21, 'sal': 5000},
             {'name': 'xyz', 'age': 22, 'sal': 6000}]
root = et.Element("employees")
for employee in employees:
    # one <employee> element per dictionary
    child = et.Element("employee")
    root.append(child)
    nm = et.SubElement(child, "name")
    nm.text = employee.get('name')
    age = et.SubElement(child, "age")
    age.text = str(employee.get('age'))
    sal = et.SubElement(child, "sal")
    sal.text = str(employee.get('sal'))

tree = et.ElementTree(root)
with open('employees.xml', "wb") as fh:
    tree.write(fh)

The 'employees.xml' file is stored in the current working directory.

<employees><employee><name>aaa</name><age>21</age><sal>5000</sal></employee><employee><name>xyz</name><age>22</age><sal>6000</sal></employee></employees>
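The output is written on a single line. If a more readable layout is wanted, one option is et.indent(), which is available only in Python 3.9 and later. The sketch below guards the call for that reason and uses a reduced tree for brevity.

```python
import xml.etree.ElementTree as et

root = et.Element('employees')
child = et.SubElement(root, 'employee')
et.SubElement(child, 'name').text = 'aaa'

# et.indent() was added in Python 3.9, so check before calling it.
if hasattr(et, 'indent'):
    et.indent(root, space='  ')   # inserts whitespace in place

pretty = et.tostring(root, encoding='unicode')
print(pretty)
```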

To parse an XML file

Let us now read back the 'employees.xml' file created in the above example. For this purpose, the following functions in the ElementTree module will be used

ElementTree() When called with the file argument, this constructor parses the XML file and loads its hierarchy of elements into a tree object.

tree = et.ElementTree(file='employees.xml')

getroot() This method returns the root element of the tree

root = tree.getroot()

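getchildren() This method returned the list of sub-elements one level below an element. It was deprecated and then removed in Python 3.9, so use list(element), or iterate over the element directly, instead.

children = list(root)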
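Because getchildren() is no longer available from Python 3.9 onward, iterating over an element is the portable way to reach its direct sub-elements. A minimal sketch, using et.fromstring() on an inline string (assumed sample data) so that no file is needed:

```python
import xml.etree.ElementTree as et

# Inline sample data, assumed for illustration.
xml_text = ('<employees>'
            '<employee><name>aaa</name></employee>'
            '<employee><name>xyz</name></employee>'
            '</employees>')

root = et.fromstring(xml_text)   # returns the root Element directly
children = list(root)            # direct sub-elements, one level down

tags = [child.tag for child in children]
names = [child.find('name').text for child in root]
print(tags)    # ['employee', 'employee']
print(names)   # ['aaa', 'xyz']
```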

In the following example, the elements and sub-elements of 'employees.xml' are parsed into a list of dictionary items.

import xml.etree.ElementTree as et
tree = et.ElementTree(file='employees.xml')
root = tree.getroot()
employees = []
for child in root:
    # collect tag/text pairs of one <employee> element
    employee = {}
    for pair in child:
        employee[pair.tag] = pair.text
    employees.append(employee)
print(employees)

Output

[{'name': 'aaa', 'age': '21', 'sal': '5000'}, {'name': 'xyz', 'age': '22', 'sal': '6000'}]
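Note that every value in the output is a string, because element text is always read back as text. If numeric values are needed, they have to be converted explicitly. The sketch below does this with findall() and findtext() on inline sample data (assumed to match the file above).

```python
import xml.etree.ElementTree as et

xml_text = ('<employees>'
            '<employee><name>aaa</name><age>21</age><sal>5000</sal></employee>'
            '<employee><name>xyz</name><age>22</age><sal>6000</sal></employee>'
            '</employees>')
root = et.fromstring(xml_text)

employees = []
for child in root.findall('employee'):
    employees.append({
        'name': child.findtext('name'),
        'age': int(child.findtext('age')),   # convert text to int
        'sal': int(child.findtext('sal')),
    })
print(employees)
# [{'name': 'aaa', 'age': 21, 'sal': 5000}, {'name': 'xyz', 'age': 22, 'sal': 6000}]
```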

To modify an XML file

We shall use the iter() method of Element. It creates a tree iterator for the given tag with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order.

Let us build an iterator for all 'sal' sub-elements and increment the text of each sal tag by 100.

import xml.etree.ElementTree as et
tree = et.ElementTree(file='employees.xml')
root = tree.getroot()
for x in root.iter('sal'):
    s = int(x.text)
    s = s+100
    x.text = str(s)
with open("employees.xml", "wb") as fh:
    tree.write(fh)

Our 'employees.xml' will now be modified accordingly.
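The effect of the loop can be checked in memory, without touching the file. A minimal sketch, assuming the same two employees as above:

```python
import xml.etree.ElementTree as et

root = et.fromstring(
    '<employees>'
    '<employee><name>aaa</name><sal>5000</sal></employee>'
    '<employee><name>xyz</name><sal>6000</sal></employee>'
    '</employees>'
)

# iter('sal') visits every <sal> element below root, at any depth.
for x in root.iter('sal'):
    x.text = str(int(x.text) + 100)

salaries = [x.text for x in root.iter('sal')]
print(salaries)   # ['5100', '6100']
```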

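We can also use set() to add or update the value of an attribute of an element. The attribute name and the value are both passed as strings; here 'marks' is the attribute name and mark is the new value.

x.set('marks', str(mark))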
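Unlike text, which goes between the tags, a value given to set() is stored in the element's attrib dictionary and appears inside the opening tag. A small sketch; the 'id' attribute is hypothetical and not part of the example file.

```python
import xml.etree.ElementTree as et

emp = et.Element('employee')
emp.set('id', '101')      # 'id' is a hypothetical attribute name
emp.set('id', '102')      # setting the same key again overwrites it

print(emp.get('id'))      # 102
print(emp.attrib)         # {'id': '102'}

serialized = et.tostring(emp, encoding='unicode')
print(serialized)         # <employee id="102" />
```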

Updated on: 30-Jul-2019
