XML parsing in Python?

PythonServer Side ProgrammingProgramming

Python XML parser parser provides one of the easiest ways to read and extract useful information from the XML file. In this short tutorial we are going to see how we can parse XML file, modify and create XML documents using python ElementTree XML API.

Python ElementTree API is one of the easiest way to extract, parse and transform XML data.

So let’s get started using python XML parser using ElementTree:

Example1

Creating XML file

First we are going to create a new XML file with an element and a sub-element.

#Import required library
import xml.etree.ElementTree as xml

def createXML(filename):
   # Start with the root element
   root = xml.Element("users")
   children1 = xml.Element("user")
   root.append(children1)

   tree = xml.ElementTree(root)
   with open(filename, "wb") as fh:
      tree.write(fh)

if __name__ == "__main__":
   createXML("testXML.xml")

Once we run above program, a new file is created named “textXML.xml” in our current default working directory:

Which contains contents something like:

<users><user /></users>

Please note while writing the file, we have used the ‘wb’ mode .i.e. write the file in binary mode.

Adding values to XML elements

Let’s give some values to the XML elements in our above program:

#Import required library
import xml.etree.ElementTree as xml

def createXML(filename):
   # Start with the root element
   root = xml.Element("users")
   children1 = xml.Element("user")
   root.append(children1)

   userId1 = xml.SubElement(children1, "Id")
   userId1.text = "hello"

   userName1 = xml.SubElement(children1, "Name")
   userName1.text = "Rajesh"

   tree = xml.ElementTree(root)
   with open(filename, "wb") as fh:
   tree.write(fh)

if __name__ == "__main__":
   createXML("testXML.xml")

After running the above program, we’ll see that new elements are added with values, something like:

<users>
   <user>
      <Id>hello</Id>
      <Name>Rajesh</Name>
   </user>
</users>

Above output looks ok.

Now let’s start editing files:

Editing XML data

Let’s add some bit of data into from a file in our existing program.

newdata.xml

<users>
   <user>
      <id>1a</id>
      <name>Rajesh</name>
      <salary>NA</salary>
   </user>
   <user>
      <id>2b</id>
      <name>TutorialsPoint</name>
      <salary>NA</salary>
   </user>
   <user>
      <id>3c</id>
      <name>Others</name>
      <salary>NA</salary>
   </user>
</users>

Above is our current xml file, let’s try to update the salary of each users:

#Import required library
import xml.etree.ElementTree as ET

def updateET(filename):
   # Start with the root element
   tree = ET.ElementTree(file=filename)
   root = tree.getroot()

   for salary in root.iter('salary'):
      salary.text = '500000'

   tree = ET.ElementTree(root)
   with open("newdata.xml", "wb") as fh:
      tree.write(fh)

if __name__ == "__main__":
   updateET("newdata.xml")

Output

So we see the salary is changed from ‘NA’ to ‘500000’.

Example: Python XML Parser

Now let’s write another program which will parse XML data present in the file and print the data.

#Import required library
import xml.etree.cElementTree as ET

def parseXML(file_name):
   # Parse XML with ElementTree
   tree = ET.ElementTree(file=file_name)
   print(tree.getroot())
   root = tree.getroot()
   print("tag=%s, attrib=%s" % (root.tag, root.attrib))

   # get the information via the children!
   print("-" * 25)
   print("Iterating using getchildren()")
   print("-" * 25)
   users = root.getchildren()
   for user in users:
      user_children = user.getchildren()
      for user_child in user_children:
         print("%s=%s" % (user_child.tag, user_child.text))

if __name__ == "__main__":
   parseXML("newdata.xml")

Output

<Element 'users' at 0x0551A5A0>
tag = users, attrib = {}
-------------------------
Iterating using getchildren()
-------------------------
id = 1a
name = Rajesh
salary = 500000
id = 2b
name = TutorialsPoint
salary = 500000
id = 3c
name = Others
salary = 500000
raja
Published on 09-Apr-2019 10:12:25
Advertisements