Fast XML parsing using Expat in Python

PythonServer Side ProgrammingProgramming

Python allows XML data to be read and processed through its inbuilt module called expat. It is a non-validating XML parser. it creates an XML parser object and captures the attributes of its objects into various handler functions. In the below example we will see how the various handler functions can help us read the XML file as well as give the attribute values as the output data. This generated data can be used for the processing.


import xml.parsers.expat
# Capture the first element
def first_element(tag, attrs):
   print ('first element:', tag, attrs)
# Capture the last element
def last_element(tag):
   print ('last element:', tag)
# Capture the character Data
def character_value(value):
   print ('Character value:', repr(value))
parser_expat = xml.parsers.expat.ParserCreate()
parser_expat.StartElementHandler = first_element
parser_expat.EndElementHandler = last_element
parser_expat.CharacterDataHandler = character_value
parser_expat.Parse(""" <?xml version="1.0"?>
<parent student_rollno="15">
<child1 Student_name="Krishna"> Strive for progress, not perfection</child1>
<child2 student_name="vamsi"> There are no shortcuts to any place worth going</child2>
</parent>""", 1)


Running the above code gives us the following result −

first element: parent {'student_rollno': '15'}
Character value: '\n'
first element: child1 {'Student_name': 'Krishna'}
Character value: 'Strive for progress, not perfection'
last element: child1
Character value: '\n'
first element: child2 {'student_name': 'vamsi'}
Character value: ' There are no shortcuts to any place worth going'
last element: child2
Character value: '\n'
last element: parent
Published on 04-Feb-2020 05:26:25