How to parse XML documents in Python
Write a note on XML. Design python program to retrieve a node present in the XML tree. (08 Marks)
This question was asked in Python Application Programming 15CS664 Jan 2019 question paper for 8 Marks.
Solution:
eXtensible Markup Language – XML
An XML document looks very similar to an HTML (HyperText Markup Language) document, but XML is more strictly structured than HTML. XML is used to transfer data in a standard form from one machine to another.
Each pair of opening and closing tags represents an element of the XML document. For example, <person> and </person> together form one element.
Each element can have text content and attributes (e.g., hide), and can contain other nested elements. The nested elements are called child elements, and the enclosing element is called the parent element.
If an XML element has no content, it may be written with a self-closing tag (e.g., <email />).
In the examples below, person is the root element. It is called the root because it is the outermost element and encloses all the others. The name, phone, and email elements are its child elements. The email element is empty, so it is written with a self-closing tag. In the second example, the phone element has an attribute called type and the email element has an attribute called hide.
Here is a sample XML document:
<person>
  <name> Mahesh </name>
  <phone> +91 9989898989 </phone>
  <email/>
</person>
Another Example with attributes:
<person>
  <name>Mahesh</name>
  <phone type="mobile"> +91 9989898989 </phone>
  <email hide="yes"/>
</person>
Unlike HTML tags, which specify how data is displayed, tags in XML identify the type of data and are used to store and organize it.
An XML document can be viewed as a tree: the top-level tag, person, is the root of the tree, and the other tags, such as phone, are children of their parent nodes.
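The tree view can be made concrete with a short sketch. The helper name outline is an assumption introduced here for illustration; it walks the elements recursively and returns each tag indented by its depth in the tree.

```python
import xml.etree.ElementTree as ET

data = '''<person>
  <name>Mahesh</name>
  <phone type="mobile">+91 9989898989</phone>
  <email hide="yes"/>
</person>'''


def outline(element, depth=0):
    """Return one line per element, indented by its depth in the tree."""
    lines = ['  ' * depth + element.tag]
    for child in element:  # iterating an Element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(data)
print('\n'.join(outline(root)))
```

The root, person, is printed without indentation, and name, phone, and email appear one level below it.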
Example storing information about multiple persons:
<persons>
  <person>
    <name>Mahesh</name>
    <phone type="mobile"> +91 9989898989 </phone>
    <email hide="yes"/>
  </person>
  <person>
    <name>Rahul</name>
    <phone type="landline"> +91 9989898989 </phone>
    <email hide="yes"/>
  </person>
  <person>
    <name>Ram</name>
    <phone type="mobile"> +91 9989898989 </phone>
    <email hide="yes"/>
  </person>
</persons>
Parsing XML
Here is a simple Python program that parses an XML document and extracts the values of data elements (tags) from it. The program retrieves nodes present in the XML tree.
import xml.etree.ElementTree as ET

data = '''
<person>
  <name>Mahesh</name>
  <phone type="mobile">+91 7411043272</phone>
  <email hide="yes"/>
</person>'''

# Convert the XML string into a tree and get its root element
tree = ET.fromstring(data)

# find() returns the first matching child; .text is its text content
print('Name:', tree.find('name').text)
print('Mobile No:', tree.find('phone').text)
print('Email ID:', tree.find('email').text)

# get() returns the value of the named attribute
print('Attr:', tree.find('phone').get('type'))
print('Attr:', tree.find('email').get('hide'))
Output:
Name: Mahesh
Mobile No: +91 7411043272
Email ID: None
Attr: mobile
Attr: yes
The xml.etree.ElementTree module is used to parse the XML document. Its fromstring function takes the string representation of an XML document and converts it into a tree of XML nodes, returning the root element.
Once the XML is in a tree, a series of methods is available to extract portions of data from it. The find method returns the first child element with the given tag, and the text attribute of that element holds its text content. In the above example, tree.find('phone').text returns the phone number, and get returns the value of a named attribute. Because the email element is empty, its text is None.
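fromstring suits XML that is already held in a string. For XML stored in a file, the same module provides parse, which returns an ElementTree object whose getroot method gives the root element. The sketch below uses io.StringIO as a stand-in for an open file so that it runs without touching the disk; that choice is an assumption for the demonstration, and a real program would pass a file name or an open file instead.

```python
import io
import xml.etree.ElementTree as ET

data = '''<person>
  <name>Mahesh</name>
  <phone type="mobile">+91 7411043272</phone>
  <email hide="yes"/>
</person>'''

# fromstring returns the root element directly
root_from_string = ET.fromstring(data)

# parse accepts a file name or a file object and returns an ElementTree;
# io.StringIO stands in for an open file here (an assumption for the demo)
tree = ET.parse(io.StringIO(data))
root_from_file = tree.getroot()

print(root_from_string.tag, root_from_file.tag)
print(root_from_file.find('name').text)
```

Both routes end with the same kind of root element, so find and get work identically afterwards.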
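One caveat with find: it returns None when no child has the given tag, so reading .text from the result would raise an AttributeError. Here is a minimal sketch of ways to guard against this, using address, city, and kind as names that are deliberately absent from the sample data:

```python
import xml.etree.ElementTree as ET

data = '''<person>
  <name>Mahesh</name>
  <email hide="yes"/>
</person>'''

root = ET.fromstring(data)

# find returns None for a tag that is not a direct child
address = root.find('address')
if address is None:
    print('Address: not available')

# findtext returns the text of the first match, or the default if there is none
name = root.findtext('name', default='unknown')
city = root.findtext('city', default='unknown')
print('Name:', name)
print('City:', city)

# get returns None (or a supplied default) when the attribute is missing
email = root.find('email')
print('Hide:', email.get('hide'))
print('Kind:', email.get('kind', 'n/a'))
```

Checking for None or supplying a default keeps the program running when a document omits an optional element or attribute.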
Looping through elements or nodes of XML document
import xml.etree.ElementTree as ET

# Named data rather than input to avoid shadowing the built-in input()
data = '''
<persons>
  <person>
    <name>Mahesh</name>
    <phone type="mobile">+91 7411043272</phone>
    <email hide="yes"/>
  </person>
  <person>
    <name>Rahul</name>
    <phone type="mobile">+91 7411043272</phone>
    <email hide="no">xyz@abc.com</email>
  </person>
</persons>'''

persons = ET.fromstring(data)

# findall() returns a list of all direct children with the given tag
lst = persons.findall('person')
print('User count:', len(lst))

for p in lst:
    print('----------------')
    print('Name:', p.find('name').text)
    print('Mobile No:', p.find('phone').text)
    print('Email ID:', p.find('email').text)
    print('Attr:', p.find('phone').get('type'))
    print('Attr:', p.find('email').get('hide'))
Output:
User count: 2
----------------
Name: Mahesh
Mobile No: +91 7411043272
Email ID: None
Attr: mobile
Attr: yes
----------------
Name: Rahul
Mobile No: +91 7411043272
Email ID: xyz@abc.com
Attr: mobile
Attr: no
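findall also accepts a limited XPath-style path, which helps when only some elements are wanted. The sketch below picks out the entries whose phone has type="mobile". The sample data is adapted from the example above, with one phone changed to a landline and given a different number so that the filter has something to exclude.

```python
import xml.etree.ElementTree as ET

data = '''<persons>
  <person>
    <name>Mahesh</name>
    <phone type="mobile">+91 7411043272</phone>
  </person>
  <person>
    <name>Rahul</name>
    <phone type="landline">+91 9989898989</phone>
  </person>
</persons>'''

persons = ET.fromstring(data)

# [@type='mobile'] keeps only phone elements with that attribute value
mobile_phones = persons.findall("person/phone[@type='mobile']")
mobile_numbers = [ph.text for ph in mobile_phones]
print('Mobile numbers:', mobile_numbers)

# The same filter written as a plain loop over person elements
mobile_users = [
    p.find('name').text
    for p in persons.findall('person')
    if p.find('phone').get('type') == 'mobile'
]
print('Mobile users:', mobile_users)

# iter visits every matching element at any depth, not just direct children
all_names = [n.text for n in persons.iter('name')]
print('All names:', all_names)
```

The path predicate is compact, while the explicit loop is easier to extend when the condition involves more than one child element.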