Showing posts with label elementtree. Show all posts

Saturday, 3 October 2009

An Alternative to ElementTree -- lxml

I use ElementTree as my XML library of choice in Python, but sometimes it is lacking in what I need it to do. I have always found its support for namespaces to be awkward to use, and recently I needed to validate generated XML against a collection of XML schemas, but ElementTree has no support for this.

So I had a hunt around and discovered lxml, a Python XML library that not only appears to be actively developed, but also provides good compatibility with ElementTree while layering on additional functionality.
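That compatibility means code written against the ElementTree API can often run on either library. Here is a minimal sketch, assuming lxml may or may not be installed: it tries lxml's etree module first and falls back to the standard library. The element names are only for illustration.

```python
# Prefer lxml when it is installed; otherwise fall back to the
# standard library's ElementTree, which shares the same core API.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Everything below uses only calls that both libraries provide.
book = etree.Element('Book')
etree.SubElement(book, 'Title').text = 'Bookishness'

xml_bytes = etree.tostring(book)  # bytes in both libraries by default
print(xml_bytes.decode('ascii'))
```

Schema validation, the feature that sent me looking in the first place, is only available in lxml (through etree.XMLSchema), so code that relies on it cannot use the fallback.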

The only downside to lxml is that it requires both libxml2 and libxslt to be installed, though if you install from the binary egg, it includes these libraries, making it very straightforward to run on Windows. On the Mac, you might need to upgrade the libraries, which can be a chore, though using MacPorts makes life easier.

Wednesday, 26 November 2008

Tree'd by XML

The standard Python distribution comes with a plethora of XML support modules: the usual SAX and DOM parsers, a lightweight minidom, wrappers for the Expat parser and something called ElementTree. While Expat seems to get a lot of attention, I recently tried the ElementTree way of doing things and was very pleasantly surprised.

ElementTree treats XML documents as a hierarchy of containers, which behave in much the same way as lists or dictionaries. Every node in an XML document is represented as an Element, with elements as sub-elements of their parent.

So, for example, let's declare a root with some sub-nodes:

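from xml.etree.ElementTree import Element, SubElement, ElementTree, tostring

book = Element('Book')            # the root element
title = SubElement(book, 'Name')  # a child of book, kept in a variable for later
SubElement(book, 'Title')         # a second child, appended after the first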
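The list-like and dictionary-like behaviour described above can be sketched with the same tree. This is only an illustration: the 'isbn' attribute and its value are made up for the example.

```python
from xml.etree.ElementTree import Element, SubElement

book = Element('Book')
title = SubElement(book, 'Name')
SubElement(book, 'Title')

# List-like: children are counted, indexed and iterated in document order.
child_count = len(book)
first_child = book[0]
child_tags = [child.tag for child in book]

# Dictionary-like: attributes live in a plain dict on each element.
book.attrib['isbn'] = 'hypothetical-isbn'  # made-up attribute for illustration
has_isbn = 'isbn' in book.attrib

print(child_count, child_tags, first_child is title, has_isbn)
```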

To turn this into an XML document, just call tostring(book), or wrap the root in an ElementTree and write it to a file with ElementTree(book).write(filename).
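Both routes look roughly like this. It is a sketch rather than the only way to do it: an in-memory buffer stands in for a real file so the example has no side effects, and the choice of UTF-8 with an XML declaration is my assumption, not something the tree requires.

```python
import io
from xml.etree.ElementTree import Element, SubElement, ElementTree, tostring

book = Element('Book')
SubElement(book, 'Name')
SubElement(book, 'Title')

# tostring() returns bytes by default; encoding='unicode' gives a str.
as_bytes = tostring(book)
as_text = tostring(book, encoding='unicode')

# ElementTree wraps a root element; write() accepts a filename or a
# file object. The buffer here plays the part of an open binary file.
buffer = io.BytesIO()
ElementTree(book).write(buffer, encoding='utf-8', xml_declaration=True)
document = buffer.getvalue()

print(as_text)
print(document.decode('utf-8'))
```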

Want to add text to an element? Just set title.text = 'Bookishness'. Adding attributes is just as simple: title.attrib['binding'] = 'hardback'. To read an attribute back, call get() on the element that carries it: title.get('binding').
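Put together, text and attribute handling might look like the sketch below. The 'edition' attribute is an extra I have added for illustration. Note that get() only looks at the element it is called on, so asking the root for an attribute set on a child returns None (or a default, if one is given).

```python
from xml.etree.ElementTree import Element, SubElement, tostring

book = Element('Book')
title = SubElement(book, 'Name')

title.text = 'Bookishness'
title.attrib['binding'] = 'hardback'  # dict-style assignment
title.set('edition', 'first')         # equivalent method call; illustrative attribute

binding = title.get('binding')             # read from the element that has it
missing = book.get('binding')              # None: the root has no such attribute
fallback = book.get('binding', 'unknown')  # a default avoids the None

print(binding, missing, fallback)
print(tostring(book, encoding='unicode'))
```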

Now, to get all pythonic on ElementTree, here's how to create a (sub)node, add attributes and text all in one line:

SubElement(book, 'Author', sex='Female', nationality='British').text = 'JK Rowling'

More concise than a Harry Potter doorstop.
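For completeness, here is the one-liner as a runnable sketch. The second call shows a variation I am assuming is useful: attribute names that are not valid Python identifiers can be passed in a dictionary instead of as keyword arguments. The 'Note' element and its 'date-added' attribute are hypothetical.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

book = Element('Book')

# Keyword arguments become attributes; SubElement returns the new
# element, so .text can be assigned on the same line.
SubElement(book, 'Author', sex='Female', nationality='British').text = 'JK Rowling'

# Names with hyphens cannot be keyword arguments, so use a dict.
SubElement(book, 'Note', {'date-added': '2008-11-26'}).text = 'hypothetical entry'

author = book.find('Author')
note = book.find('Note')
print(author.text, author.get('nationality'), note.get('date-added'))
print(tostring(book, encoding='unicode'))
```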