Wednesday, 26 November 2008

Tree'd by XML

The standard Python distribution comes with a plethora of XML support modules: the usual SAX and DOM parsers, a lightweight minidom, wrappers for the Expat parser and something called ElementTree. While Expat seems to get a lot of attention, I recently tried the ElementTree way of doing things and was very pleasantly surprised.

ElementTree treats XML documents as a hierarchy of containers, which behave in much the same way as lists or dictionaries. Every node in an XML document is represented as an Element, with elements as sub-elements of their parent.

So, for example, let's declare a root with some sub-nodes:

from xml.etree.ElementTree import Element, SubElement

book = Element('Book')
title = SubElement(book, 'Name')
SubElement(book, 'Title')

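To turn this into an XML string, just use the tostring(book) function, or write it to a file by wrapping the root in an ElementTree and calling its write() method: ElementTree(book).write('book.xml'), where the file name is whatever you like.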

Want to add text to an element/node? Just use title.text = 'Bookishness'. To add attributes, again simplicity: title.attrib['binding'] = 'hardback'. Accessing an attribute is a case of calling get() on the same element: title.get('binding').
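Putting the pieces so far together, here is a minimal sketch, assuming Python 3 and the standard-library xml.etree.ElementTree module. The element and attribute names are the ones used in the snippets above; the variable names binding and xml_text are just for illustration.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

# Build a root element with one child.
book = Element('Book')
title = SubElement(book, 'Name')

# Text and attributes are set directly on the element.
title.text = 'Bookishness'
title.attrib['binding'] = 'hardback'

# get() reads an attribute from the element it was set on.
binding = title.get('binding')

# tostring() returns bytes by default; encoding='unicode' asks for a str.
xml_text = tostring(book, encoding='unicode')
print(xml_text)
```

Note that asking book for the binding attribute would return None, because the attribute lives on the child element.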

Now, to get all pythonic on ElementTree, here's how to create a (sub)node, add attributes and text all in one line:

SubElement(book, 'Author', sex='Female', nationality='British').text = 'JK Rowling'

More concise than a Harry Potter doorstop.
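The "containers that behave like lists or dictionaries" idea from the start of this post can be sketched as follows. This assumes Python 3 and the standard-library module; the variable names are illustrative only.

```python
from xml.etree.ElementTree import Element, SubElement

book = Element('Book')
SubElement(book, 'Title').text = 'Bookishness'
SubElement(book, 'Author', nationality='British').text = 'JK Rowling'

# List-like: length, indexing and iteration all work on the children.
child_count = len(book)
first_tag = book[0].tag
tags = [child.tag for child in book]

# Dictionary-like: each element keeps its attributes in a dict.
author = book.find('Author')
attributes = dict(author.attrib)
nationality = author.get('nationality', 'unknown')

print(child_count, first_tag, tags, attributes, nationality)
```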

Tuesday, 25 November 2008

Dictionary in love

Merging two dictionaries. It should be easy and it is. Just use the update() method of one of the dictionaries:

d1 = { 'a': 1, 'b': 2}
d2 = {'b': 2, 'c': 3}
d1.update(d2)

d1 is now {'a': 1, 'c': 3, 'b': 2}

But often I don't want to modify either of the dictionaries; I want to create a third from the combination of the two. I could make a deep copy of one of the dictionaries and update that (badness), or go through the items and merge (more badness).

And here is where dict() comes to the rescue once again:

d3 = dict(d1, **d2)

That's it; merged -- and it's pretty damn fast too.
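A small sketch of what the merge does and where it stops working, assuming Python 3. The values here differ from the ones above so that it is visible which dictionary wins on a shared key; d4, d5 and d6 are made up for the example.

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 20, 'c': 3}

# dict(d1, **d2) copies d1, then applies d2 as keyword arguments,
# so values from d2 win on shared keys. Neither input is modified.
d3 = dict(d1, **d2)
print(d3)

# In Python 3 the ** form requires every key in d2 to be a string.
# For other key types, copying and then updating the copy still works.
d4 = {1: 'one'}
d5 = {2: 'two'}
d6 = d4.copy()
d6.update(d5)
print(d6)
```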

Hey man, nice shot .. aka filter

filter is another of those incredibly handy built-in Python functions. It takes a function and a list, and keeps only the elements of the list for which the function returns a true value:

filter(f, a)

So say we want to filter out all those pesky Nones from a list where Nones have no right to be: [1, None, 2, 3, None, 4]. We could go down the route of iterating through the list and building a new list from all the non-null elements ... but that requires multiple lines of tedium.

Instead we employ the power of filter() and -- wait for it -- a lambda function!

filter(lambda x: x is not None, [1, None, 2, 3, None, 4])

And presto! The Nones are no more.
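For anyone trying this on Python 3, here is a short sketch. There, filter() returns a lazy iterator rather than a list, so it needs wrapping in list() to see the result. The variable names are illustrative.

```python
values = [1, None, 2, 3, None, 4]

# Wrap the iterator in list() to get a concrete list back.
without_none = list(filter(lambda x: x is not None, values))

# The equivalent list comprehension, for comparison.
also_without_none = [x for x in values if x is not None]

# Passing None as the function drops every falsy value, so 0 goes too.
truthy_only = list(filter(None, [0, 1, None, 2]))

print(without_none, also_without_none, truthy_only)
```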

Writing a dictionary like Dr. Johnson

Typically, when I used to create a dictionary of elements, I employed the tried and true method of declaration followed by element addition:

d = {}
d['a'] = 1
d['b'] = 2
...

Now I find I favour a slightly more compact and pythonic means, using the built-in dict() function:

d = dict( a = 1, b = 2, ... )

Both render the same dictionary:

{'a': 1, 'b': 2 ... }
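A quick sketch comparing the two styles, plus the pairs form of dict() for keys that cannot be written as keyword arguments. The variable names and the 'two words' key are made up for the example.

```python
# Declaration followed by element addition.
d = {}
d['a'] = 1
d['b'] = 2

# Keyword form: compact, but keys must be valid Python identifiers.
by_keywords = dict(a=1, b=2)

# dict() also accepts an iterable of (key, value) pairs, which copes
# with keys that are not identifiers.
by_pairs = dict([('a', 1), ('b', 2)])
by_zip = dict(zip(['a', 'b'], [1, 2]))
odd_key = dict([('two words', 3)])

print(d == by_keywords == by_pairs == by_zip, odd_key)
```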

Last element standing

To access the last element in a list, there's no need for the long-winded list[len(list) - 1].

All that's needed is list[-1].
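Negative indices count back from the end, and the same trick works in slices. A small sketch, using the name items rather than list so the built-in is not shadowed:

```python
items = ['a', 'b', 'c', 'd']

last = items[-1]          # same element as items[len(items) - 1]
second_last = items[-2]
last_two = items[-2:]

# Indexing an empty list with [-1] raises IndexError;
# a slice quietly returns an empty list instead.
empty_tail = [][-1:]

print(last, second_last, last_two, empty_tail)
```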

Monday, 24 November 2008

List of strings, or not list of strings ...

Following on from the nifty ', '.join(l) time- (and sanity-) saver, here's the answer to the "Oh no! It's a list of strings, and not some other random objects, that I need!" disaster.

The problem is simple: I have a list of non-string objects, but the operation I want to perform on them requires a list of strings. Naively I break out the dreaded conversion loop:

num_list = [1, 2, 3, 4]
str_list = []
for item in num_list:
    str_list.append( str(item) )

Bah! Three lines of code! Whatta waste!

Python to the rescue with the line-saving map function:

map(str, num_list)

And we're done.

The map function takes a function and a list, and applies that function to each element of the list, so it is useful for much more than just string magic.

So, to print the list of ints above, separated by commas:

', '.join( map(str, num_list) )
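A sketch of the same steps for Python 3, where map() returns a lazy iterator instead of a list. The squares line is an added illustration of map doing something other than string conversion.

```python
num_list = [1, 2, 3, 4]

# list() turns the iterator returned by map() into a real list.
str_list = list(map(str, num_list))

# join() accepts any iterable of strings, so the iterator can be
# passed straight in without building a list first.
joined = ', '.join(map(str, num_list))

# map works with any one-argument callable, not just str.
squares = list(map(lambda n: n * n, num_list))

print(str_list, joined, squares)
```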

Printing comma-separated lists

If there's one thing that really bugged me when forced to write code (well, there are many, many things that bug me, but this is high on the list), it's printing out a comma-separated list without having that last pesky comma ruin everything at the end.

So then there's all this fuss about checking whether the item is the last in the list and, if so, not slotting in a comma, but oh wait! what if the list is empty, blah blah blah.

Python makes it easy -- use a join:

', '.join(list)

Done; finished; end of story.
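The edge cases that caused all the fuss (one item, no items) can be checked in a few lines. A small sketch; the list contents are made up, and the name names is used instead of list to avoid shadowing the built-in.

```python
names = ['spam', 'eggs', 'ham']

# The separator only ever goes between items, never after the last one.
several = ', '.join(names)

# A single item comes back unchanged, and an empty list gives ''.
single = ', '.join(['spam'])
empty = ', '.join([])

print(repr(several), repr(single), repr(empty))
```

One thing to keep in mind: join() raises a TypeError if any item is not a string.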

I love Python ...

Living in a Pythonic World

No matter how long you've been programming Python, there are always wonderful new Pythonic ways of doing things that drop into your lap and say "Hey! Use me! I'm so much more interesting and svelte!"

My biggest problem is remembering all these weird and wacky ways of doing things. I sometimes think "Hey, I remember there's a cool one-liner for doing this!" and then spend hours on the net sifting through newsgroups and forums in search of the one true Pythonic path.

This is an attempt to catch all those little gems of Python knowledge in one place, so I never have to sift for the same thing twice.