I’ve been doing a bit more with Atom lately.
First, I started writing a library to manipulate Atom feeds and entries. For the moment this is located in atom.py. It uses lxml, as does everything markup related I do these days.
I came upon a revelation of sorts when I was writing the library. I first started with a library that looked like this:
class Feed(object):
def __init__(self, title, ...):
self.title = title
..
@classmethod
def parse(cls, xml):
if isinstance(xml, basestring):
xml = etree.XML(xml)
title = xml.xpath('//title').text
...
return cls(title, ...)
def serialize(self):
el = etree.Element('{%s}feed' % atom_ns)
title = etree.Element('{%s}title' % atom_ns)
title.text = self.title
el.append(title)
...
return el
Obviously there’s ways to improve this and make it less verbose, and I went down that path for a while. But then I decided the whole path was wrong. Atom is XML. It’s not the representation of some object I’m creating. If I have something that can’t be represented in XML, it isn’t Atom, and it doesn’t belong in my Atom-related objects.
So instead I started making lxml more convenient when using Atom. I don’t keep any information except what is in the markup, I just make it more convenient to access that information.
I used lots of descriptors to do this, as the same patterns happened over and over. For instance, the Feed object is fairly simple:
class Feed(AtomElement):
entries = _findall_property('entry')
author = _element_property('author')
Which basically means that feed.entries returns all <entry> elements, and feed.author returns the single author element.
There’s also accessors for text elements (like <id>) and date containing elements (like <updated>) and just to access XML attributes as Python attributes.
There’s a number of advantages:
- No hidden state.
- No deferred errors, since everything is always represented in the XML infoset.
- All XML extensions work, even though my classes don’t know anything in particular about them. There’s a full API for manipulating the XML that you can use, you don’t have to use my APIs.
- Even more obscure kinds of extensions work fine, like a custom attribute on an element. There’s absolutely zero normalization that happens.
- I only have to write the parts where the normal XML (lxml) APIs are inconvenient, so the implementation stays simple.
- There’s no confusion over which object I might be talking about in my code. There’s no distinction between the XML object and the domain object.
Since then I’ve been working on a Javascript library for handling Atom. It’s not as elegant. I am trying to keep to this same principle, but of course I can’t actually extend the DOM and so I can’t add convenience methods. So instead I’m making a class that lightly wraps the DOM objects, with explicit getters and setters that simply read and modify those DOM objects.
One thing that I have found very useful in my development on the Javascript side is doctest-style testing. You can see the test, but to run it you have to check it out (it uses some svn:externals which you don’t get through the direct svn access). After using that testing some more and being pleased with the result, I decided to package the Javascript doctest runner a bit better. I removed the framework dependencies, did a bit of renaming (now it is doctestjs or doctest.js instead of jsdoctest), wrote up fairly comprehensive docs, and uploaded it to JSAN (though at the moment the trunk from svn is probably better to use). I think it’s an excellent way of doing unit testing in Javascript, much better than any of the alternatives I’ve seen. It even has some notable advantages over Python’s doctest, like if you are using Firebug (which you must if you do Javascript development) you get a console session that runs in the same namespace as your tests, so you can easily do inspection of the objects if there’s a failure.
I’m not sure about JSAN. It’s
nice to have an index. But I think they copy
stuff from CPAN a bit too much. Why should
you have a text README file? That’s
just silly; of course Javascript
documentation should be HTML. They batch
processing. Processing one package a day
on the fly shouldn’t be overwhelming.
They want a MANIFEST file. The standard
metadata file is YAML, not JSON. This should
all be a little more Javascripty in my
opinion. But they also accept any kind of
upload, so there’s nothing stopping
you from ignoring what you don’t
care about. I’ll probably improve the
packaging of doctestjs a bit in the future,
and still ignore the parts I think are
silly.
Automatically generated list of related posts:
- Atom Publishing Protocol: Atompub Doing stuff with the Atom Publishing Protocol, I’ve noticed that...
- Atompub as an alternative to WebDAV I’ve been thinking about an import/export API for PickyWiki; I...
Did you consider using Amara instead of lxml? The built in API is pretty close to what your decorators are doing and you get things like XPath queries along with usable namespace support. Not to mention XSLT built in.
You know about [Google's Atom library for Python](http://gdata-python-client.googlecode.com/svn/trunk/src/atom/), yes?
And congrats on the WordPress migration. Is it running via wphp? ;^)
re: Amara
lxml also has the “objectify’ extension, which I believe is based on Amara’s API. Enumerating the attributes isn’t that hard, though, and it includes extra information like default values or date parsing.
lxml has good namespace support and XPath and XSLT built in.
re: Google’s library, no, I had missed it. But it does look like the style of API I specifically didn’t use, so I don’t feel so bad.
jquery’s xpath support for your atoms bits?
I’ve just been using the DOM in Javascript, i.e., getElementsByTagNameNS. With a few factory functions for getters and setters it’s not too terribly bad.
If I may, the lxml objectify feature is nowhere as good as Amara’s mapping of XML into Python objects which is basically what you try to redo here.
Amara is possibly the most powerful and user-friendly XML library for Python and yet is quite unknown.
Hi Ian,
Doctests and a standalone Atom library for javascript, marvelous!
It’s not immediately clear to me how the GPL affects unit tests. Is it kosher to distribute a BSD-script with associated doctests using a GPL doctest library?
Also, it wasn’t clear to me at first glance what license applies to https://svn.openplans.org/svn/TaggerClient/trunk/javascript/atom.js
Anyway, thanks for working on this!
Ian,
Thanks for writing doctest! I made a slight change to allow for Unix/Mac/Dos linefeed characters (http://blog.cornerstonenw.com/2009/06/02/doctestjs/). I don’t think it’s possible to contribute the patch to the doctest SVN…
– Peter Rust Developer, Cornerstone Systems