Ian Bicking: the old part of his blog

XML vs. generic serialization

Sean McGrath wonders about loading XML. In particular, he notes a claim that XML serialization and deserialization (parsing) are faster than Java's object serialization of a DOM, and wonders about Python's serialization (via pickle).

If you think about it, it makes sense. First, we're talking about serializing an in-language representation of an XML document. You can serialize any object to XML, but that's bound to be slower than the native serialization, since the native serialization (if implemented properly) is more compact.

But XML isn't hard to parse. It's quite restrictive, and a good XML parser that produces a DOM object will mostly do the work it needs to do, and no more. And you can't get a lot more compact than the XML document, unless you are performing some sort of compression (which has performance penalties of its own). If you used generic serialization, the disk representation would have to contain all sorts of metadata about Node objects and Attribute objects and whatnot. Not only is that more data to read through, but generic deserialization has to actually pay attention to that information.
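
One rough way to look at the metadata point is to compare the size of the XML text with a pickle of the parsed tree. This is only a sketch: it uses Python's xml.etree.ElementTree rather than a full DOM, and the sample document and the compare_sizes name are made up for illustration. The actual numbers will depend on the document, the pickle protocol, and the Python version.

```python
import pickle
import xml.etree.ElementTree as ET

# Hypothetical sample document with 1000 small elements.
SAMPLE = "<words>" + "".join(
    '<word id="%d">item%d</word>' % (i, i) for i in range(1000)
) + "</words>"


def compare_sizes(xml_text):
    """Return (xml_bytes, pickle_bytes, restored_root) for one document."""
    root = ET.fromstring(xml_text)    # parse: XML text -> element tree
    blob = pickle.dumps(root)         # generic serialization of the tree
    restored = pickle.loads(blob)     # generic deserialization
    return len(xml_text.encode("utf-8")), len(blob), restored


if __name__ == "__main__":
    xml_size, pickle_size, _ = compare_sizes(SAMPLE)
    print("XML bytes:   ", xml_size)
    print("pickle bytes:", pickle_size)
```

Wrapping ET.fromstring and pickle.loads in timeit calls would give a matching comparison of load times.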

This was something I played with back when trying to optimize Kata 19. In that case, I was reading /etc/words, and creating a graph based on those. I thought I could save some effort by saving a pickle of the graph that I created. But in fact it was dramatically slower -- the pickle was much larger, and merely the extra effort of reading the file outweighed the effort that went into constructing the graph. The graph itself was merely a dictionary; and marshal (which is more restrictive than pickle) wasn't significantly better.
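
A small sketch of that experiment, under some assumptions: the word list here is a tiny hard-coded stand-in for the real file, build_graph is a hypothetical helper that links words differing by one letter (not the code I actually used for the kata), and no timings are claimed. It just shows the three ways of getting the dictionary: rebuilding it, or loading it back from pickle or marshal.

```python
import marshal
import pickle
from collections import defaultdict

# Hypothetical stand-in for the words file.
WORDS = ["cat", "cot", "cog", "dog", "dot", "cab"]


def build_graph(words):
    """Map each word to the sorted list of words that differ by one letter."""
    buckets = defaultdict(list)
    for word in words:
        for i in range(len(word)):
            # Words sharing a pattern like "c_t" differ only at position i.
            buckets[word[:i] + "_" + word[i + 1:]].append(word)
    graph = {word: set() for word in words}
    for group in buckets.values():
        for word in group:
            graph[word].update(w for w in group if w != word)
    return {word: sorted(neighbors) for word, neighbors in graph.items()}


if __name__ == "__main__":
    graph = build_graph(WORDS)
    pickled = pickle.dumps(graph)
    marshalled = marshal.dumps(graph)
    # Both round trips give back an equal dictionary.
    assert pickle.loads(pickled) == graph
    assert marshal.loads(marshalled) == graph
    print("source bytes: ", len("\n".join(WORDS)))
    print("pickle bytes: ", len(pickled))
    print("marshal bytes:", len(marshalled))
```

The serialized graph stores every edge, so it has more to read back than the plain word list it was built from, which is where the extra loading cost comes from.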

So, I imagine it can be the same with XML -- either way you have to rebuild the objects, and XML is of course well-suited to producing an object that represents XML.

Created 21 Sep '04
Modified 14 Dec '04


I think Java serialization is faster than XML serialization; there are just some slow-downs on the Windows platform.
Maybe reading this article will help:

# Desco Rigato