In preparation for my PyCon talk on HTML, I thought I’d do a performance comparison of several parsers and document models.
The situation is a little complex because there are different steps in handling HTML:
- Parse the HTML
- Parse it into something (a document object)
- Serialize it
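To make the three steps concrete, here is a minimal sketch that times each one separately using only the Python standard library. The names (`NullParser`, `TreeParser`, `timed`, `SAMPLE`) are hypothetical and made up for illustration, and `html.parser` plus `xml.etree.ElementTree` are just stand-ins, not the libraries being compared. It also assumes well-formed input, which real-world HTML often isn't.

```python
import time
from html.parser import HTMLParser
from xml.etree import ElementTree as ET

# Elements that never take an end tag (a partial list, enough for the sample).
VOID = {"br", "hr", "img", "input", "link", "meta"}

# Hypothetical sample document: well-formed, with 1000 repeated paragraphs.
PARAGRAPHS = 1000
SAMPLE = (
    "<html><head><title>t</title></head><body>"
    + "<p class='x'>Hello <b>world</b><br></p>" * PARAGRAPHS
    + "</body></html>"
)


class NullParser(HTMLParser):
    """Step 1 only: tokenize the HTML and throw every event away."""


class TreeParser(HTMLParser):
    """Steps 1 and 2: feed parser events into an ElementTree builder."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as empty strings.
        self.builder.start(tag, {k: v or "" for k, v in attrs})
        if tag in VOID:
            self.builder.end(tag)

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.builder.end(tag)

    def handle_data(self, data):
        self.builder.data(data)

    def close(self):
        super().close()
        return self.builder.close()  # the root element


def parse_only(html):
    parser = NullParser()
    parser.feed(html)
    parser.close()


def parse_to_tree(html):
    parser = TreeParser()
    parser.feed(html)
    return parser.close()


def serialize(root):
    # Step 3: turn the document object back into a string.
    return ET.tostring(root, encoding="unicode", method="html")


def timed(func, arg):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(arg)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, t_parse = timed(parse_only, SAMPLE)
    root, t_tree = timed(parse_to_tree, SAMPLE)
    _, t_ser = timed(serialize, root)
    print(f"parse only:    {t_parse:.4f}s")
    print(f"parse to tree: {t_tree:.4f}s")
    print(f"serialize:     {t_ser:.4f}s")
```

The difference between the first two timings is roughly the cost of building the document object, which is why it helps to measure the steps apart rather than as one lump.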
Some libraries …