Minor gotcha with Liza Daly's fast_iter

February 01, 2012

http://www.ibm.com/developerworks/xml/library/x-hiperfparse

This is a really useful article about “High-performance XML parsing in Python with lxml”.

But there’s one gotcha I’ve discovered with the fast_iter() function described therein. I can’t comment on that website without getting yet another developer ID, so I’ll describe the issue here and hope that Google picks it up.

The problem comes with nested tags with the same name.

For example:

<product>
  <data>
    <product>Some useful data</product>
  </data>
</product>

As far as I can see, the inner “product” is cleared as soon as its own end event has been handled, which happens before the outer one is handled, leaving you with an empty tag inside the outer element.
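
Here is a minimal sketch that reproduces the effect. To keep it runnable without lxml installed, it uses the standard library's xml.etree.ElementTree rather than lxml, filters on the tag by hand, and leaves out the sibling-deletion step of the original fast_iter(), so it is a simplified stand-in and not the article's code. The function names and the depth-counting workaround are my own assumptions, offered as one possible way around the problem rather than a recommended fix.

```python
import io
import xml.etree.ElementTree as ET

XML = b"<product> <data> <product>Some useful data</product> </data> </product>"


def text_of(elem):
    # All text at or below elem, with surrounding whitespace trimmed.
    return "".join(elem.itertext()).strip()


def clear_every_match(source, func, tag):
    # Simplified fast_iter-style loop: handle each matching element on its
    # "end" event, then clear it straight away.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            func(elem)
            elem.clear()


def clear_outermost_only(source, func, tag):
    # Hypothetical workaround: count open elements with this tag and only
    # handle and clear one when no same-named ancestor is still open.
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag != tag:
            continue
        if event == "start":
            depth += 1
        else:
            depth -= 1
            if depth == 0:
                func(elem)
                elem.clear()


naive = []
clear_every_match(io.BytesIO(XML), lambda e: naive.append(text_of(e)), "product")

outermost = []
clear_outermost_only(io.BytesIO(XML), lambda e: outermost.append(text_of(e)), "product")

print(naive)      # inner text is seen once; the outer element then finds it gone
print(outermost)  # the outer element still contains the inner text
```

The trade-off in the second function is that the inner “product” is never passed to the handler on its own, and memory is only released once the outermost element closes, so it suits documents where the nested element is really part of the outer record.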