
Python lxml Changes Tag Hierarchy?

I'm having a small issue with lxml. I'm converting an XML doc into an HTML doc. The original XML looks like this (it looks like HTML, but it's in the XML doc):

Localizatio

Solution 1:

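lxml is doing this because its HTML parser repairs invalid HTML instead of storing it as written, and <p> elements can't be nested in HTML. When the parser meets a second <p> inside one that is still open, it closes the first paragraph, so the two end up as siblings. The HTML 4.01 specification says: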

The P element represents a paragraph. It cannot contain block-level elements (including P itself).
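To see the repair happen, here is a minimal sketch that runs the nested markup through lxml's HTML parser via etree.HTML() and lists the parent of each <p> it finds. The sample string is the one used in Solution 2 below; the variable names are only for illustration.

```python
from lxml import etree

item = '<p>Eiffel tower? Paris or Vegas <p>Bayes theorem p(A|B)</p></p>'

# etree.HTML() parses with the HTML parser, which repairs invalid markup
# and wraps the result in <html><body>...</body></html>.
root = etree.HTML(item)

# Collect every <p> and record the tag of its parent element.
paragraphs = root.findall('.//p')
parent_tags = [p.getparent().tag for p in paragraphs]

for p in paragraphs:
    # No <p> should report another <p> as its parent after the repair.
    print(p.getparent().tag, repr(p.text))
```

If the parser behaves as described, neither paragraph has a <p> parent, which is the changed hierarchy the question is about.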


Solution 2:

You're using lxml's HTML parser, not an XML parser. Try this instead:

>>> from lxml import etree
>>> item = '<p>Eiffel tower? Paris or Vegas <p>Bayes theorem p(A|B)</p></p>'
>>> root = etree.fromstring(item)
>>> etree.tostring(root, pretty_print=True)
b'<p>Eiffel tower? Paris or Vegas <p>Bayes theorem p(A|B)</p></p>\n'
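Since the goal in the question is to produce HTML from the XML, the following sketch parses with the XML parser and then serializes the tree with method='html'. It assumes the serializer writes the tree as it stands without restructuring it; encoding='unicode' is used only to get a text string back instead of bytes.

```python
from lxml import etree

item = '<p>Eiffel tower? Paris or Vegas <p>Bayes theorem p(A|B)</p></p>'

# The XML parser keeps the nesting exactly as written.
root = etree.fromstring(item)
inner = root.find('p')  # the nested <p>, a direct child of the outer <p>

# Serialize the XML tree using HTML output rules, as text rather than bytes.
html_out = etree.tostring(root, method='html', encoding='unicode')
print(html_out)
```

Keep in mind that the nested <p> survives only in the serialized string. A browser, or any other HTML parser that later reads this output, will close the outer paragraph at the second <p> again, so a <div> or <span> may be a better container if the nesting matters.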
