Python Lxml Changes Tag Hierarchy?
I'm having a small issue with lxml. I'm converting an XML doc into an HTML doc. The original XML looks like this (it looks like HTML, but it's in the XML doc):
Localizatio
Solution 1:
lxml does this because its HTML parser repairs invalid markup instead of storing it, and <p> elements can't be nested in HTML. The HTML 4 specification says:
The P element represents a paragraph. It cannot contain block-level elements (including P itself).
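To see the repair happen, here is a minimal sketch, assuming the fragment is fed to lxml's HTML parser through etree.HTML(). The sample string is borrowed from the other answer on this page; the original question's input is cut off, so it may differ. The sketch counts <p> elements whose direct parent is also a <p>:

```python
from lxml import etree

# Sample input (assumed; taken from the other answer on this page).
item = '<p>Eiffel tower? Paris or Vegas <p>Bayes theorem p(A|B)</p></p>'

# etree.HTML() uses the HTML parser, which repairs markup it considers invalid.
root = etree.HTML(item)

# Collect every <p> whose direct parent is also a <p>.
nested = [
    p for p in root.iter('p')
    if p.getparent() is not None and p.getparent().tag == 'p'
]

print(len(nested))  # number of nested paragraphs left after parsing
print(etree.tostring(root, encoding='unicode'))  # the repaired tree
```

The inner paragraph should come back as a sibling of the outer one, not a child. The exact serialized output can vary with the libxml2 version, so it is worth printing it on your own setup.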
Solution 2:
You're using lxml's HTML parser, not an XML parser. Try this instead:
>>> from lxml import etree
>>> item = '<p>Eiffel tower? Paris or Vegas <p>Bayes theorem p(A|B)</p></p>'
>>> root = etree.fromstring(item)
>>> etree.tostring(root, pretty_print=True)
b'<p>Eiffel tower? Paris or Vegas <p>Bayes theorem p(A|B)</p></p>\n'
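If the goal is still to write HTML output, one option is to keep parsing with the XML parser and only switch the serializer, using the method='html' argument of etree.tostring(). This is a sketch under the assumption that the serializer writes the tree as it stands and does not re-run the parser's repair step, which is worth checking on your lxml version:

```python
from lxml import etree

item = '<p>Eiffel tower? Paris or Vegas <p>Bayes theorem p(A|B)</p></p>'

# The XML parser keeps the nesting exactly as written.
root = etree.fromstring(item)

# Serialize with HTML rules; encoding='unicode' returns str instead of bytes.
html_text = etree.tostring(root, method='html', encoding='unicode')
print(html_text)
```

Keep in mind that the result is still invalid HTML, so a browser or any other HTML parser reading it may split the paragraphs again.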