
How Do I Convert Malformed HTML to PDF with iText and XMLWorker?

I am trying to convert HTML (with external CSS) to PDF using iText's XMLWorkerHelper, but I get a run-time exception whenever XMLWorkerHelper parses malformed HTML. For example:

Solution 1:

It's a bit unclear whether you've decided to use iText 7 or iTextSharp (5.x.x), but here's a simple example for the latter that uses HtmlAgilityPack to clean up the malformed HTML before XMLWorker parses it:

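using System.IO;
using System.Text;
using HtmlAgilityPack;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

// Deliberately malformed input: mis-nested tags and unclosed table elements.
var malformedHtml = @"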
<h1>Malformed HTML</h1>
<p>A paragraph <b><span>with improperly nested tags</b></span></p><hr>
<table><tr><td>Cell 1, row 1</td><td>Cell 1, row 2";
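// Have HtmlAgilityPack repair the nesting and write empty elements (e.g. <hr>) in self-closed form.
HtmlDocument h = new HtmlDocument()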
{
    OptionFixNestedTags = true, OptionWriteEmptyNodes = true
};
h.LoadHtml(malformedHtml);

string css = @"
h1 { font-size:1.4em; }
hr { margin-top: 4em; margin-bottom: 2em; color: #ddd; }
table { border-collapse: collapse; }
table, td { border: 1px solid black; }
td { padding: 4px; }
span { color: red; }";

using (var stream = new MemoryStream())
{
    using (var document = new Document())
    {
        PdfWriter writer = PdfWriter.GetInstance(document, stream);
        document.Open();
        using (var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(h.DocumentNode.WriteTo())))
        {
            using (var cssStream = new MemoryStream(Encoding.UTF8.GetBytes(css)))
            {
                XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, htmlStream, cssStream);
            }
        }
    }
    // OUTPUT is the destination file path, defined elsewhere.
    File.WriteAllBytes(OUTPUT, stream.ToArray());
}

PDF output: [image not reproduced]


Solution 2:

If you are free to choose your iText flavour, go with iText 7 and pdfHTML. It supersedes XMLWorker and supports a wider range of tags and CSS3.
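For comparison with Solution 1, here is a minimal sketch of the pdfHTML route, assuming the iText 7 `itext7.pdfhtml` NuGet package is installed. The output file name, the base URI, and the class name are placeholders that do not come from the question. To my understanding pdfHTML parses HTML leniently, so a separate clean-up pass may not be needed, but that is worth verifying against your own input.

```csharp
using System.IO;
using iText.Html2pdf;

class PdfHtmlSketch
{
    static void Main()
    {
        // Same kind of malformed input as in Solution 1.
        string html = @"
<h1>Malformed HTML</h1>
<p>A paragraph <b><span>with improperly nested tags</b></span></p><hr>
<table><tr><td>Cell 1, row 1</td><td>Cell 1, row 2";

        // Assumption: relative references in the HTML (for example an external
        // stylesheet pulled in with a <link> tag) resolve against this folder.
        ConverterProperties properties = new ConverterProperties().SetBaseUri("./");

        // "output.pdf" is a placeholder path.
        using (var pdfStream = new FileStream("output.pdf", FileMode.Create))
        {
            HtmlConverter.ConvertToPdf(html, pdfStream, properties);
        }
    }
}
```

If the CSS lives in a separate file, reference it from the HTML with a `<link rel="stylesheet">` tag and point `SetBaseUri` at the folder that contains it.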

