How to Parse HTML From JavaFX WebView and Transfer This Data to a Jsoup Document?
Solution 1:
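WebView browser = new WebView();
WebEngine webEngine = browser.getEngine();
String url = "https://docs.microsoft.com/en-us/ef/ef6/";
// load() is asynchronous, so read the document only after the page has finished loading
webEngine.getLoadWorker().stateProperty().addListener((obs, oldState, newState) -> {
    if (newState == javafx.concurrent.Worker.State.SUCCEEDED) {
        // get w3c document from webEngine
        org.w3c.dom.Document w3cDocument = webEngine.getDocument();
        // use jsoup helper methods to convert it to string
        String html = new org.jsoup.helper.W3CDom().asString(w3cDocument);
        // create jsoup document by parsing html; the url is passed as the base URI
        org.jsoup.nodes.Document doc = Jsoup.parse(html, url);
    }
});
webEngine.load(url);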
Solution 2:
I can't promise this is the best way, as I've not used Jsoup before and I'm not an expert on the XML API. The org.jsoup.Jsoup class has a method for parsing HTML in String form: Jsoup.parse(String). This means we need to get the HTML from the WebView as a String. The WebEngine class has a document property that holds an org.w3c.dom.Document. This Document is the HTML content of the currently showing web page. We just need to convert this Document into a String, which we can do with a Transformer.
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.jsoup.Jsoup;

public class Utils {

    private static Transformer transformer;

    // not thread safe
    public static org.jsoup.nodes.Document convert(org.w3c.dom.Document doc)
            throws TransformerException {
        if (transformer == null) {
            transformer = TransformerFactory.newDefaultInstance().newTransformer();
        }
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return Jsoup.parse(writer.toString());
    }
}
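The Transformer step can be tried on its own, without a WebView or Jsoup on the classpath. The following is only a minimal sketch: the class name DomToStringSketch and the helpers toMarkup and sampleDocument are made up for illustration, the hand-built DOM merely stands in for whatever WebEngine would expose, and setting the output method to "html" is an assumption on my part rather than something the code above does.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomToStringSketch {

    // Hypothetical helper: serializes a W3C Document to markup.
    // A new Transformer is created per call, which sidesteps the shared static instance.
    static String toMarkup(Document doc) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Assumption: ask for HTML serialization, since the result is meant for an HTML parser.
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    // Hypothetical helper: builds a tiny DOM as a stand-in for WebEngine's document.
    static Document sampleDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element p = doc.createElement("p");
        p.setTextContent("Hello from the DOM");
        body.appendChild(p);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // Prints the serialized markup; this string is what would be handed to Jsoup.parse(String).
        System.out.println(toMarkup(sampleDocument()));
    }
}
```

Creating the Transformer inside the method costs a little more per call than caching it, but it avoids the thread-safety concern noted in the comment above.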
You would call this every time the document property changes. I did some "tests" by browsing Google and printing the org.jsoup.nodes.Document to the console, and everything seems to be working.
There is a caveat, though: as far as I understand it, the document property does not change when there are changes within the same web page (the Document itself may be updated, however). I'm not a web person, so pardon me if I don't make sense here, but I believe that this includes things like a frame changing its content. There may be a way around this by interfacing with the JavaScript using WebEngine.executeScript(String), but I don't know how.