Bugzilla – Bug 935
Nullreference bug in nu.validator.htmlparser.xom.HtmlBuilder ctor
Last modified: 2012-09-21 15:21:39 CEST
public HtmlBuilder(SimpleNodeFactory nodeFactory, XmlViolationPolicy xmlPolicy) {
    super();
    this.simpleNodeFactory = nodeFactory;
    this.treeBuilder = new XOMTreeBuilder(nodeFactory);
    this.driver = null;
    // Nullreference
    this.driver.setXmlnsPolicy(XmlViolationPolicy.ALTER_INFOSET);
    setXmlPolicy(xmlPolicy);
}
Here's my workaround until this gets fixed. I use the nu.validator SAX parser to feed XOM's Builder.

public static Document parseHtmlFile(File file) {
    nu.validator.htmlparser.sax.HtmlParser parser = new nu.validator.htmlparser.sax.HtmlParser() {
        @Override
        public void setFeature(String name, boolean value)
                throws SAXNotRecognizedException, SAXNotSupportedException {
            try {
                super.setFeature(name, value);
            } catch (Exception e) {
                System.out.println("Could not set " + name);
            }
        }
    };
    Builder builder = new Builder(parser, false);
    try {
        Document doc = builder.build(new InputStreamReader(new FileInputStream(file), "UTF-8"));
        return doc;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
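For anyone who wants to try the setFeature part of the workaround above without the htmlparser and XOM jars on the classpath, here is a minimal sketch of the same idea using only JDK classes. It is not the code from this report: LenientReader and LenientFeaturesDemo are hypothetical names, the feature URI is made up, and it assumes the caller is happy to carry on when a feature cannot be set.

```java
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical wrapper: forwards setFeature to the wrapped reader and
// records, rather than propagates, any feature the reader rejects.
class LenientReader extends XMLFilterImpl {
    final List<String> rejected = new ArrayList<>();

    LenientReader(XMLReader parent) {
        super(parent);
    }

    @Override
    public void setFeature(String name, boolean value) {
        try {
            super.setFeature(name, value);
        } catch (SAXException e) {
            // Covers SAXNotRecognizedException and SAXNotSupportedException.
            rejected.add(name);
        }
    }
}

class LenientFeaturesDemo {
    public static void main(String[] args) throws Exception {
        XMLReader jdkReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        LenientReader reader = new LenientReader(jdkReader);

        // Made-up feature URI that the wrapped reader is not expected to recognize.
        reader.setFeature("http://example.invalid/features/no-such-feature", true);

        System.out.println("Rejected features: " + reader.rejected);
    }
}
```

Recording the rejected names instead of printing them keeps the failures visible to the caller, which can then decide whether a missing feature actually matters.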
Whoa. How did *that* happen?

Fixed: https://hg.mozilla.org/projects/htmlparser/rev/1d03b97674c8

I’ll make a new release after some more fixes. Hopefully next week.