Bugzilla – Bug 951
XmlSerializer does not support document fragments
Last modified: 2012-12-10 01:03:10 CET
Currently nu.validator.htmlparser.sax.XmlSerializer does not support serializing document fragments. Although it is not documented whether this should be the case, it would be very nice for the users (myself in particular) if we could make use of the logic in XmlSerializer for writing out document fragments without either duplicating the code or hacking around the lack of support.

Using htmlvalidator-1.4 and copying nu.validator.htmlparser.tools.HTML2XML with parse changed to parseFragment as follows:

    ContentHandler serializer = new XmlSerializer(out);
    HtmlParser parser = new HtmlParser(XmlViolationPolicy.ALTER_INFOSET);
    parser.setErrorHandler(new SystemErrErrorHandler());
    parser.setContentHandler(serializer);
    parser.setProperty("http://xml.org/sax/properties/lexical-handler", serializer);
    parser.parseFragment(new InputSource(in), "div");
    out.flush();
    out.close();

Results in the following run-time error:

    Exception in thread "main" java.util.NoSuchElementException
        at java.util.LinkedList.getFirst(LinkedList.java:242)
        at nu.validator.htmlparser.sax.XmlSerializer.startPrefixMappingPrivate(XmlSerializer.java:728)
        at nu.validator.htmlparser.sax.XmlSerializer.startElement(XmlSerializer.java:554)
        at nu.validator.saxtree.TreeParser.startElement(TreeParser.java:185)
        at nu.validator.saxtree.Element.visit(Element.java:102)
        at nu.validator.saxtree.TreeParser.parse(TreeParser.java:89)
        at nu.validator.htmlparser.sax.HtmlParser.parseFragment(HtmlParser.java:451)
        at HTML2XML.main(HTML2XML.java:80)

This appears to be the result of stack being empty because startDocument is not called for document fragments. Unfortunately, since both stack and push (the only method which adds to stack) are private, there's no easy way to subclass XmlSerializer to work around the issue.

Thanks for considering.
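
For anyone hitting the same problem, one possible interim workaround is to wrap the serializer instead of subclassing it. The sketch below is untested against htmlparser itself and rests on the assumption made above: that the failure comes only from startDocument never reaching XmlSerializer. FragmentDocumentAdapter and its finish() method are hypothetical names, not part of htmlparser; the class uses only the standard org.xml.sax interfaces.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;

/**
 * Hypothetical wrapper (not part of htmlparser): forwards SAX events to a
 * delegate and synthesizes startDocument/endDocument when the event source
 * does not send them, as appears to happen with parseFragment.
 */
final class FragmentDocumentAdapter implements ContentHandler {
    private final ContentHandler delegate;
    private boolean started = false;

    FragmentDocumentAdapter(ContentHandler delegate) {
        this.delegate = delegate;
    }

    /** Sends startDocument to the delegate once, before the first real event. */
    private void ensureStarted() throws SAXException {
        if (!started) {
            started = true;
            delegate.startDocument();
        }
    }

    /** Call after parsing; closes the document if it is still open. */
    void finish() throws SAXException {
        if (started) {
            started = false;
            delegate.endDocument();
        }
    }

    @Override public void setDocumentLocator(Locator locator) {
        delegate.setDocumentLocator(locator);
    }
    @Override public void startDocument() throws SAXException {
        ensureStarted();
    }
    @Override public void endDocument() throws SAXException {
        ensureStarted();
        finish();
    }
    @Override public void startPrefixMapping(String prefix, String uri) throws SAXException {
        ensureStarted();
        delegate.startPrefixMapping(prefix, uri);
    }
    @Override public void endPrefixMapping(String prefix) throws SAXException {
        ensureStarted();
        delegate.endPrefixMapping(prefix);
    }
    @Override public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        ensureStarted();
        delegate.startElement(uri, localName, qName, atts);
    }
    @Override public void endElement(String uri, String localName, String qName)
            throws SAXException {
        ensureStarted();
        delegate.endElement(uri, localName, qName);
    }
    @Override public void characters(char[] ch, int start, int length) throws SAXException {
        ensureStarted();
        delegate.characters(ch, start, length);
    }
    @Override public void ignorableWhitespace(char[] ch, int start, int length)
            throws SAXException {
        ensureStarted();
        delegate.ignorableWhitespace(ch, start, length);
    }
    @Override public void processingInstruction(String target, String data)
            throws SAXException {
        ensureStarted();
        delegate.processingInstruction(target, data);
    }
    @Override public void skippedEntity(String name) throws SAXException {
        ensureStarted();
        delegate.skippedEntity(name);
    }
}
```

In the snippet above, this would mean passing new FragmentDocumentAdapter(serializer) to parser.setContentHandler and calling finish() on the adapter after parseFragment returns, before out.flush(). The lexical-handler property would still point at the serializer directly, so a fragment that begins with a comment could reach the serializer before the synthesized startDocument; whether that matters is not something I have verified. The output may also include whatever XmlSerializer emits at document start, which may not be wanted for a fragment.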