NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 951 - XmlSerializer does not support document fragments
Status: NEW
Product: Validator.nu
Classification: Unclassified
Component: HTML parser
Version: HEAD
Hardware: All All
Importance: P2 normal
Assigned To: Nobody
Reported: 2012-12-10 01:03 CET by Kevin Locke
Modified: 2012-12-10 01:03 CET (History)
Description Kevin Locke 2012-12-10 01:03:10 CET
Currently nu.validator.htmlparser.sax.XmlSerializer does not support serializing document fragments.  The documentation does not say whether it is meant to, but it would be very useful (to me in particular) to be able to reuse the logic in XmlSerializer to write out document fragments, without either duplicating the code or hacking around the lack of support.

Using htmlvalidator-1.4, copying nu.validator.htmlparser.tools.HTML2XML and changing parse to parseFragment as follows:

ContentHandler serializer = new XmlSerializer(out);

HtmlParser parser = new HtmlParser(XmlViolationPolicy.ALTER_INFOSET);

parser.setErrorHandler(new SystemErrErrorHandler());
parser.setContentHandler(serializer);
parser.setProperty("http://xml.org/sax/properties/lexical-handler",
        serializer);
parser.parseFragment(new InputSource(in), "div"); 
out.flush();
out.close();

This results in the following run-time error:

Exception in thread "main" java.util.NoSuchElementException
	at java.util.LinkedList.getFirst(LinkedList.java:242)
	at nu.validator.htmlparser.sax.XmlSerializer.startPrefixMappingPrivate(XmlSerializer.java:728)
	at nu.validator.htmlparser.sax.XmlSerializer.startElement(XmlSerializer.java:554)
	at nu.validator.saxtree.TreeParser.startElement(TreeParser.java:185)
	at nu.validator.saxtree.Element.visit(Element.java:102)
	at nu.validator.saxtree.TreeParser.parse(TreeParser.java:89)
	at nu.validator.htmlparser.sax.HtmlParser.parseFragment(HtmlParser.java:451)
	at HTML2XML.main(HTML2XML.java:80)

This appears to happen because stack is empty: startDocument is never called for document fragments, so nothing has been pushed by the time startElement calls getFirst.  Unfortunately, since both stack and push (the only method that adds to stack) are private, there is no easy way to subclass XmlSerializer to work around the issue.
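
One possible stopgap, sketched below, is to wrap the serializer instead of subclassing it, and to send the missing startDocument (and a matching endDocument) from the wrapper. FragmentDocumentWrapper and its finish method are hypothetical names, not part of htmlparser; the sketch uses only the standard org.xml.sax API. It assumes that a startDocument call is enough to populate XmlSerializer's stack, and it has not been tested against XmlSerializer itself, which may also write an XML declaration that is unwanted in a fragment.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical helper (not part of htmlparser): forwards SAX content events
 * to a delegate and synthesizes startDocument/endDocument when the event
 * source never sends them, as appears to happen with parseFragment.
 */
public class FragmentDocumentWrapper extends XMLFilterImpl {
    private boolean started = false;      // has the delegate seen startDocument?
    private boolean synthesized = false;  // did this wrapper send it itself?

    public FragmentDocumentWrapper(ContentHandler delegate) {
        // XMLFilterImpl forwards every ContentHandler event to this delegate.
        setContentHandler(delegate);
    }

    /** Sends startDocument to the delegate if no one has done so yet. */
    private void ensureStarted() throws SAXException {
        if (!started) {
            started = true;
            synthesized = true;
            super.startDocument();
        }
    }

    @Override
    public void startDocument() throws SAXException {
        // A real startDocument from the parser: just record it and forward.
        started = true;
        super.startDocument();
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        ensureStarted();
        super.startPrefixMapping(prefix, uri);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        ensureStarted();
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        ensureStarted();
        super.characters(ch, start, length);
    }

    /**
     * Call once after the fragment has been parsed. Sends endDocument only
     * if this wrapper had to synthesize the startDocument.
     */
    public void finish() throws SAXException {
        if (synthesized) {
            synthesized = false;
            super.endDocument();
        }
    }
}
```

In the snippet above this would mean passing new FragmentDocumentWrapper(serializer) to parser.setContentHandler and calling finish() on the wrapper after parseFragment returns. The lexical-handler property would still point at the serializer directly, so a comment arriving before the first element would bypass the wrapper.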

Thanks for considering.