NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 1020 - Parser crashes on NPE when null entity resolver returns null
Parser crashes on NPE when null entity resolver returns null
Status: NEW
Product: Validator.nu
Classification: Unclassified
Component: HTML parser
HEAD
All All
: P2 normal
Assigned To: Henri Sivonen
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2015-02-17 09:33 CET by Jirka Kosek
Modified: 2015-02-18 04:34 CET (History)
1 user (show)

See Also:


Attachments
Complete source of changed class (42.85 KB, application/octet-stream)
2015-02-17 09:33 CET, Jirka Kosek
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jirka Kosek 2015-02-17 09:33:00 CET
Created attachment 235 [details]
Complete source of changed class

JAXP allows returning null from resolveEntity() when there is no need to load resource from the different source, see:

http://docs.oracle.com/javase/7/docs/api/org/xml/sax/EntityResolver.html#resolveEntity(java.lang.String,%20java.lang.String)

method tokenize() in HtmlParser.java doesn't handle this situation and always use InputStream returned by resolveEntity() even if it's null (which leads to NPE few lines later).

Fix is to replace:

            if (entityResolver != null) {
		is = entityResolver.resolveEntity(is.getPublicId(), systemId);

with something like:

            if (entityResolver != null) {
		InputSource nis = entityResolver.resolveEntity(is.getPublicId(), systemId);
                if (nis != null) 
		{
		    is = nis;
		}
            }
Comment 1 Michael[tm] Smith 2015-02-18 04:34:21 CET
Thanks for catching this and patching it.

The patch looks right to me but as you may know, since the parser code is an independent library apart from the validator, the sources are managed in a separate (mercurial) repo, so this change needs to be reviewed and landed by Henri.

I think the process Henri prefers for handling patches to the parser code depends a bit on whether it's code that affects the behavior of the translated C++ gecko parser. So think you also need to wait for him to weigh in on how he'd like it submitted.

But I would imagine that regardless you probably want to prepare it as a patch against the sources in the mercurial repo at http://hg.mozilla.org/projects/htmlparser/

Specifically, if you clone that repo and then commit this as a change against the head in your clone, you can than use "hg export" to dump the changeset out with all its metadata in a way that Henri could then import into the upstream repo.

Anyway for now as far as the changed version you attached at https://bugzilla.validator.nu/attachment.cgi?id=235 here are a couple of style nits:

  - trailing space on the `if (nis != null)` line
  - indentation in your change uses a mix of tabs and spaces; please just use spaces