Bugzilla – Bug 1020
Parser crashes on NPE when null entity resolver returns null
Last modified: 2015-02-18 04:34:21 CET
Created attachment 235 [details] Complete source of changed class JAXP allows returning null from resolveEntity() when there is no need to load resource from the different source, see: http://docs.oracle.com/javase/7/docs/api/org/xml/sax/EntityResolver.html#resolveEntity(java.lang.String,%20java.lang.String) method tokenize() in HtmlParser.java doesn't handle this situation and always use InputStream returned by resolveEntity() even if it's null (which leads to NPE few lines later). Fix is to replace: if (entityResolver != null) { is = entityResolver.resolveEntity(is.getPublicId(), systemId); with something like: if (entityResolver != null) { InputSource nis = entityResolver.resolveEntity(is.getPublicId(), systemId); if (nis != null) { is = nis; } }
Thanks for catching this and patching it. The patch looks right to me but as you may know, since the parser code is an independent library apart from the validator, the sources are managed in a separate (mercurial) repo, so this change needs to be reviewed and landed by Henri. I think the process Henri prefers for handling patches to the parser code depends a bit on whether it's code that affects the behavior of the translated C++ gecko parser. So think you also need to wait for him to weigh in on how he'd like it submitted. But I would imagine that regardless you probably want to prepare it as a patch against the sources in the mercurial repo at http://hg.mozilla.org/projects/htmlparser/ Specifically, if you clone that repo and then commit this as a change against the head in your clone, you can than use "hg export" to dump the changeset out with all its metadata in a way that Henri could then import into the upstream repo. Anyway for now as far as the changed version you attached at https://bugzilla.validator.nu/attachment.cgi?id=235 here are a couple of style nits: - trailing space on the `if (nis != null)` line - indentation in your change uses a mix of tabs and spaces; please just use spaces