Bug 444 – Catch CoderMalfunctionError in Encoding.asciiMapsToBasicLatin()

NOTE: The current preferred location for bug reports is the GitHub issue tracker.

Summary:	Catch CoderMalfunctionError in Encoding.asciiMapsToBasicLatin()

Status:	RESOLVED FIXED

Product:	Validator.nu
Classification:	Unclassified
Component:	HTML parser
Version:	HEAD
Hardware:	All All

Importance:	P2 normal
Assigned To:	Henri Sivonen

Reported:	2009-02-05 04:31 CET by Carey Evans
Modified:	2009-02-09 13:38 CET
CC List:	0 users

Description Carey Evans 2009-02-05 04:31:14 CET

On the IBM Java 5 JRE, the ISCII CharsetDecoder throws CoderMalfunctionError in Encoding.asciiMapsToBasicLatin(). Because CoderMalfunctionError is an Error, not an Exception, the "catch (Exception e)" clause does not catch it, and it propagates out of the static initializer of Encoding.

Although this is probably IBM's bug, it would be good if the HTML parser worked anyway.
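
To illustrate the point about Error versus Exception, here is a minimal sketch. It does not use the parser's code: brokenDecode() is a hypothetical stand-in for a malfunctioning decoder, and the class and method names are made up for this example.

```java
import java.nio.BufferOverflowException;
import java.nio.charset.CoderMalfunctionError;

class ErrorVersusException {
    // Hypothetical stand-in for a decoder whose decodeLoop() misbehaves.
    static void brokenDecode() {
        throw new CoderMalfunctionError(new BufferOverflowException());
    }

    // Reports which handler actually sees the failure.
    static String probe() {
        try {
            try {
                brokenDecode();
                return "no failure";
            } catch (Exception e) {
                // Not reached: CoderMalfunctionError extends Error, not Exception.
                return "caught as Exception";
            }
        } catch (CoderMalfunctionError e) {
            return "caught as CoderMalfunctionError";
        }
    }

    public static void main(String[] args) {
        System.out.println(probe());
    }
}
```

The inner handler is skipped, so only the handler that names CoderMalfunctionError (or a supertype such as Error) sees the failure.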

The stack trace is:

java.nio.charset.CoderMalfunctionError: java.nio.BufferOverflowException
	at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:489)
	at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:511)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:252)
	at sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:201)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:187)
	at java.io.InputStreamReader.read(InputStreamReader.java:196)
	at nu.validator.htmlparser.io.Encoding.asciiMapsToBasicLatin(Encoding.java:211)
	at nu.validator.htmlparser.io.Encoding.<clinit>(Encoding.java:110)
	at java.lang.J9VMInternals.initializeImpl(Native Method)
	at java.lang.J9VMInternals.initialize(J9VMInternals.java:194)
	at nu.validator.htmlparser.io.HtmlInputStreamReader.<init>(HtmlInputStreamReader.java:135)
	at nu.validator.htmlparser.io.Driver.tokenize(Driver.java:199)
	at nu.validator.htmlparser.dom.HtmlDocumentBuilder.tokenize(HtmlDocumentBuilder.java:405)
	at nu.validator.htmlparser.dom.HtmlDocumentBuilder.parse(HtmlDocumentBuilder.java:204)
	at zcarey.html.HTML5Document.main(HTML5Document.java:22)
Caused by: java.nio.BufferOverflowException
	at java.nio.Buffer.nextPutIndex(Buffer.java:434)
	at java.nio.HeapCharBuffer.put(HeapCharBuffer.java:160)
	at com.ibm.nio.cs.ISCII91$Decoder.decodeLoop(ISCII91.java:211)
	at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:485)
	... 14 more

I can reproduce this bug (and a bug with ISCII in Sun Java 5 as well) with this simple code:

new InputStreamReader(new ByteArrayInputStream(new byte[3]),
    Charset.forName("ISCII").newDecoder()).read();
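
For anyone who wants to run the snippet above as a standalone program, here is one possible wrapper. The class name IsciiRepro and the method tryDecode() are made up for this sketch, and it assumes Java 7 or later for the try-with-resources syntax. It only reports what happens; the outcome depends on the JRE and on whether it ships a charset under the given name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CoderMalfunctionError;

class IsciiRepro {
    // Decodes three zero bytes with the named charset and reports the outcome.
    static String tryDecode(String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return "unsupported";
        }
        try (InputStreamReader reader = new InputStreamReader(
                new ByteArrayInputStream(new byte[3]),
                Charset.forName(charsetName).newDecoder())) {
            reader.read();
            return "ok";
        } catch (CoderMalfunctionError e) {
            // The failure described in this report.
            return "malfunction: " + e.getCause();
        } catch (IOException | RuntimeException e) {
            // Any other decoder failure.
            return "exception: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDecode(args.length > 0 ? args[0] : "ISCII"));
    }
}
```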

Comment 1 Henri Sivonen 2009-02-05 11:37:56 CET

I don't have access to an IBM JRE at the moment, but I have added a catch block for CoderMalfunctionError on svn trunk. Should I also add ISCII to the list of banned encodings? If so, what's the preferred name that the IBM JRE returns for ISCII?
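
For readers following along, a rough sketch of what a probe with the extra catch block could look like. This is not the actual body of Encoding.asciiMapsToBasicLatin(); the class name AsciiProbe and the byte range checked are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.CoderMalfunctionError;

class AsciiProbe {
    // Hypothetical reimplementation: true if bytes 0x20..0x7E decode to the
    // same code points in the given charset.
    static boolean asciiMapsToBasicLatin(Charset cs) {
        byte[] ascii = new byte[0x7F - 0x20];
        for (int i = 0; i < ascii.length; i++) {
            ascii[i] = (byte) (0x20 + i);
        }
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(ascii), cs.newDecoder())) {
            for (byte expected : ascii) {
                if (reader.read() != expected) {
                    return false;
                }
            }
            return reader.read() == -1;
        } catch (CoderMalfunctionError e) {
            // A broken decoder is treated like a charset that fails the probe.
            return false;
        } catch (Exception e) {
            return false;
        }
    }
}
```

Treating a malfunctioning decoder as "does not map ASCII to Basic Latin" keeps class initialization from failing, which is the behaviour the report asks for.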

Comment 2 Carey Evans 2009-02-06 11:04:34 CET

I don't think there's anything wrong with ISCII itself, only with IBM's implementation of the decoder in their Java 5. On further investigation, IBM appears to have switched to Sun's implementation for Java 6, and it works properly there.

IBM Java 5 calls the encoding "ISCII91", while Sun Java 5, IBM Java 6 and Sun Java 6 call it "x-ISCII91" with "ISCII91" as an alias, so it seems reasonable to ban the broken version.
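
A sketch of why a ban keyed on the canonical name would hit only the broken version. Whether the parser's banned list actually matches on canonical names or also on aliases is not stated in this report, so that is an assumption here. The class name BannedEncodings is made up, and "iscii91" is the only entry taken from this report.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class BannedEncodings {
    // Hypothetical banned list holding lower-cased canonical charset names.
    private static final Set<String> BANNED =
            new HashSet<>(Arrays.asList("iscii91"));

    // Case-insensitive lookup by canonical name (assumed matching rule).
    static boolean isBanned(String canonicalName) {
        return BANNED.contains(canonicalName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isBanned("ISCII91"));   // canonical name on IBM Java 5
        System.out.println(isBanned("x-ISCII91")); // canonical name on the other JREs listed
    }
}
```

Under that assumption, "ISCII91" is rejected while "x-ISCII91" is still accepted.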

You can download the IBM JDK for free, bundled with Eclipse, from http://www.ibm.com/developerworks/java/jdk/eclipse/index.html. Versions 210 and 211 come with IBM Java 5, while version 300 comes with IBM Java 6.

Comment 3 Henri Sivonen 2009-02-09 13:38:48 CET

Added iscii91 to the banned list in svn. Marking fixed. Thanks.