Bugzilla – Bug 848
make parser conform to current "Changing the encoding while parsing" algorithm
Last modified: 2011-07-13 10:03:05 CEST
The parser code does not conform to the current requirements of the "Changing the encoding while parsing" algorithm. It should be updated to match them.

Details:

When a conforming parser encounters a meta element that specifies a character encoding, the HTML5 spec requires it to run the "Changing the encoding while parsing" algorithm:

http://dev.w3.org/html5/spec/parsing.html#changing-the-encoding-while-parsing

In the v.nu parser, that means the code calls the internalEncodingDeclaration(String internalCharset) method, where internalCharset is the character encoding specified by the meta element.

In the current code, the first condition checked in internalEncodingDeclaration(String internalCharset) is whether internalCharset specifies a UTF-16 encoding. If it does, the code changes the document character encoding to UTF-8 and logs an "Internal encoding declaration specified 'UTF-16' which is not an ASCII superset. Continuing as if the encoding had been 'UTF-8'." parse error.

That behavior conforms to the requirements of the "Changing the encoding while parsing" algorithm in an earlier version of the spec:

http://www.w3.org/TR/2009/WD-html5-20090423/syntax.html#changing-the-encoding-while-parsing

...in which step 1 read:

"If the new encoding is a UTF-16 encoding, change it to UTF-8."

...but the requirements in the spec have since changed, and that step is now step 3 rather than step 1.
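
For reference, the behavior currently implemented (the UTF-16 check running first, as described above) might look roughly like the sketch below. This is not the actual v.nu source: the class name, the documentEncoding and errors fields, and the isUtf16 helper are hypothetical stand-ins, and the steps that the current spec places before the UTF-16 rewrite are deliberately left out rather than guessed at.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only; names other than internalEncodingDeclaration
// and internalCharset are assumptions, not taken from the parser source.
class EncodingDeclarationSketch {
    // Hypothetical stand-in for the parser's error reporting.
    final List<String> errors = new ArrayList<>();
    // Hypothetical field: the encoding currently used to decode the stream.
    String documentEncoding;

    EncodingDeclarationSketch(String initialEncoding) {
        this.documentEncoding = initialEncoding;
    }

    // Hypothetical helper: true for the UTF-16 family of labels.
    static boolean isUtf16(String charset) {
        String name = charset.trim().toUpperCase(Locale.ROOT);
        return name.equals("UTF-16")
                || name.equals("UTF-16BE")
                || name.equals("UTF-16LE");
    }

    void internalEncodingDeclaration(String internalCharset) {
        // Behavior described in this report: the UTF-16 check comes first.
        // Per the current spec text this belongs at step 3, after two
        // earlier steps that this sketch does not model.
        if (isUtf16(internalCharset)) {
            errors.add("Internal encoding declaration specified '"
                    + internalCharset
                    + "' which is not an ASCII superset. Continuing as if"
                    + " the encoding had been 'UTF-8'.");
            documentEncoding = "UTF-8";
            return;
        }
        // Handling of all other encodings is omitted from this sketch.
    }

    public static void main(String[] args) {
        EncodingDeclarationSketch parser =
                new EncodingDeclarationSketch("windows-1252");
        parser.internalEncodingDeclaration("UTF-16");
        System.out.println(parser.documentEncoding);
        System.out.println(parser.errors.get(0));
    }
}
```

Bringing the code into conformance would mean performing whatever the current spec lists as steps 1 and 2 before this check is reached.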