Bugzilla – Bug 95
Make using a Win1252-specific byte when the document is declared as ISO-8859-1 be a parse error.
Last modified: 2011-10-15 13:38:15 CEST
Index: source
===================================================================
--- source (revision 1263)
+++ source (revision 1264)
@@ -35965,10 +35965,13 @@
 href="#refsIANACHARSET">[IANACHARSET]</a></p>
 
 <p>When a user agent would otherwise use the ISO-8859-1 encoding, it
- must instead use the Windows-1252 encoding.</p>
+ must instead use the Windows-1252 encoding, except that any bytes in
+ the range 0x80 to 0x9F must, in addition to being interpreted as per
+ the Windows-1252 encoding, be considered <span title="parse
+ error">parse errors</span>.</p>
 
- <p class="note">This requirement is a willful violation of the W3C
- Character Model specification. <a
+ <p class="note">The requirement to treat ISO-8859-1 as Windows-1252
+ is a willful violation of the W3C Character Model specification. <a
 href="#refsCHARMOD">[CHARMOD]</a></p>
 
 <p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
Potential WONTFIX.
The current wording at http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#character-encodings-0 does not seem to contain any requirement to treat some data as parse errors. It simply states that iso-8859-1 is to be treated as windows-1252. However, I think it would be most useful to issue a warning about any octet that does not represent a character allowed by iso-8859-1, even if that octet is defined in windows-1252. Although browsers almost uniformly treat iso-8859-1 as windows-1252, it is still a protocol error to declare a document as iso-8859-1 encoded when it is not. Moreover, such a situation often reflects an accidental error (e.g., the author entered a character he didn't mean to enter), and if the author in fact meant to enter windows-1252 characters that are not in iso-8859-1, he can and should fix this by changing the character encoding information.
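The rule in the patch above (decode as Windows-1252, but flag every byte in the range 0x80 to 0x9F) could be sketched roughly as follows. This is only an illustration, not the parser's actual algorithm: the function name `decode_declared_latin1` and the message format are hypothetical, and it assumes Python's `cp1252` codec, which leaves five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) unmapped, so the sketch substitutes U+FFFD for those.

```python
from typing import List, Tuple


def decode_declared_latin1(data: bytes) -> Tuple[str, List[int]]:
    """Decode bytes from a document declared as ISO-8859-1.

    Hypothetical helper: the text is decoded as Windows-1252, and the
    offsets of all bytes in 0x80-0x9F are returned so that a caller
    can report them as parse errors (or warnings).
    """
    # Bytes 0x80-0x9F are the ones the patch singles out as parse errors.
    error_offsets = [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]

    # Assumption of this sketch: Python's cp1252 codec has no mapping for
    # 0x81, 0x8D, 0x8F, 0x90 and 0x9D, so those become U+FFFD here.
    text = data.decode("cp1252", errors="replace")
    return text, error_offsets


# Example: curly quotes typed in a Windows editor, served as ISO-8859-1.
text, errors = decode_declared_latin1(b"caf\xe9 \x93quoted\x94")
for offset in errors:
    print(f"parse error: byte at offset {offset} is in the range 0x80-0x9F")
```

The text is still decoded and usable, which matches the "in addition to being interpreted as per the Windows-1252 encoding" wording. The offsets only give a validator something to warn about, so the author can either remove the stray character or correct the declared encoding.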