Bugzilla – Bug 95
Make using a Win1252-specific byte when the document is declared as ISO-8859-1 a parse error.
Last modified: 2011-10-15 13:38:15 CEST
--- source (revision 1263)
+++ source (revision 1264)
@@ -35965,10 +35965,13 @@
<p>When a user agent would otherwise use the ISO-8859-1 encoding, it
- must instead use the Windows-1252 encoding.</p>
+ must instead use the Windows-1252 encoding, except that any bytes in
+ the range 0x80 to 0x9F must, in addition to being interpreted as per
+ the Windows-1252 encoding, be considered <span title="parse
+ error">parse errors</span>.</p>
- <p class="note">This requirement is a willful violation of the W3C
- Character Model specification. <a
+ <p class="note">The requirement to treat ISO-8859-1 as Windows-1252
+ is a willful violation of the W3C Character Model specification. <a
<p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
The current wording at
does not seem to contain any requirement on treating some data as parse errors. It simply states that iso-8859-1 is to be treated as windows-1252.
However, I think it would be most useful to issue a warning about an octet that does not represent a character allowed by iso-8859-1, even if it is defined in windows-1252.
Although browsers almost uniformly treat iso-8859-1 as windows-1252, it's still a protocol error to declare a document as iso-8859-1 encoded when it is not. Moreover, such a situation often reflects an accidental error (e.g., the author entered a character he didn't mean to enter), and if the author in fact meant to enter non-iso-8859-1 windows-1252 characters, he can and should fix this by changing the character encoding information.
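To illustrate the behaviour proposed in the diff above, here is a minimal sketch in Python, not taken from the spec or from any browser. It decodes bytes declared as iso-8859-1 using windows-1252 and records the offset of every byte in the range 0x80 to 0x9F as a parse error. The function name `decode_declared_latin1` is hypothetical. Python's `cp1252` codec leaves five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) unmapped, so the sketch assumes they map to the C1 control code points of the same value.

```python
def decode_declared_latin1(data: bytes) -> tuple[str, list[int]]:
    """Decode as windows-1252, collecting offsets of bytes in 0x80..0x9F.

    Hypothetical helper for illustration only.
    """
    chars = []
    parse_errors = []
    for offset, byte in enumerate(data):
        if 0x80 <= byte <= 0x9F:
            # Not a graphic character in iso-8859-1: still decoded, but flagged.
            parse_errors.append(offset)
        try:
            chars.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            # Assumption: bytes that Python's cp1252 codec leaves unmapped
            # are passed through as the C1 control with the same value.
            chars.append(chr(byte))
    return "".join(chars), parse_errors


text, errors = decode_declared_latin1(b"caf\xe9 \x93hi\x94")
print(text)    # the curly quotes come from windows-1252
print(errors)  # offsets of the two flagged bytes
```

The byte 0xE9 is valid in both encodings and is not flagged. The bytes 0x93 and 0x94 are decoded as curly quotation marks but reported, which matches the "warning" described in the comment above.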