Bugzilla – Bug 612
Yet more tinkering with the ASCII-compatible definition. Also, discourage ISO-2022-* due to the potential for XSS.
Last modified: 2009-11-23 17:17:33 CET
Index: source
===================================================================
--- source (revision 3334)
+++ source (revision 3335)
@@ -725,19 +725,21 @@
 
   <h4>Character encodings</h4>
 
-  <p>An <dfn>ASCII-compatible character encoding</dfn> is one in which
-  bytes 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C - 0x3F,
-  0x41 - 0x5A, and 0x61 - 0x7A<!-- is that list ok? do any character
-  sets we want to support do things outside that range? -->, ignoring
-  bytes that are the second and later bytes of multibyte sequences,
-  map to the same Unicode characters as those bytes in ANSI_X3.4-1968
-  (US-ASCII). <a href="#refsRFC1345">[RFC1345]</a></p>
+  <p>An <dfn>ASCII-compatible character encoding</dfn> is a
+  single-byte or variable-length encoding in which the bytes 0x09,
+  0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C - 0x3F, 0x41 - 0x5A,
+  and 0x61 - 0x7A<!-- is that list ok? do any character sets we want
+  to support do things outside that range? -->, ignoring bytes that
+  are the second and later bytes of multibyte sequences, all
+  correspond to single-byte sequences that map to the same Unicode
+  characters as those bytes in ANSI_X3.4-1968 (US-ASCII). <a
+  href="#refsRFC1345">[RFC1345]</a></p>
 
   <p class="note">This includes such exotic encodings as Shift_JIS and
   variants of ISO-2022, even though it is possible for bytes like 0x70
   to be part of longer sequences that are unrelated to their
   interpretation as ASCII. It excludes such encodings as UTF-7,
-  UTF-16, HZ-GB-2312, GSM03.38, and EBCDIC variants.</p>
+  UTF-8+names, UTF-16, HZ-GB-2312, GSM03.38, and EBCDIC variants.</p>
 
   <!--
    We'll have to change that if anyone comes up with a way to have a
@@ -10943,14 +10945,24 @@
   <span>ASCII-compatible character encoding</span>.</p>
 
   <p>Authors should not use JIS-X-0208 <!-- x-JIS0208 -->
-  (JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), and encodings based
-  on EBCDIC. Authors should not use UTF-32. Authors must not use the
-  CESU-8, UTF-7, BOCU-1 and SCSU encodings. <a
-  href="#refsRFC1345">[RFC1345]</a><!-- for the JIS types --> <a
-  href="#refsUTF32">[UTF32]</a> <a href="#refsCESU8">[CESU8]</a> <a
-  href="#refsUTF7">[UTF7]</a> <a href="#refsBOCU1">[BOCU1]</a> <a
-  href="#refsSCSU">[SCSU]</a></p> <!-- no idea what to reference for
-  EBCDIC, so... -->
+  (JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), encodings based on
+  ISO-2022<!-- http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422
+  -->, and encodings based on EBCDIC. Authors should not use
+  UTF-32. Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU
+  encodings.
+  <a href="#refsRFC1345">[RFC1345]</a><!-- for the JIS types -->
+  <a href="#refsRFC1468">[RFC1468]</a><!-- ISO-2022-JP -->
+  <a href="#refsRFC2237">[RFC2237]</a><!-- ISO-2022-JP-1 -->
+  <a href="#refsRFC1554">[RFC1554]</a><!-- ISO-2022-JP-2 -->
+  <a href="#refsRFC1922">[RFC1922]</a><!-- ISO-2022-CN and ISO-2022-CN-EXT -->
+  <a href="#refsRFC1557">[RFC1557]</a><!-- ISO-2022-KR -->
+  <a href="#refsUTF32">[UTF32]</a>
+  <a href="#refsCESU8">[CESU8]</a>
+  <a href="#refsUTF7">[UTF7]</a>
+  <a href="#refsBOCU1">[BOCU1]</a>
+  <a href="#refsSCSU">[SCSU]</a>
+  <!-- no idea what to reference for EBCDIC, so... -->
+  </p>
 
   <p>Authors are encouraged to use UTF-8. Conformance checkers may
   advise against authors using legacy encodings.</p>
@@ -55677,6 +55689,7 @@
 
   <p class="XXX">...</p>
 
+  <div class="impl">
 
   <h4 id="appcache">Application caches</h4>
 
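
For illustration only, here is a rough Python sketch of the byte list in the revised "ASCII-compatible character encoding" definition. It is not part of the patch. The helper name `passes_single_byte_check` is hypothetical, and the check relies on Python's built-in codecs, whose behaviour may differ from what browsers implement. It only tests that each listed byte, decoded on its own, yields the same character as in US-ASCII. That is a necessary condition, not the full definition: it does not model the clause about second and later bytes of multibyte sequences, so a stateful encoding such as UTF-7 is not reliably excluded by it.

```python
# Bytes named in the definition: 0x09, 0x0A, 0x0C, 0x0D, 0x20-0x22,
# 0x26, 0x27, 0x2C-0x3F, 0x41-0x5A, and 0x61-0x7A.
CHECKED_BYTES = (
    [0x09, 0x0A, 0x0C, 0x0D, 0x26, 0x27]
    + list(range(0x20, 0x23))
    + list(range(0x2C, 0x40))
    + list(range(0x41, 0x5B))
    + list(range(0x61, 0x7B))
)


def passes_single_byte_check(encoding_name: str) -> bool:
    """Return True if every listed byte decodes, alone, to its US-ASCII character.

    Hypothetical helper: a necessary condition only, since it ignores
    shift states and multibyte sequences.
    """
    for byte in CHECKED_BYTES:
        try:
            decoded = bytes([byte]).decode(encoding_name)
        except UnicodeDecodeError:
            # The byte is not a complete sequence on its own (e.g. UTF-16).
            return False
        if decoded != chr(byte):
            # The byte maps to a different character (e.g. EBCDIC variants).
            return False
    return True


if __name__ == "__main__":
    for name in ("utf-8", "latin-1", "shift_jis", "cp037", "utf-16"):
        print(name, passes_single_byte_check(name))
```

Under Python's codecs, this accepts UTF-8, Latin-1 and Shift_JIS and rejects the EBCDIC code page `cp037` and UTF-16, which is consistent with the note in the first hunk.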