Bug 612 – Yet more tinkering of the ASCII-compatible definition. Also, discourage ISO-2022-* due to the potential for XSS.




Summary:	Yet more tinkering of the ASCII-compatible definition. Also, discourage ISO-2...

Status:	NEW

Product:	Validator.nu
Classification:	Unclassified
Component:	General
Version:	HEAD
Hardware:	All All

Importance:	P2 normal
Assigned To:	Nobody

URL:	http://svn.whatwg.org/webapps/source?...

Depends on:
Blocks:

Reported:	2009-07-14 15:03 CEST by Henri Sivonen
Modified:	2009-11-23 17:17 CET
CC List:	0 users

See Also:


Description Henri Sivonen 2009-07-14 15:03:20 CEST

Index: source
===================================================================
--- source	(revision 3334)
+++ source	(revision 3335)
@@ -725,19 +725,21 @@
 
   <h4>Character encodings</h4>
 
-  <p>An <dfn>ASCII-compatible character encoding</dfn> is one in which
-  bytes 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C - 0x3F,
-  0x41 - 0x5A, and 0x61 - 0x7A<!-- is that list ok? do any character
-  sets we want to support do things outside that range?  -->, ignoring
-  bytes that are the second and later bytes of multibyte sequences,
-  map to the same Unicode characters as those bytes in ANSI_X3.4-1968
-  (US-ASCII). <a href="#refsRFC1345">[RFC1345]</a></p>
+  <p>An <dfn>ASCII-compatible character encoding</dfn> is a
+  single-byte or variable-length encoding in which the bytes 0x09,
+  0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C - 0x3F, 0x41 - 0x5A,
+  and 0x61 - 0x7A<!-- is that list ok? do any character sets we want
+  to support do things outside that range?  -->, ignoring bytes that
+  are the second and later bytes of multibyte sequences, all
+  correspond to single-byte sequences that map to the same Unicode
+  characters as those bytes in ANSI_X3.4-1968 (US-ASCII). <a
+  href="#refsRFC1345">[RFC1345]</a></p>
 
   <p class="note">This includes such exotic encodings as Shift_JIS and
   variants of ISO-2022, even though it is possible for bytes like 0x70
   to be part of longer sequences that are unrelated to their
   interpretation as ASCII. It excludes such encodings as UTF-7,
-  UTF-16, HZ-GB-2312, GSM03.38, and EBCDIC variants.</p>
+  UTF-8+names, UTF-16, HZ-GB-2312, GSM03.38, and EBCDIC variants.</p>
 
   <!--
    We'll have to change that if anyone comes up with a way to have a
@@ -10943,14 +10945,24 @@
   <span>ASCII-compatible character encoding</span>.</p>
 
   <p>Authors should not use JIS-X-0208 <!-- x-JIS0208 -->
-  (JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), and encodings based
-  on EBCDIC. Authors should not use UTF-32. Authors must not use the
-  CESU-8, UTF-7, BOCU-1 and SCSU encodings. <a
-  href="#refsRFC1345">[RFC1345]</a><!-- for the JIS types --> <a
-  href="#refsUTF32">[UTF32]</a> <a href="#refsCESU8">[CESU8]</a> <a
-  href="#refsUTF7">[UTF7]</a> <a href="#refsBOCU1">[BOCU1]</a> <a
-  href="#refsSCSU">[SCSU]</a></p> <!-- no idea what to reference for
-  EBCDIC, so... -->
+  (JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), encodings based on
+  ISO-2022<!-- http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422
+  -->, and encodings based on EBCDIC. Authors should not use
+  UTF-32. Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU
+  encodings.
+  <a href="#refsRFC1345">[RFC1345]</a><!-- for the JIS types -->
+  <a href="#refsRFC1468">[RFC1468]</a><!-- ISO-2022-JP -->
+  <a href="#refsRFC2237">[RFC2237]</a><!-- ISO-2022-JP-1 -->
+  <a href="#refsRFC1554">[RFC1554]</a><!-- ISO-2022-JP-2 -->
+  <a href="#refsRFC1922">[RFC1922]</a><!-- ISO-2022-CN and ISO-2022-CN-EXT -->
+  <a href="#refsRFC1557">[RFC1557]</a><!-- ISO-2022-KR -->
+  <a href="#refsUTF32">[UTF32]</a>
+  <a href="#refsCESU8">[CESU8]</a>
+  <a href="#refsUTF7">[UTF7]</a>
+  <a href="#refsBOCU1">[BOCU1]</a>
+  <a href="#refsSCSU">[SCSU]</a>
+  <!-- no idea what to reference for EBCDIC, so... -->
+  </p>
 
   <p>Authors are encouraged to use UTF-8. Conformance checkers may
   advise against authors using legacy encodings.</p>
@@ -55677,6 +55689,7 @@
   <p class="XXX">...</p>
 
 
+
   <div class="impl">
 
   <h4 id="appcache">Application caches</h4>
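
For checker work, the byte list in the revised ASCII-compatible definition above can be probed mechanically. The following is only a rough sketch, assuming Python's built-in codecs stand in for the encodings under test; the names ASCII_COMPAT_BYTES and passes_single_byte_check are hypothetical and not part of Validator.nu or the spec. It tests a necessary condition only: each listed byte, decoded on its own, must yield the same character as in US-ASCII. It does not examine bytes inside shifted or multibyte sequences, so a stateful encoding that the note excludes (UTF-7, for example) may still pass it.

```python
# Bytes named in the ASCII-compatible definition quoted in the diff above.
ASCII_COMPAT_BYTES = (
    [0x09, 0x0A, 0x0C, 0x0D]
    + list(range(0x20, 0x23))  # 0x20 - 0x22
    + [0x26, 0x27]
    + list(range(0x2C, 0x40))  # 0x2C - 0x3F
    + list(range(0x41, 0x5B))  # 0x41 - 0x5A
    + list(range(0x61, 0x7B))  # 0x61 - 0x7A
)


def passes_single_byte_check(encoding: str) -> bool:
    """Return True if every listed byte, decoded alone, maps to its ASCII character.

    Hypothetical helper: this is a necessary check, not the full definition,
    because it never looks at bytes inside longer or shifted sequences.
    """
    for b in ASCII_COMPAT_BYTES:
        try:
            decoded = bytes([b]).decode(encoding)
        except UnicodeDecodeError:
            # The byte is not a complete single-byte sequence in this encoding.
            return False
        if decoded != chr(b):
            return False
    return True


if __name__ == "__main__":
    for name in ("utf-8", "cp1252", "utf-16", "cp037"):
        print(name, passes_single_byte_check(name))
```

Under these assumptions UTF-8 and windows-1252 pass, while UTF-16 and the EBCDIC code page cp037 fail, which matches the exclusions listed in the note.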