NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 190 - Encoding aliases for CJK environments.
Status: RESOLVED FIXED
Product: Validator.nu
Classification: Unclassified
Component: HTML parser
Version: HEAD
Hardware: All All
Importance: P2 normal
Assigned To: Henri Sivonen
URL: http://svn.whatwg.org/webapps/source?...
Depends on:
Blocks:
Reported: 2008-05-22 13:48 CEST by Henri Sivonen
Modified: 2008-05-26 12:22 CEST (History)
Description Henri Sivonen 2008-05-22 13:48:13 CEST
Index: source
===================================================================
--- source	(revision 1660)
+++ source	(revision 1661)
@@ -39716,9 +39716,40 @@
   encoding instead of as a control character, be considered <span
   title="parse error">parse errors</span>.</p>
 
+  <p>In addition, when a user agent would otherwise use an encoding
+  given in the first column of the following table, it must instead
+  use the encoding given in the cell in the second column of the same
+  row. Any bytes that are treated differently due to this encoding
+  aliasing must be considered <span title="parse error">parse
+  errors</span>.</p>
+
+  <table>
+   <caption>Encoding aliases</caption>
+   <thead>
+    <tr> <th> Input encoding <th> Replacement encoding <th> References
+   <tbody>
+    <tr> <td> GB2312 <td> GBK <td>
+         <a href="#refsGB2312">[GB2312]</a><!-- XXX ? -->
+         <a href="#refsGBK">[GBK]</a><!-- http://www.iana.org/assignments/charset-reg/GBK -->
+    <tr> <td> GB_2312-80 <td> GBK <td>
+         <a href="#refsRFC1345">[RFC1345]</a>
+         <a href="#refsGBK">[GBK]</a><!-- http://www.iana.org/assignments/charset-reg/GBK -->
+    <tr> <td> EUC-KR <td> Windows-949 <td>
+         <a href="#refsEUCKR">[EUCKR]</a> <!-- see reference for [EUC-KR] in RFC1557 -->
+         <a href="#refsWin949">[WIN949]</a><!-- http://www.microsoft.com/globaldev/reference/dbcs/949.mspx -->
+    <tr> <td> KS_C_5601-1987 <td> Windows-949 <td>
+         <a href="#refsRFC1345">[RFC1345]</a>
+         <a href="#refsWin949">[WIN949]</a><!-- http://www.microsoft.com/globaldev/reference/dbcs/949.mspx -->
+    <tr> <td> x-x-big5 <td> Big5 <td>
+         <a href="#BIG5">[BIG5]</a> <!-- XXX ? -->
+   </tbody>
+  </table>
+
   <p class="note">The requirement to treat certain ISO-8859 encodings
-  as Windows encodings is a willful violation of the W3C Character
-  Model specification. <a href="#refsCHARMOD">[CHARMOD]</a></p>
+  as Windows encodings, and the requirement to alias certain encodings
+  according to the table above, are willful violations of the W3C
+  Character Model specification. <a
+  href="#refsCHARMOD">[CHARMOD]</a></p>
 
   <p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
   encodings. <a href="#refsCESU8">[CESU8]</a> <a
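
The aliasing rule added in this revision could be sketched roughly as follows. This is a minimal illustration, not the Validator.nu implementation: the names ENCODING_ALIASES, resolve_encoding and decodes_differently are hypothetical, case-insensitive label matching is an assumption, and the second helper takes Python codec names (for example "gb2312" and "gbk") rather than the labels from the table. It also compares whole buffers, which is coarser than the per-byte parse errors the spec text describes.

```python
# Alias table transcribed from the diff above (input label -> replacement).
# Keys are lowercased; case-insensitive matching is an assumption here.
ENCODING_ALIASES = {
    "gb2312": "GBK",
    "gb_2312-80": "GBK",
    "euc-kr": "Windows-949",
    "ks_c_5601-1987": "Windows-949",
    "x-x-big5": "Big5",
}


def resolve_encoding(label: str) -> str:
    """Return the encoding to use in place of `label` (hypothetical helper)."""
    return ENCODING_ALIASES.get(label.lower(), label)


def decodes_differently(data: bytes, original_codec: str, replacement_codec: str) -> bool:
    """Report whether `data` is treated differently by the replacement codec.

    Both arguments are Python codec names. A True result corresponds,
    roughly, to input that the spec text would flag as a parse error.
    """
    try:
        as_original = data.decode(original_codec)
    except UnicodeDecodeError:
        # The original encoding cannot represent these bytes at all.
        as_original = None
    as_replacement = data.decode(replacement_codec, errors="replace")
    return as_original != as_replacement
```

Under these assumptions, resolve_encoding("GB2312") yields "GBK", and a byte pair such as 0x81 0x40, which GBK assigns but GB2312 does not, makes decodes_differently report True for the gb2312/gbk pair.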