NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 216 - Shun UTF-32. Make it slightly clearer what 'UTF-16' means.
Status: RESOLVED FIXED
Product: Validator.nu
Classification: Unclassified
Component: HTML parser
Version: HEAD
Hardware: All All
Importance: P2 normal
Assigned To: Henri Sivonen
URL: http://svn.whatwg.org/webapps/source?...
Reported: 2008-05-25 21:07 CEST by Henri Sivonen
Modified: 2008-05-28 14:47 CEST (History)
Description Henri Sivonen 2008-05-25 21:07:26 CEST
Index: source
===================================================================
--- source	(revision 1700)
+++ source	(revision 1701)
@@ -31031,13 +31031,15 @@
      <tbody>
       <tr>
        <td>FE FF
-       <td>UTF-16BE BOM <!-- followed by a character --> or UTF-32LE BOM
+       <td>UTF-16BE BOM <!-- followed by a character --><!-- nobody uses this: or UTF-32LE BOM -->
       <tr>
        <td>FF FE
        <td>UTF-16LE BOM <!-- followed by a character -->
+<!-- nobody uses this
       <tr>
        <td>00 00 FE FF
        <td>UTF-32BE BOM
+-->
 <!-- this one is redundant with the one above
       <tr>
        <td>FF FE 00 00
@@ -31055,8 +31057,6 @@
 
     <p>...then the sniffed type of the resource is "text/plain".</p>
 
-    <p class="big-issue">Should we remove UTF-32 from the above?</p>
-
    </li>
 
    <li><p>Otherwise, if any of the first <var title="">n</var> bytes
@@ -39803,6 +39803,11 @@
   <p>Support for UTF-32 is not recommended. This encoding is rarely
   used, and frequently misimplemented.</p>
 
+  <p class="note">This specification does not make any attempt to
+  support UTF-32 in its algorithms; support and use of UTF-32 can thus
+  lead to unexpected behavior in implementations of this
+  specification.</p>
+
 
 
   <h5>Preprocessing the input stream</h5>
@@ -39886,7 +39891,8 @@
 
   <ol>
 
-   <li>If the new encoding is UTF-16, change it to UTF-8.</li>
+   <li>If the new encoding is a UTF-16 encoding, change it to
+   UTF-8.</li>
 
    <li>If the new encoding is identical or equivalent to the encoding
    that is already being used to interpret the input stream, then set