Bug 216 – Shun UTF-32. Make it slightly clearer what 'UTF-16' means.

NOTE: The current preferred location for bug reports is the GitHub issue tracker.

Summary:	Shun UTF-32. Make it slightly clearer what 'UTF-16' means.

Status:	RESOLVED FIXED

Product:	Validator.nu
Classification:	Unclassified
Component:	HTML parser
Version:	HEAD
Hardware:	All All

Importance:	P2 normal
Assigned To:	Henri Sivonen

URL:	http://svn.whatwg.org/webapps/source?...

Reported:	2008-05-25 21:07 CEST by Henri Sivonen
Modified:	2008-05-28 14:47 CEST
CC List:	0 users

Description Henri Sivonen 2008-05-25 21:07:26 CEST

Index: source
===================================================================
--- source	(revision 1700)
+++ source	(revision 1701)
@@ -31031,13 +31031,15 @@
      <tbody>
       <tr>
        <td>FE FF
-       <td>UTF-16BE BOM <!-- followed by a character --> or UTF-32LE BOM
+       <td>UTF-16BE BOM <!-- followed by a character --><!-- nobody uses this: or UTF-32LE BOM -->
       <tr>
        <td>FF FE
        <td>UTF-16LE BOM <!-- followed by a character -->
+<!-- nobody uses this
       <tr>
        <td>00 00 FE FF
        <td>UTF-32BE BOM
+-->
 <!-- this one is redundant with the one above
       <tr>
        <td>FF FE 00 00
@@ -31055,8 +31057,6 @@
 
     <p>...then the sniffed type of the resource is "text/plain".</p>
 
-    <p class="big-issue">Should we remove UTF-32 from the above?</p>
-
    </li>
 
    <li><p>Otherwise, if any of the first <var title="">n</var> bytes
@@ -39803,6 +39803,11 @@
   <p>Support for UTF-32 is not recommended. This encoding is rarely
   used, and frequently misimplemented.</p>
 
+  <p class="note">This specification does not make any attempt to
+  support UTF-32 in its algorithms; support and use of UTF-32 can thus
+  lead to unexpected behavior in implementations of this
+  specification.</p>
+
 
 
   <h5>Preprocessing the input stream</h5>
@@ -39886,7 +39891,8 @@
 
   <ol>
 
-   <li>If the new encoding is UTF-16, change it to UTF-8.</li>
+   <li>If the new encoding is a UTF-16 encoding, change it to
+   UTF-8.</li>
 
    <li>If the new encoding is identical or equivalent to the encoding
    that is already being used to interpret the input stream, then set