NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 196 - Make entities not be allowed to use non-unicode characters
Status: RESOLVED FIXED
Product: Validator.nu
Classification: Unclassified
Component: HTML parser
Version: HEAD
Hardware: All All
Importance: P2 normal
Assigned To: Henri Sivonen
URL: http://svn.whatwg.org/webapps/source?...
Depends on:
Blocks:
Reported: 2008-05-23 13:22 CEST by Henri Sivonen
Modified: 2008-06-18 15:26 CEST (History)
Description Henri Sivonen 2008-05-23 13:22:48 CEST
Index: source
===================================================================
--- source	(revision 1667)
+++ source	(revision 1668)
@@ -41870,9 +41870,18 @@
       <tr><td>0x9F <td>U+0178 <td>LATIN CAPITAL LETTER Y WITH DIAERESIS ('&#x0178;')
     </table>
 
-    <p>Otherwise, if the number is zero, if the number is higher than
-    0x10FFFF, or if it's one of the surrogate characters (characters
-    in the range 0xD800 to 0xDFFF), then this is a <span>parse
+    <!-- this is the same as the equivalent list in the input stream
+    section, except it has 0x0000 included in the first range. -->
+    <p>Otherwise, if the number is in the range 0x0000 to 0x0008, <!--
+    space characters allowed --> 0x000E to 0x001F, <!-- ASCII allowed
+    --> 0x007F <!--to 0x0084, (0x0085 NEL not allowed), 0x0086--> to
+    0x009F, 0xD800 to 0xDFFF <!-- surrogates not allowed -->, 0xFDD0
+    to 0xFDDF, or is one of 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, 0x2FFFE,
+    0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF,
+    0x6FFFE, 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF, 0x9FFFE,
+    0x9FFFF, 0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF,
+    0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF, 0x10FFFE, or
+    0x10FFFF, or is higher than 0x10FFFF, then this is a <span>parse
     error</span>; return a character token for the U+FFFD REPLACEMENT
     CHARACTER character instead.</p>
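
For illustration only, the check added in revision 1668 could be sketched roughly as follows. This is not the Validator.nu implementation. The names DISALLOWED_RANGES, is_disallowed_reference and resolve_reference are hypothetical. The ranges are transcribed as written in the diff above. The sketch covers only the "Otherwise" branch, so it assumes the table lookup that precedes that paragraph has already been tried and did not match.

```python
# Hypothetical sketch of the "Otherwise" branch from revision 1668.
# Ranges are copied from the diff as written; all names are assumptions.
DISALLOWED_RANGES = (
    (0x0000, 0x0008),
    (0x000E, 0x001F),
    (0x007F, 0x009F),
    (0xD800, 0xDFFF),  # surrogates
    (0xFDD0, 0xFDDF),
)

REPLACEMENT_CHARACTER = "\uFFFD"


def is_disallowed_reference(number: int) -> bool:
    """Return True if a numeric character reference is a parse error.

    Assumes `number` is a non-negative integer that was not matched by
    the replacement table that precedes this step in the spec text.
    """
    if number > 0x10FFFF:
        return True
    if any(low <= number <= high for low, high in DISALLOWED_RANGES):
        return True
    # 0xFFFE and 0xFFFF, 0x1FFFE and 0x1FFFF, ... 0x10FFFE and 0x10FFFF:
    # the last two code points of each plane share these low 16 bits.
    return (number & 0xFFFE) == 0xFFFE


def resolve_reference(number: int) -> str:
    """Return the referenced character, or U+FFFD on a parse error."""
    if is_disallowed_reference(number):
        return REPLACEMENT_CHARACTER
    return chr(number)
```

Under these assumptions, resolve_reference(0x41) yields "A", while resolve_reference(0xFFFF) and resolve_reference(0x110000) both yield the U+FFFD REPLACEMENT CHARACTER.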