NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 94 - Make control characters and non-Unicode characters be parse errors, for compatibility with XML.
Status: RESOLVED FIXED
Product: Validator.nu
Classification: Unclassified
Component: HTML parser
Version: HEAD
Hardware: All All
Importance: P2 normal
Assigned To: Henri Sivonen
URL: http://svn.whatwg.org/webapps/source?...
Reported: 2008-03-03 13:09 CET by Nobody
Modified: 2008-03-14 16:36 CET (History)
Description Nobody 2008-03-03 13:09:23 CET
Index: source
===================================================================
--- source	(revision 1262)
+++ source	(revision 1263)
@@ -34677,8 +34677,10 @@
   href="#charset">character encoding declarations</a> are to be
   serialised, as discussed in the section on that topic.</p>
 
-  <p>The U+0000 NULL character must not appear anywhere in a
-  document.</p>
+  <p>The U+0000 NULL character, control characters other than the
+  <span title="space character">space characters</span>, and
+  characters that are not defined by Unicode, must not appear anywhere
+  in a document.</p>
 
   <p class="note">Space characters before the root <code>html</code>
   element will be dropped when the document is parsed; space
@@ -35997,6 +35999,19 @@
   U+FFFD REPLACEMENT CHARACTERs. Any occurrences of such characters is
   a <span>parse error</span>.</p>
 
+  <p>Any occurances of any characters in the ranges U+0001 to U+0008,
+  <!-- space characters allowed --> U+000E to U+001F, <!-- ASCII
+  allowed --> U+007F <!--to U+0084, (U+0085 NEL not allowed),
+  U+0086--> to U+009F, U+D800 to U+DFFF <!-- surrogates not allowed
+  -->, U+FDD0 to U+FDDF, and characters U+FFFE, U+FFFF, U+1FFFE,
+  U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF,
+  U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE,
+  U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF,
+  U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE,
+  U+FFFFF, U+10FFFE, and U+10FFFF are <span title="parse error">parse
+  errors</span>. (These are all control characters or permanently
+  undefined Unicode characters.)</p>
+
   <p>U+000D CARRIAGE RETURN (CR) characters, and U+000A LINE FEED (LF)
   characters, are treated specially. Any CR characters that are
   followed by LF characters must be removed, and any CR characters not
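
The character ranges added in the second hunk can be expressed as a small predicate. The following is a minimal sketch in Python, not the Validator.nu implementation: the names `FLAGGED_RANGES`, `is_flagged_code_point` and `find_flagged` are hypothetical, and the ranges are copied from the patch text as written (U+007F to U+009F is treated as one range, since the HTML comments in the patch only annotate it). U+0000 is left out because the patch context handles it separately.

```python
# Ranges listed in the patch; each pair is inclusive (assumed reading of the text).
FLAGGED_RANGES = (
    (0x0001, 0x0008),  # C0 controls before the space characters
    (0x000E, 0x001F),  # C0 controls after the space characters
    (0x007F, 0x009F),  # DELETE and the C1 controls, including U+0085 NEL
    (0xD800, 0xDFFF),  # surrogates
    (0xFDD0, 0xFDDF),  # as written in the patch (Unicode's own block runs to U+FDEF)
)


def is_flagged_code_point(cp):
    """Return True if the code point is one the patch makes a parse error."""
    if any(lo <= cp <= hi for lo, hi in FLAGGED_RANGES):
        return True
    # U+FFFE, U+FFFF, U+1FFFE, ... U+10FFFF: the last two code points of
    # each of the 17 planes.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFF) >= 0xFFFE


def find_flagged(text):
    """List (index, 'U+XXXX') for every flagged character in text."""
    return [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(text)
        if is_flagged_code_point(ord(ch))
    ]
```

For instance, `find_flagged("a\x01b\x7f")` reports the characters at indices 1 and 3, while tab, line feed, form feed and carriage return pass through unflagged.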