Bugzilla – Bug 94
Make control characters and non-Unicode characters be parse errors, for compatibility with XML.
Last modified: 2008-03-14 16:36:06 CET
Index: source
===================================================================
--- source (revision 1262)
+++ source (revision 1263)
@@ -34677,8 +34677,10 @@
  href="#charset">character encoding declarations</a> are to be
  serialised, as discussed in the section on that topic.</p>
 
- <p>The U+0000 NULL character must not appear anywhere in a
- document.</p>
+ <p>The U+0000 NULL character, control characters other than the
+ <span title="space character">space characters</span>, and
+ characters that are not defined by Unicode, must not appear anywhere
+ in a document.</p>
 
  <p class="note">Space characters before the root <code>html</code>
  element will be dropped when the document is parsed; space
@@ -35997,6 +35999,19 @@
  U+FFFD REPLACEMENT CHARACTERs. Any occurrences of such characters is
  a <span>parse error</span>.</p>
 
+ <p>Any occurances of any characters in the ranges U+0001 to U+0008,
+ <!-- space characters allowed --> U+000E to U+001F, <!-- ASCII
+ allowed --> U+007F <!--to U+0084, (U+0085 NEL not allowed),
+ U+0086--> to U+009F, U+D800 to U+DFFF <!-- surrogates not allowed
+ -->, U+FDD0 to U+FDDF, and characters U+FFFE, U+FFFF, U+1FFFE,
+ U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF,
+ U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE,
+ U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF,
+ U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE,
+ U+FFFFF, U+10FFFE, and U+10FFFF are <span title="parse error">parse
+ errors</span>. (These are all control characters or permanently
+ undefined Unicode characters.)</p>
+
  <p>U+000D CARRIAGE RETURN (CR) characters, and U+000A LINE FEED (LF)
  characters, are treated specially. Any CR characters that are
  followed by LF characters must be removed, and any CR characters not
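
The character list added in the second hunk of the diff can be read as a membership test on code points. The following is a minimal Python sketch of that reading, not code from the spec or from any parser. The names `is_parse_error_code_point` and `find_parse_errors` are hypothetical. It assumes that, once the HTML comments in the paragraph are stripped, the third range is simply U+007F to U+009F (so U+0085 is included). It treats the long enumeration from U+FFFE through U+10FFFF as "the last two code points of every plane". U+0000 is not in the added list and is left out here, since the surrounding spec text handles it separately.

```python
def is_parse_error_code_point(cp: int) -> bool:
    """Return True if cp is in the list added by revision 1263 (as read above)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp!r}")
    if 0x0001 <= cp <= 0x0008:   # C0 controls before the space characters
        return True
    if 0x000E <= cp <= 0x001F:   # C0 controls after the space characters
        return True
    if 0x007F <= cp <= 0x009F:   # DELETE and the C1 controls (assumed to include U+0085)
        return True
    if 0xD800 <= cp <= 0xDFFF:   # surrogates
        return True
    if 0xFDD0 <= cp <= 0xFDDF:   # noncharacters in the BMP
        return True
    # U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF:
    # the last two code points of each of the 17 planes.
    return (cp & 0xFFFE) == 0xFFFE


def find_parse_errors(text: str) -> list[tuple[int, int]]:
    """Return (index, code point) for each listed character found in text."""
    return [
        (i, ord(ch))
        for i, ch in enumerate(text)
        if is_parse_error_code_point(ord(ch))
    ]
```

For example, `find_parse_errors("a\x01b")` reports the U+0001 at index 1, while tab, line feed, form feed and carriage return pass through because U+0009 to U+000D fall outside the listed ranges.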