Bugzilla – Bug 616
Make invalid &#x...; character references not get converted to U+FFFD, for consistency with literal invalid characters.
Last modified: 2009-11-23 17:17:34 CET
Index: source
===================================================================
--- source (revision 3373)
+++ source (revision 3374)
@@ -75699,9 +75699,10 @@
  <thead>
  <tr><th>Number <th colspan=2>Unicode character
  <tbody>
+ <tr><td>0x00 <td>U+FFFD <td>REPLACEMENT CHARACTER
  <tr><td>0x0D <td>U+000A <td>LINE FEED (LF)
  <tr><td>0x80 <td>U+20AC <td>EURO SIGN ('€')
- <tr><td>0x81 <td>U+FFFD <td>REPLACEMENT CHARACTER
+ <tr><td>0x81 <td>U+0081 <td><control>
  <tr><td>0x82 <td>U+201A <td>SINGLE LOW-9 QUOTATION MARK ('‚')
  <tr><td>0x83 <td>U+0192 <td>LATIN SMALL LETTER F WITH HOOK ('ƒ')
  <tr><td>0x84 <td>U+201E <td>DOUBLE LOW-9 QUOTATION MARK ('„')
@@ -75713,10 +75714,10 @@
  <tr><td>0x8A <td>U+0160 <td>LATIN CAPITAL LETTER S WITH CARON ('Š')
  <tr><td>0x8B <td>U+2039 <td>SINGLE LEFT-POINTING ANGLE QUOTATION MARK ('‹')
  <tr><td>0x8C <td>U+0152 <td>LATIN CAPITAL LIGATURE OE ('Œ')
- <tr><td>0x8D <td>U+FFFD <td>REPLACEMENT CHARACTER
+ <tr><td>0x8D <td>U+008D <td><control>
  <tr><td>0x8E <td>U+017D <td>LATIN CAPITAL LETTER Z WITH CARON ('Ž')
- <tr><td>0x8F <td>U+FFFD <td>REPLACEMENT CHARACTER
- <tr><td>0x90 <td>U+FFFD <td>REPLACEMENT CHARACTER
+ <tr><td>0x8F <td>U+008F <td><control>
+ <tr><td>0x90 <td>U+0090 <td><control>
  <tr><td>0x91 <td>U+2018 <td>LEFT SINGLE QUOTATION MARK ('‘')
  <tr><td>0x92 <td>U+2019 <td>RIGHT SINGLE QUOTATION MARK ('’')
  <tr><td>0x93 <td>U+201C <td>LEFT DOUBLE QUOTATION MARK ('“')
@@ -75729,15 +75730,18 @@
  <tr><td>0x9A <td>U+0161 <td>LATIN SMALL LETTER S WITH CARON ('š')
  <tr><td>0x9B <td>U+203A <td>SINGLE RIGHT-POINTING ANGLE QUOTATION MARK ('›')
  <tr><td>0x9C <td>U+0153 <td>LATIN SMALL LIGATURE OE ('œ')
- <tr><td>0x9D <td>U+FFFD <td>REPLACEMENT CHARACTER
+ <tr><td>0x9D <td>U+009D <td><control>
  <tr><td>0x9E <td>U+017E <td>LATIN SMALL LETTER Z WITH CARON ('ž')
  <tr><td>0x9F <td>U+0178 <td>LATIN CAPITAL LETTER Y WITH DIAERESIS ('Ÿ')
  </table>
 
+ <p>Otherwise, return a character token for the Unicode character
+ whose code point is that number.
+
  <!-- this is the same as the equivalent list in the input stream
- section, except it has 0x0000 included in the first range. -->
- <p>Otherwise, if the number is in the range 0x0000 to 0x0008, <!--
- HT, LF allowed --> <!-- U+000B is in the next list --> <!-- FF, CR
+ section -->
+ If the number is in the range 0x0001 to 0x0008, <!-- HT, LF
+ allowed --> <!-- U+000B is in the next list --> <!-- FF, CR
  allowed --> 0x000E to 0x001F, <!-- ASCII allowed --> 0x007F <!--to
  0x0084, (0x0085 NEL not allowed), 0x0086--> to 0x009F, 0xD800 to
  0xDFFF<!-- surrogates not allowed -->, 0xFDD0 to 0xFDEF, or is one
@@ -75747,11 +75751,7 @@
  0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE,
  0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF, 0x10FFFE, or
  0x10FFFF, or is higher than 0x10FFFF, then this is a <span>parse
- error</span>; return a character token for the U+FFFD REPLACEMENT
- CHARACTER character instead.</p>
-
- <p>Otherwise, return a character token for the Unicode character
- whose code point is that number.</p>
+ error</span>.</p>
 
  </dd>
 
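A rough Python sketch of the behaviour the revision 3374 side of this diff seems to describe, for illustration only. The names (REPLACEMENTS, FLAGGED_RANGES, FLAGGED_VALUES, map_code_point, is_flagged) are hypothetical and do not come from the spec. The table holds only the rows visible in the hunks above, and the list of individual flagged values holds only 0x000B (mentioned in a comment) and the values visible in the last hunk; the part of that list cut off by the hunk boundary is not reproduced. The diff does not say what token is returned for numbers above 0x10FFFF, so the sketch returns None there as an assumption.

```python
from typing import Optional

# Partial replacement table: only the rows visible in the diff hunks.
REPLACEMENTS = {
    0x00: 0xFFFD, 0x0D: 0x000A, 0x80: 0x20AC, 0x81: 0x0081,
    0x82: 0x201A, 0x83: 0x0192, 0x84: 0x201E, 0x8A: 0x0160,
    0x8B: 0x2039, 0x8C: 0x0152, 0x8D: 0x008D, 0x8E: 0x017D,
    0x8F: 0x008F, 0x90: 0x0090, 0x91: 0x2018, 0x92: 0x2019,
    0x93: 0x201C, 0x9A: 0x0161, 0x9B: 0x203A, 0x9C: 0x0153,
    0x9D: 0x009D, 0x9E: 0x017E, 0x9F: 0x0178,
}

# Ranges that the new text marks as a parse error (inclusive bounds).
FLAGGED_RANGES = [
    (0x0001, 0x0008),
    (0x000E, 0x001F),
    (0x007F, 0x009F),
    (0xD800, 0xDFFF),
    (0xFDD0, 0xFDEF),
]

# Individual flagged values: incomplete, limited to what the diff shows.
FLAGGED_VALUES = {
    0x000B,
    0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF,
    0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF,
    0x10FFFE, 0x10FFFF,
}


def map_code_point(number: int) -> Optional[int]:
    """Return the code point a numeric reference resolves to.

    Table hits are remapped; everything else passes through unchanged,
    which is the point of the change (no more blanket U+FFFD).
    """
    if number in REPLACEMENTS:
        return REPLACEMENTS[number]
    if number > 0x10FFFF:
        # Assumption of this sketch: the diff does not cover this case.
        return None
    return number


def is_flagged(number: int) -> bool:
    """Return True if the number falls in the parse-error list above."""
    if number > 0x10FFFF or number in FLAGGED_VALUES:
        return True
    return any(low <= number <= high for low, high in FLAGGED_RANGES)
```

For example, map_code_point(0x81) gives 0x81 under the new table where revision 3373 gave 0xFFFD, and is_flagged(0x0000) is False because the first range now starts at 0x0001 and 0x00 is handled by the table row instead.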