NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 857 - validator mistakenly reports unmatched surrogate when parsing XHTML file
Status: NEW
Product: Validator.nu
Classification: Unclassified
Component: XML parser
Version: HEAD
Hardware: All All
Importance: P2 normal
Assigned To: Nobody
URL: http://www.madore.org/~david/.tmp/val...
Depends on:
Blocks:
Reported: 2011-09-07 10:27 CEST by David A. Madore
Modified: 2011-09-07 10:27 CEST
0 users

Description David A. Madore 2011-09-07 10:27:10 CEST
The page http://www.madore.org/~david/.tmp/validator-bug.xhtml is, as far as I know, valid XHTML5 (and also polyglot as HTML5), and it is undoubtedly correctly encoded UTF-8 (*not* CESU-8 or anything else). When I try to validate it, Validator.nu mistakenly reports:

Fatal Error: Unmatched low surrogate.
At line 8, column 17
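To make the UTF-8 versus CESU-8 distinction concrete, here is a small sketch (not taken from the validator) that encodes the two characters of the offending alt attribute, U+1F307 and U+1F365, both ways. In UTF-8 each non-BMP character is a single 4-byte sequence; in CESU-8 each is first split into a UTF-16 surrogate pair and each surrogate is written as its own 3-byte sequence. The helper names are illustrative only.

```python
# Sketch: contrast UTF-8 and CESU-8 encodings of the alt text from the report.
# to_utf16_units and cesu8_encode are hypothetical helpers for illustration.
ALT = "\U0001F307\U0001F365"  # SUNSET OVER BUILDINGS, FISH CAKE WITH SWIRL DESIGN

def to_utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text (surrogate pairs for non-BMP chars)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

def cesu8_encode(text: str) -> bytes:
    """CESU-8: encode each UTF-16 code unit on its own, as if it were a code point."""
    # 'surrogatepass' lets Python's UTF-8 codec emit 3-byte forms for lone surrogates.
    return "".join(chr(u) for u in to_utf16_units(text)).encode("utf-8", "surrogatepass")

utf8 = ALT.encode("utf-8")
cesu8 = cesu8_encode(ALT)
print("UTF-8 :", utf8.hex(" "))   # 4 bytes per character
print("CESU-8:", cesu8.hex(" "))  # 6 bytes per character (two 3-byte surrogates)
```

The UTF-8 form is `f0 9f 8c 87 f0 9f 8d a5` (8 bytes), while the CESU-8 form is `ed a0 bc ed bc 87 ed a0 bc ed bd a5` (12 bytes). A file containing the former is plain UTF-8 and contains no surrogate code points at all.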

The content of the problematic alt attribute consists of two characters: U+1F307 SUNSET OVER BUILDINGS and U+1F365 FISH CAKE WITH SWIRL DESIGN. The specifics are probably irrelevant beyond the fact that these characters are outside the BMP (and perhaps that they were introduced in Unicode 6.0). Two such characters are needed to trigger the bug.
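The error message talks about surrogates because characters outside the BMP are represented internally as UTF-16 surrogate pairs (Validator.nu is written in Java, whose strings are UTF-16). The sketch below, which is a hypothetical checker and not Validator.nu's code, splits both characters into their pairs and scans the code units for unpaired surrogates. A correct decoding yields two well-formed pairs and no error; an "unmatched low surrogate" would only appear if a high surrogate were lost or misplaced somewhere along the way. The report does not identify where that happens, so the last line merely illustrates the symptom.

```python
# Sketch of what an "unmatched low surrogate" check means on UTF-16 code units.
# surrogate_pair and find_unmatched_surrogates are hypothetical helpers.

def is_high(u: int) -> bool:
    return 0xD800 <= u <= 0xDBFF

def is_low(u: int) -> bool:
    return 0xDC00 <= u <= 0xDFFF

def surrogate_pair(cp: int) -> tuple[int, int]:
    """Standard UTF-16 split of a supplementary code point (U+10000..U+10FFFF)."""
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

def find_unmatched_surrogates(units: list[int]) -> list[tuple[int, str]]:
    """Return (index, 'high' or 'low') for every surrogate that is not part of a pair."""
    errors = []
    i = 0
    while i < len(units):
        u = units[i]
        if is_high(u):
            if i + 1 < len(units) and is_low(units[i + 1]):
                i += 2  # well-formed pair: skip both code units
                continue
            errors.append((i, "high"))  # high surrogate not followed by a low one
        elif is_low(u):
            errors.append((i, "low"))  # low surrogate with no preceding high one
        i += 1
    return errors

units = [*surrogate_pair(0x1F307), *surrogate_pair(0x1F365)]
print([hex(u) for u in units])               # ['0xd83c', '0xdf07', '0xd83c', '0xdf65']
print(find_unmatched_surrogates(units))      # []: correctly decoded text is well-formed
print(find_unmatched_surrogates(units[1:]))  # [(0, 'low')]: as if the first high surrogate were lost
```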