Bugzilla – Bug 857
validator mistakenly reports unmatched surrogate when parsing XHTML file
Last modified: 2011-09-07 10:27:10 CEST
The page http://www.madore.org/~david/.tmp/validator-bug.xhtml is, as far as I know, valid XHTML5 (and also polyglot as HTML5), and it is undoubtedly correctly encoded UTF-8 (*not* CESU-8 or anything similar). When trying to validate it, Validator.nu mistakenly reports:

Fatal Error: Unmatched low surrogate.
At line 8, column 17

The content of the problematic alt attribute consists of two characters: U+1F307 SUNSET OVER BUILDINGS and U+1F365 FISH CAKE WITH SWIRL DESIGN. The specifics are probably irrelevant beyond the fact that both are outside the BMP (and, perhaps, that they were introduced in Unicode 6.0). Having two such characters is necessary to trigger the bug.
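
For reference, here is a small Python sketch of what the two characters look like as UTF-16 code units, together with a plain surrogate-pairing check. This is only an illustration of the expected behaviour, not Validator.nu's code; the helper names `to_utf16_units` and `first_unmatched_surrogate` are made up for this example. It suggests that each character maps to a well-formed high/low pair, so a correct check should find nothing unmatched.

```python
def to_utf16_units(text):
    """Encode text as a list of UTF-16 code units (big-endian, no BOM)."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def first_unmatched_surrogate(units):
    """Return the index of the first unpaired surrogate, or None if all pair up."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: it must be followed immediately by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                return i
            i += 2  # consume the whole pair
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate that was not preceded by a high surrogate.
            return i
        else:
            i += 1
    return None


# The two characters from the alt attribute: U+1F307 followed by U+1F365.
alt_text = "\U0001F307\U0001F365"
units = to_utf16_units(alt_text)

print(alt_text.encode("utf-8").hex(" "))   # f0 9f 8c 87 f0 9f 8d a5
print([f"{u:04X}" for u in units])         # ['D83C', 'DF07', 'D83C', 'DF65']
print(first_unmatched_surrogate(units))    # None
```

Both characters share the high surrogate D83C, so the code unit sequence is high, low, high, low. A checker that loses track of state between the first pair and the second could plausibly report the second low surrogate as unmatched, though that is only a guess at the cause.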