Bugzilla – Bug 641
U+EDC in URL triggers error "Bad value ໜ for attribute href on element a: COMPATIBILITY_CHARACTER in PATH."
Last modified: 2009-12-01 02:49:51 CET
Try validating the following document:

    <!doctype html>
    <title>Test</title>
    <p><a href=ໜ>Bad URL?</a>

where ໜ is the Unicode character U+EDC LAO HO NO. The validator gives the following error:

    Error: Bad value ໜ for attribute href on element a: COMPATIBILITY_CHARACTER in PATH.
    From line 3, column 4; to line 3, column 13
    title>↩<p><a href=ໜ>Bad UR
    Syntax of IRI reference: Any URL. For example: /hello, #canvas, or http://example.org/.

I've followed the bread crumbs from HTML 5 to Web Addresses to RFC 3987, and as far as I can tell, the entire range %xA0-D7FF is permitted in ucschar, thus iunreserved, thus ipchar, thus isegment*, etc. So the character should be valid in IRIs, and thus in HTML 5 URLs. If I'm wrong, the error message here should at least be more helpful.

This is the second of the two apparent validator bugs I've found that block validation of www.wikipedia.org as HTML 5. See also bug 640.
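The range argument above can be checked mechanically. The following is a minimal sketch, not the validator's own code: it transcribes the ucschar production from RFC 3987 section 2.2 into a table and tests whether a code point falls inside it. The names UCSCHAR_RANGES and is_ucschar are made up for this illustration.

```python
# Sketch of the RFC 3987 "ucschar" production as a range table.
# UCSCHAR_RANGES and is_ucschar are hypothetical names, not part of any library.

# BMP ranges: %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
UCSCHAR_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
# Planes 1 to 13: %x10000-1FFFD ... %xD0000-DFFFD
UCSCHAR_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(1, 14)]
# Plane 14 starts later: %xE1000-EFFFD
UCSCHAR_RANGES.append((0xE1000, 0xEFFFD))


def is_ucschar(ch: str) -> bool:
    """Return True if the single character ch matches ucschar."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)


if __name__ == "__main__":
    # U+0EDC LAO HO NO lies inside %xA0-D7FF, so the grammar admits it.
    print(is_ucschar("\u0edc"))
```

This only confirms that the syntax permits the character; it says nothing about additional SHOULD-level guidance elsewhere in the RFC.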
A compatibility character in an IRI is a SHOULD violation (RFC 3987, section 7.5, third paragraph). Validator.nu configures the IRI checking library to treat SHOULD violations as errors.
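To see why U+EDC is flagged at all, here is a rough sketch that assumes "compatibility character" means a character with a tagged (compatibility) decomposition in the Unicode Character Database. Whether the IRI checking library uses exactly this criterion is an assumption; is_compatibility_character is a hypothetical helper built on Python's standard unicodedata module.

```python
import unicodedata


def is_compatibility_character(ch: str) -> bool:
    """Return True if ch has a compatibility decomposition.

    In the Unicode Character Database, compatibility mappings carry a
    tag such as <compat> or <font>; canonical mappings carry no tag.
    This is an assumed criterion, not necessarily the library's own.
    """
    return unicodedata.decomposition(ch).startswith("<")


if __name__ == "__main__":
    ch = "\u0edc"  # LAO HO NO
    print(unicodedata.decomposition(ch))
    print(is_compatibility_character(ch))
    # NFKC normalization replaces a compatibility character with its mapping.
    print(unicodedata.normalize("NFKC", ch) != ch)
```

Under this reading, the character is syntactically valid in an IRI but changes under NFKC normalization, which would explain a SHOULD-level complaint rather than a grammar error.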
But chapter 7 of the RFC is informative even though it says "should"...
I checked in a fix for this, which you can try at http://qa-dev.w3.org:8888/
This case now causes a warning to be reported instead of an error.