NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 867 - confusing (incorrect?) error message for page served with «Content-Type: text/html; charset="UTF-8"» HTTP header
Status: NEW
Product: Validator.nu
Classification: Unclassified
Component: HTML parser
Version: HEAD
Hardware: All, OS: All
Importance: P2 normal
Assigned To: Nobody
URL: https://dvcs.w3.org/hg/webperf/raw-fi...
Depends on:
Blocks:
Reported: 2011-10-21 05:53 CEST by Michael[tm] Smith
Modified: 2011-10-26 08:55 CEST
CC List: 2 users

See Also:


Attachments

Description Michael[tm] Smith 2011-10-21 05:53:44 CEST
Running a page served with a «Content-Type: text/html; charset="UTF-8"» HTTP header through the validator causes the following error message to be emitted:

[[
The encoding "utf-8" is not the preferred name of the character encoding in use. The preferred name is utf-8. (Charmod C024)
]]

The cause is apparently that the htmlparser code does a case-insensitive comparison of the preferred name «utf-8» against the value of the charset parameter in the Content-Type header. But instead of treating that charset value as a quoted string, it treats it as a single literal token «"UTF-8"», with the quotes as part of the token. Because of the quotes, that token doesn't case-insensitively match «utf-8», so the comparison fails.
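A minimal Java sketch of the failing comparison described above (Java is assumed because the Validator.nu HTML parser is written in Java). `CharsetCompareDemo` and `stripDoubleQuotes` are hypothetical names for illustration, not the parser's actual code, and the helper only strips one surrounding pair of double quotes without handling backslash escapes.

```java
/**
 * Hypothetical demo: comparing the raw charset value, quotes included,
 * against the preferred name fails; unquoting first makes it match.
 */
public final class CharsetCompareDemo {

    // Hypothetical helper: removes one pair of surrounding double quotes, if present.
    // Does not decode backslash escapes inside the quoted string.
    static String stripDoubleQuotes(String value) {
        if (value.length() >= 2
                && value.charAt(0) == '"'
                && value.charAt(value.length() - 1) == '"') {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        String raw = "\"UTF-8\"";  // charset value with the quotes still attached
        String preferred = "utf-8";

        // Comparison as described in the bug: the quotes take part, so it is false.
        System.out.println(raw.equalsIgnoreCase(preferred));                     // false

        // Unquote first, then compare case-insensitively: true.
        System.out.println(stripDoubleQuotes(raw).equalsIgnoreCase(preferred));  // true
    }
}
```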
Comment 1 Henri Sivonen 2011-10-25 17:07:30 CEST
Having quotes in the parameter is bogus on the HTTP layer. Thus, the value extracted from the HTTP header includes the quotes.

Any ideas how to give a more obvious message about this?
Comment 2 Simon Pieters 2011-10-25 17:34:39 CEST
Why is it bogus?
Comment 3 Henri Sivonen 2011-10-26 08:55:13 CEST
(In reply to comment #2)
> Why is it bogus?

Oops, right. Double quotes aren't bogus. I was confused by my vague recollection of quote oddities in MIME. But the oddity is that single quotes aren't quotes. Double quotes aren't bogus.

Need to locate the code that extracts the parameter value from the HTTP header...
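A sketch of what a quote-aware extraction could look like, assuming the HTTP/1.1 grammar in which a parameter value is either a token or a quoted-string (RFC 2616 §2.2, restated in RFC 7230 §3.2.6). Double quotes delimit a quoted-string and a backslash escapes the following character; a single quote is an ordinary token character, so `charset='UTF-8'` yields `'UTF-8'` with its quotes, consistent with comment 3. `ContentTypeCharset.charsetParam` is a hypothetical name, not the method in the htmlparser code, and the sketch does not validate token characters.

```java
import java.util.Locale;

public final class ContentTypeCharset {

    // Hypothetical helper: returns the charset parameter of a Content-Type value,
    // decoding a quoted-string if present. Returns null if there is no charset
    // parameter or if a quoted-string is unterminated.
    static String charsetParam(String header) {
        int n = header.length();
        int i = header.indexOf(';');
        while (i >= 0 && i < n) {
            i++; // step past ';'
            int nameStart = i;
            while (i < n && header.charAt(i) != '=' && header.charAt(i) != ';') {
                i++;
            }
            if (i >= n || header.charAt(i) == ';') {
                continue; // parameter without '=': ignore it
            }
            String name = header.substring(nameStart, i).trim().toLowerCase(Locale.ROOT);
            i++; // step past '='
            String value;
            if (i < n && header.charAt(i) == '"') {
                // quoted-string: DQUOTE *( qdtext / quoted-pair ) DQUOTE
                StringBuilder sb = new StringBuilder();
                i++;
                boolean closed = false;
                while (i < n) {
                    char c = header.charAt(i++);
                    if (c == '\\' && i < n) {
                        sb.append(header.charAt(i++)); // quoted-pair: keep the escaped char
                    } else if (c == '"') {
                        closed = true;
                        break;
                    } else {
                        sb.append(c);
                    }
                }
                if (!closed) {
                    return null;
                }
                value = sb.toString();
                while (i < n && header.charAt(i) != ';') {
                    i++; // skip anything after the closing quote
                }
            } else {
                // token: single quotes are ordinary characters here, so they are kept
                int valueStart = i;
                while (i < n && header.charAt(i) != ';') {
                    i++;
                }
                value = header.substring(valueStart, i).trim();
            }
            if (name.equals("charset")) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetParam("text/html; charset=\"UTF-8\""));  // UTF-8
        System.out.println(charsetParam("text/html; charset=UTF-8"));      // UTF-8
        System.out.println(charsetParam("text/html; charset='UTF-8'"));    // 'UTF-8'
        System.out.println(charsetParam("text/html"));                     // null
    }
}
```

The decoded value can then be compared case-insensitively against the preferred name, so «"UTF-8"» and «UTF-8» are treated alike while «'UTF-8'» still does not match.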