Bugzilla – Bug 762
meta refresh accepts quoted URIs
Last modified: 2011-04-19 11:15:02 CEST
The checker accepts the following as valid:
<meta http-equiv="Refresh" content="20; URL='page4.html'">
But, according to the spec:
"For meta elements with an http-equiv attribute in the Refresh state, the content attribute must have a value consisting either of:
* just a valid non-negative integer, or
* a valid non-negative integer, followed by a U+003B SEMICOLON character (;), followed by one or more space characters, followed by either a U+0055 LATIN CAPITAL LETTER U character (U) or a U+0075 LATIN SMALL LETTER U character (u), a U+0052 LATIN CAPITAL LETTER R character (R) or a U+0072 LATIN SMALL LETTER R character (r), a U+004C LATIN CAPITAL LETTER L character (L) or a U+006C LATIN SMALL LETTER L character (l), a U+003D EQUALS SIGN character (=), and then a valid URL."
(no quotes mentioned)
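To make the shape of the quoted rule concrete, here is a minimal Python sketch. The function name `matches_refresh_syntax` is hypothetical, and the "valid URL" part is treated as an opaque non-empty string. That is an assumption made for illustration, not how validator.nu checks IRIs. The point is only that the rule as written has no slot for quote characters, so any quotes become part of the URL.

```python
import re

# Framing of the rule quoted above: a non-negative integer, optionally
# followed by ";", one or more space characters, "URL" in either letter
# case, "=", and a URL.
# ASSUMPTION: the URL itself is not validated here; any non-empty
# remainder is accepted, so only the framing is checked.
_REFRESH_SHAPE = re.compile(
    r"[0-9]+(?:;[ \t\n\f\r]+[Uu][Rr][Ll]=(?P<url>.+))?",
    re.DOTALL,
)


def matches_refresh_syntax(content):
    """Return (ok, url); url is None when only a delay is given."""
    m = _REFRESH_SHAPE.fullmatch(content)
    if m is None:
        return (False, None)
    return (True, m.group("url"))
```

With this sketch, `matches_refresh_syntax("20; URL='page4.html'")` reports a match and hands back the URL with the apostrophes still attached, which is why the question then shifts to whether a URL may contain them.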
This seems to be a side effect of the fact that the code for checking IRI references accepts quoted IRI references. For example, it thinks <img src="'foo'" alt="bar"> is valid.
I'll poke around in the code and find out why.
So I checked the IRI-reference-checking code and found that it's not doing anything special with the quote characters. It handles them just the same as any other character that's not A-Z or a-z. It doesn't recognize them as special delimiters of any kind. For example, it also says that
<meta http-equiv="Refresh" content="20; URL='page4.html)"> is valid.
Should it be handling the quote characters differently for this case? If so, are there other characters that it should be handling specially?
From reading RFC 3987, it's not clear to me that the checker code is doing anything wrong here.
Note that it does report an "Illegal character in scheme component " error for
<meta http-equiv="Refresh" content="20; URL='http://example.com/page4.html'">
Not to beat this into the ground, but note that the following also validates:
<meta http-equiv="Refresh" content="20; URL=http://example.com/page4.html'">
...because we are just passing that along to the Jena IRI library as-is, and it's not reporting any error due to that single quote character at the end of the path part of the URL.
So as far as I can see, the Jena IRI code doesn't consider quote characters in path names to be invalid. I don't know if it should or not, but regardless, if there's an error here, it seems clear to me it's not in the validator.nu IRI-checking code.
So I'm moving this to Resolved:Intentional. But if you think there's something more/different we should be doing in the validator code, feel free to reopen it.
It appears this is a bug in the spec.
The parsing rules treat " and ' as special, whereas the conformance rules do not mention them. So a value of
is parsed as
which is a valid URI reference. However, the spec requires the recipient to treat this as
(stripping the leading ", and truncating at the 2nd).
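The quote handling described above can be sketched in Python as follows. This is a simplified illustration based only on the description in this comment, not the spec's full parsing algorithm. It ignores whitespace skipping and every other step, and the function name `extract_refresh_url` is hypothetical.

```python
def extract_refresh_url(raw):
    """Apply the described quote handling to the text that follows "URL=".

    ASSUMPTION: simplified from the comment above; the real parsing rules
    contain further steps that are not modelled here.
    """
    first = raw[:1]
    quote = first if first in ("'", '"') else ""
    if quote:
        raw = raw[1:]              # strip the leading quote character
        end = raw.find(quote)      # look for the second occurrence
        if end != -1:
            raw = raw[:end]        # truncate at the second quote
    return raw
```

Under this sketch, a value that starts with a quote loses that quote and everything from the matching one onward, while a value with only a trailing quote is left untouched. That is the mismatch with the conformance rules, which say nothing about quotes.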
This does seem to be something that should be changed in the spec, e.g., by introducing the notion of a "valid optionally quoted URL". Though I don't think we'd need to add an additional datatype to the datatype library, since this is the only case where it would be used -- the additional checking would just be added to the existing state machine in the Refresh.java code (all assuming that the spec does in fact end up getting changed to address this).
Reopening this, as Hixie made a change to the spec:
--- source (revision 5839)
+++ source (revision 5840)
@@ -14897,8 +14897,9 @@
title="space character">space characters</span>, followed by a
substring that is an <span>ASCII case-insensitive</span> match
for the string "<code title="">URL</code>", followed by a U+003D
- EQUALS SIGN character (=), followed by a <span>valid
+ EQUALS SIGN character (=), followed by a <span>valid URL</span>
+ that does not start with a literal U+0027 APOSTROPHE (') or
+ U+0022 QUOTATION MARK (") character.</li>
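As a rough illustration of the condition this revision adds, the extra check amounts to looking at the first character of the URL. The Python helper below is a hypothetical sketch, not the code in Refresh.java.

```python
def starts_with_quote(url):
    """True if the URL begins with U+0027 APOSTROPHE or U+0022 QUOTATION MARK.

    Hypothetical helper: it models only the added "does not start with"
    condition, not URL validity as a whole.
    """
    return url.startswith(("'", '"'))
```

Applied to the examples earlier in this bug, `'page4.html'` would now be rejected, while `http://example.com/page4.html'` (trailing quote only) is unaffected by this particular condition.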
Julian, thanks for having raised this. It's deployed on http://www.w3.org/html/check
If you test there and find any problems, please let me know.