Bug 762 – meta refresh accepts quoted URIs


Summary:	meta refresh accepts quoted URIs

Status:	RESOLVED FIXED

Product:	Validator.nu
Classification:	Unclassified
Component:	General
Version:	HEAD
Hardware:	All All

Importance:	P2 normal
Assigned To:	Michael[tm] Smith

URL:	http://dev.w3.org/html5/spec/Overview...

Reported:	2010-08-18 13:01 CEST by Julian Reschke
Modified:	2011-04-19 11:15 CEST

Description Julian Reschke 2010-08-18 13:01:46 CEST

Example:

	<meta http-equiv="Refresh" content="20; URL='page4.html'">

But, according to the spec:

"For meta elements with an http-equiv attribute in the Refresh state, the content attribute must have a value consisting either of:

    * just a valid non-negative integer, or
    * a valid non-negative integer, followed by a U+003B SEMICOLON character (;), followed by one or more space characters, followed by either a U+0055 LATIN CAPITAL LETTER U character (U) or a U+0075 LATIN SMALL LETTER U character (u), a U+0052 LATIN CAPITAL LETTER R character (R) or a U+0072 LATIN SMALL LETTER R character (r), a U+004C LATIN CAPITAL LETTER L character (L) or a U+006C LATIN SMALL LETTER L character (l), a U+003D EQUALS SIGN character (=), and then a valid URL."

(no quotes mentioned)
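
The rule quoted above can be sketched as a pattern match. This is only an illustration, not the validator.nu implementation: the function name match_refresh_content is hypothetical, and URL validity is deliberately not checked (any non-empty remainder stands in for "a valid URL").

```python
import re

# HTML "space characters": space, tab, LF, FF, CR.
SPACE = r"[ \t\n\f\r]"

# Assumption: the URL part is not validated here; any non-empty
# remainder is accepted as a stand-in for "a valid URL".
REFRESH_SYNTAX = re.compile(
    r"(?P<delay>[0-9]+)"
    r"(?:;" + SPACE + r"+[Uu][Rr][Ll]=(?P<url>.+))?",
    re.DOTALL,
)


def match_refresh_content(value):
    """Return (delay, url_or_None) if value has the quoted shape, else None."""
    m = REFRESH_SYNTAX.fullmatch(value)
    if m is None:
        return None
    return int(m.group("delay")), m.group("url")


# The quotes end up inside the URL part, so whether the example is
# conforming depends entirely on whether 'page4.html' (quotes included)
# counts as a valid URL.
print(match_refresh_content("20; URL='page4.html'"))
```

Nothing in the quoted rule singles out quote characters, which is why the question reduces to URL validity.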

Comment 1 Michael[tm] Smith 2010-12-23 10:05:48 CET

This seems to be a side effect of the fact that the code for checking IRI references accepts quoted IRI references. For example, it thinks <img src="'foo'" alt="bar"> is valid.

I'll poke around in the code and find out why.

Comment 2 Michael[tm] Smith 2010-12-23 10:38:34 CET

So I checked the IRI-reference-checking code and I find that it's not doing anything special with the quote characters. It handles them just the same as any other character that's not A-Z or a-z. It doesn't recognize them as special delimiters of any kind. For example, it also says that
<meta http-equiv="Refresh" content="20; URL='page4.html)"> is valid.

Should it be handling the quote characters differently for this case? If so, are there other characters that it should be handling specially?

From reading RFC 3987, it's not clear to me that the checker code is doing anything wrong here.

Note that it does report an "Illegal character in scheme component " error for
<meta http-equiv="Refresh" content="20; URL='http://example.com/page4.html'">
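
One plausible reading of that error, sketched below, is that anything before the first colon is treated as a scheme, and RFC 3986 only allows a letter followed by letters, digits, "+", "-" or "." there. The function scheme_problem is hypothetical and is not the Jena IRI API; it only illustrates why a leading apostrophe is harmless in a relative reference but not in front of "http:".

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def scheme_problem(reference):
    """Return an error string if the text before the first ':' is a bad scheme."""
    head, colon, _ = reference.partition(":")
    # No colon, or a '/', '?' or '#' before it: treat as a relative reference.
    if not colon or any(c in head for c in "/?#"):
        return None
    if SCHEME.fullmatch(head):
        return None
    return "illegal character in scheme: %r" % head


print(scheme_problem("'http://example.com/page4.html'"))  # reports an error
print(scheme_problem("'page4.html'"))                     # None (relative)
```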

Comment 3 Michael[tm] Smith 2010-12-23 10:58:22 CET

Not to beat this into the ground, but note that the following also validates:

<meta http-equiv="Refresh" content="20; URL=http://example.com/page4.html'">


 ...because we are just passing that along to the Jena IRI library as-is, and it's not reporting any error due to that single quote character at the end of the path part of the URL.

So as far as I can see, the Jena IRI code doesn't consider quote characters in path names to be invalid. I don't know if it should or not, but regardless, if there's an error here, it seems clear to me it's not in the validator.nu IRI-checking code.

So I'm moving this to Resolved:Intentional. But if you think there's something more/different we should be doing in the validator code, feel free to reopen it.

Comment 4 Julian Reschke 2010-12-23 11:08:32 CET

It appears this is a bug in the spec.

The parsing rules treat " and ' as special, whereas the conformance rules do not mention them. So a value of

URL="a"b

is parsed as

"a"b

which is a valid URI reference. However, the spec requires the recipient to treat this as

a

(stripping the leading ", and truncating at the 2nd).
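
The recipient-side behaviour described here can be sketched as follows. It is a simplified illustration of the quote handling only (the function name extract_url is hypothetical, and the other steps of the spec's parsing algorithm, such as whitespace trimming, are left out).

```python
def extract_url(raw):
    """Apply the quote handling to the text that follows 'URL='."""
    # A leading ' or " is remembered as the quote and skipped.
    quote = raw[0] if raw[:1] in ("'", '"') else ""
    rest = raw[1:] if quote else raw
    if quote:
        # Truncate at the next occurrence of the same quote, if there is one.
        end = rest.find(quote)
        if end != -1:
            rest = rest[:end]
    return rest


print(extract_url('"a"b'))          # a
print(extract_url("'page4.html'"))  # page4.html
```

So a checker that validates the raw string and a browser that follows the parsing rules can disagree about which URL the author wrote.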

Comment 5 Michael[tm] Smith 2010-12-24 03:01:56 CET

http://www.w3.org/Bugs/Public/show_bug.cgi?id=11597


This does seem to be something that should be changed in the spec, e.g., by introducing the notion of a "valid optionally quoted URL". Though I don't think we'd need to add an additional datatype to the datatype library, since this is the only case where it would be used -- the additional checking would just be added to the existing state machine in the Refresh.java code (all assuming that the spec does in fact end up getting changed to address this).

Comment 6 Michael[tm] Smith 2011-02-08 10:58:42 CET

Reopening this, as Hixie made a change to the spec:

http://html5.org/tools/web-apps-tracker?from=5839&to=5840

Index: source
===================================================================
--- source	(revision 5839)
+++ source	(revision 5840)
@@ -14897,8 +14897,9 @@
      title="space character">space characters</span>, followed by a
      substring that is an <span>ASCII case-insensitive</span> match
      for the string "<code title="">URL</code>", followed by a U+003D
-     EQUALS SIGN character (=), followed by a <span>valid
-     URL</span>.</li>
+     EQUALS SIGN character (=), followed by a <span>valid URL</span>
+     that does not start with a literal U+0027 APOSTROPHE (') or
+     U+0022 QUOTATION MARK (") character.</li>
 
     </ul>
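
The added restriction amounts to one extra check on the first character after "URL=". A minimal sketch, assuming a hypothetical helper starts_with_quote (this is not the actual Refresh.java change, and URL validity itself is still not checked):

```python
import re

# Delay, semicolon, one or more space characters, case-insensitive "URL=",
# then the rest of the value.
URL_PART = re.compile(r"[0-9]+;[ \t\n\f\r]+[Uu][Rr][Ll]=(.*)", re.DOTALL)


def starts_with_quote(content):
    """True if the URL part begins with a literal ' or " character."""
    m = URL_PART.fullmatch(content)
    return bool(m) and m.group(1)[:1] in ("'", '"')


print(starts_with_quote("20; URL='page4.html'"))  # True: now non-conforming
print(starts_with_quote("20; URL=page4.html'"))   # False: only a leading quote is ruled out
```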

Comment 7 Michael[tm] Smith 2011-02-09 22:45:04 CET

https://bitbucket.org/validator/syntax-patches/qseries?apply=t&qs_apply=meta-refresh

Comment 8 Michael[tm] Smith 2011-04-19 11:15:02 CEST

https://bitbucket.org/validator/syntax/changeset/778e579daa80

Julian, thanks for having raised this. It's deployed on http://www.w3.org/html/check
If you test there and find any problems, please let me know.