NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 14 - Validate xml-stylesheet PIs
Validate xml-stylesheet PIs
Status: RESOLVED FIXED
Product: Validator.nu
Classification: Unclassified
Component: Non-schema checkers
HEAD
All All
: P2 enhancement
Assigned To: Michael[tm] Smith
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-02-22 15:12 CET by Henri Sivonen
Modified: 2010-05-25 10:32 CEST (History)
2 users (show)

See Also:


Attachments
patch (76.86 KB, patch)
2009-12-26 13:45 CET, Michael[tm] Smith
Details
patch (105.50 KB, patch)
2009-12-28 15:00 CET, Michael[tm] Smith
Details
patch (73.42 KB, patch)
2009-12-30 02:38 CET, Michael[tm] Smith
Details
validator patch (4.89 KB, patch)
2010-05-25 10:20 CEST, Michael[tm] Smith
Details
syntax patch (79.07 KB, patch)
2010-05-25 10:21 CEST, Michael[tm] Smith
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Henri Sivonen 2008-02-22 15:12:42 CET
[21:56] <zcorpan> hsivonen: consider it a feature request to validate xml-stylesheet PIs :) (i'll spec down conf requirements for it in my draft, though it already defines what is a parse error...)
Comment 1 Simon Pieters 2009-12-05 23:37:41 CET
Spec with (style sheet language generic) document conformance reqs:

http://www.w3.org/XML/2009/12/xml-stylesheet/

Also relevant:
http://dev.w3.org/csswg/cssom/#requirements-on-user-agents-implementing
Comment 2 Simon Pieters 2009-12-06 00:46:12 CET
Examples of invalid documents:

<?xml-stylesheet href="<"?><x/>

<!DOCTYPE x[<!ENTITY x "y">]><?xml-stylesheet href="&x;"?><x/>

<?xml-stylesheet href="&#0;"?><x/>

<?xml-stylesheet href="x" href="y"?><x/>

<x><?xml-stylesheet href="x"?></x>

<x/><?xml-stylesheet href="x"?>

<?xml-stylesheet?><x/>

<?xml-stylesheet href="invalid url"?><x/>

<?xml-stylesheet href="x" type="y"?><x/>

<?xml-stylesheet href="x" media="y"?><x/>

<?xml-stylesheet href="x" charset="ascii"?><x/>

<?xml-stylesheet href="x" alternate="yes"?><x/>

<?xml-stylesheet href="x" alternate="yes" title=""?><x/>

<?xml-stylesheet href="x" alternate="YES" title="x"?><x/>

<?xml-stylesheet href="x" y=""?><x/>


Examples of valid documents:

<?xml-stylesheet href="common.css"?>
<?xml-stylesheet href="default.css" title="Default style"?>
<?xml-stylesheet alternate="yes" href="alt.css" title="Alternative style"?>
<?xml-stylesheet href="single-col.css" media="all and (max-width: 30em)"?>
<x/>

<?xml-stylesheet href="x" alternate="&#121;es" title="x"?><x/>

<?xml-stylesheet href="x" type="TEXT/CSS"?><x/>


Things to consider warning about:

<!DOCTYPE x[<?xml-stylesheet href="x"?>]><x/>

likely-typoed or known problematic type values?

charset - is ignored in some browsers iirc

title, media, charset and alternate when type indicates XSLT (I think text/xsl, text/xml or application/xml), since browsers ignore those for XSLT

technically text/xsl is not registered, but the only supported value for type in IE, so probably not helpful to whine about it
Comment 3 Simon Pieters 2009-12-06 01:05:09 CET
Another thing to consider warning about is multiple PIs for XSLT, since browsers use just one PI for XSLT.
Comment 4 Simon Pieters 2009-12-07 00:50:19 CET
Some old browser tests at http://simon.html5.org/test/xml/xml-stylesheet/ might be useful also.
Comment 5 Simon Pieters 2009-12-15 23:15:22 CET
valid:

<?xml-stylesheet href="" title=">"?><x/>

invalid:

<?xml-stylesheet href="">?><x/>

<?xml-stylesheet href=""/><xml-stylesheet?><x/>
Comment 6 Michael[tm] Smith 2009-12-16 09:46:25 CET
(In reply to comment #3)
> Another thing to consider warning about is multiple PIs for XSLT, since
> browsers use just one PI for XSLT.

OK, that does seem like a case that content authors should be made aware of. But is that fact (the fact that browsers use just one PI for XSLT) actually documented anywhere at all? Because if we have v.nu emit a warning about this case, it would be best to have some published text to refer to.

I'm wondering if there's any possibility that the XML Core WG could consider adding a SHOULD-level admonition about this to the "Associating Style Sheets with XML documents" draft.
Comment 7 Michael[tm] Smith 2009-12-16 09:54:42 CET
(In reply to comment #2)
> charset - is ignored in some browsers iirc
> 
> title, media, charset and alternate when type indicates XSLT (I think text/xsl,
> text/xml or application/xml), since browsers ignore those for XSLT

Is there documentation of any kind anywhere which notes that UAs ignore those pseudo-attributes for those cases?

If not, maybe at least we can something up on the WHATWG Wiki.

And is there any documentation anywhere that states that type=text/xml and type=application/xml indicate XSLT?
Comment 8 Michael[tm] Smith 2009-12-16 10:07:52 CET
Do browsers correctly recognize type=application/xslt+xml?

If so, it'd seem the complete list of xml-stylesheet type values that indicate XSLT are:

application/xml
application/xslt+xml
text/xml
text/xsl

Would we want to warn about any of those as being not preferred; that is, with a message like, "The type value foo/bar for indicating an XSLT stylesheet is not preferred. Consider using hoge/moge instead."
Comment 9 Michael[tm] Smith 2009-12-16 10:21:12 CET
(In reply to comment #8)
> application/xml
> application/xslt+xml
> text/xml
> text/xsl

I see from reading Anne's blog that the list needs to include "text/xslt" as well.

And I do realize that the spec says the type attribute is merely advisory, so maybe it's not worth reporting and preferred value to users for the XSLT case.
Comment 10 Simon Pieters 2009-12-16 13:47:36 CET
The XML Core WG didn't want to have anything in the xml-stylesheet spec that only applies to a specific style sheet language. It would be something for the XSLT WG.

Opera, WebKit and Mozilla don't support application/xslt+xml or text/xslt. Haven't tested IE.

I'm not aware of any documentation. :-(

I think the only cross-browser safe values are "text/css" and "text/xsl" (ascii-case-insensitive), so you could warn for anything else, saying that some browsers will ignore the pi.
Comment 11 Simon Pieters 2009-12-18 10:08:35 CET
Invalid:

<?xml-stylesheet href='' xmlns=''?><x/>

<?xml-stylesheet href='' xmlns:xml='http://www.w3.org/XML/1998/namespace'?><x/>
Comment 12 Michael[tm] Smith 2009-12-24 05:28:20 CET
(In reply to comment #2)
> charset - is ignored in some browsers iirc

If you can find any data on that, please add a follow-up comment.

Or, maybe we should just go ahead and set up a page on the whatwg wiki. Maybe http://wiki.whatwg.org/wiki/XmlStylesheetPi ..? Then I could have some of these warning messages include a link to that page.

> title, media, charset and alternate when type indicates XSLT (I think text/xsl,
> text/xml or application/xml), since browsers ignore those for XSLT

Again, if you have any data on that, it'd be nice to have it available online.
Comment 13 Michael[tm] Smith 2009-12-24 09:54:34 CET
(In reply to comment #2)
> <?xml-stylesheet href="x" charset="ascii"?><x/>
This case is not going to get correctly reported as invalid until we make an update to the charset checker. See bug 693.
Comment 14 Michael[tm] Smith 2009-12-24 11:12:00 CET
Support for this is now live on http://qa-dev.w3.org:8888/
Please test there and let me know if you find any problems.

After making any necessary changes, I can then get the patch to Henri for review.

As far as I can tell, it currently behaves as expected for all the test cases provided so far -- except the <?xml-stylesheet href="x" charset="ascii"?> case (for reasons I noted in comment #13),
Comment 15 Simon Pieters 2009-12-25 17:52:35 CET
Invalid:

<?xml-stylesheet href="" x?><x/>

<?xml-stylesheet href="" x ?><x/>

<?xml-stylesheet href="" x=?><x/>

<?xml-stylesheet href="" x= ?><x/>

<?xml-stylesheet href="" x="?><x/>

<?xml-stylesheet href="" x="y?><x/>
Comment 16 Simon Pieters 2009-12-25 23:57:33 CET
I've created the wiki page... but it doesn't cover much yet.
Comment 17 Michael[tm] Smith 2009-12-26 13:00:06 CET
(In reply to comment #15)
> Invalid:
> <?xml-stylesheet href="" x?><x/>
> <?xml-stylesheet href="" x ?><x/>
> <?xml-stylesheet href="" x=?><x/>
> <?xml-stylesheet href="" x= ?><x/>
> <?xml-stylesheet href="" x="?><x/>
> <?xml-stylesheet href="" x="y?><x/>

Thanks -- I made some some changes and I think you should now get expected behavior for all of those. Please test on http://qa-dev.w3.org:8888/ and let me know if you see any remaining problems (or regressions...)
Comment 18 Michael[tm] Smith 2009-12-26 13:45:56 CET
Created attachment 135 [details]
patch
Comment 19 Michael[tm] Smith 2009-12-27 08:50:41 CET
about http://krijnhoetmer.nl/irc-logs/whatwg/20091226#l-120
<zcorpan> <?xml-stylesheet href="" title="&"?> should be invalid

I made a change for that and it should not be getting reported as invalid. You can test on http://qa-dev.w3.org:8889/
Comment 20 Michael[tm] Smith 2009-12-27 08:51:03 CET
about http://krijnhoetmer.nl/irc-logs/whatwg/20091226#l-120
<zcorpan> <?xml-stylesheet href="" title="&"?> should be invalid

I made a change for that and it should now be getting reported as invalid. You can test on http://qa-dev.w3.org:8889/
Comment 21 Michael[tm] Smith 2009-12-27 08:53:15 CET
(In reply to comment #19)
> I made a change for that and it should not be getting reported as invalid. You

meant to type "now" there, as in subsequent comment. So please ignore this one..
Comment 22 Michael[tm] Smith 2009-12-27 08:54:23 CET
(In reply to comment #20)
 can test on http://qa-dev.w3.org:8889/

http://qa-dev.w3.org:8888/ instead of course
Comment 23 Michael[tm] Smith 2009-12-27 10:59:48 CET
about http://krijnhoetmer.nl/irc-logs/whatwg/20091226#l-134
<zcorpan> MikeSmith: type="TEXT/XML" ... should be case-insensitive

Fixed now

<zcorpan> my point is ideally it should be case-insensitive and support parameters  
<zcorpan> so <?xml-stylesheet href="" type="TEXT/XML"?><?xml-stylesheet href="" type="application/xml; charset=utf-8"?> should give a message about multiple xslt pi

OK, that should work as expected now

<zcorpan> type="text/xslLOL"

That will (correctly) not be reported as an XSLT indicator. (The code now looks for matches for "^text/xsl(;.*)?$", "^application/xml(;.*)$", etc. -- after first checking to make sure the entire value is RFC-compliant.

(In reply to comment #10)
> I think the only cross-browser safe values are "text/css" and "text/xsl"
> (ascii-case-insensitive), so you could warn for anything else, saying that some
> browsers will ignore the pi.

OK, added that also (it checks the type/subtype part on any RFC-compliant values, including ones with parameters).
Comment 24 Simon Pieters 2009-12-28 00:10:20 CET
Invalid:

<?xml-stylesheet href="" title="&#X20;"?><x/>

(XML CharRef uses lowercase x.)
Comment 25 Michael[tm] Smith 2009-12-28 08:48:11 CET
(In reply to comment #13)
> (In reply to comment #2)
> > <?xml-stylesheet href="x" charset="ascii"?><x/>
> This case is not going to get correctly reported as invalid until we make an
> update to the charset checker. See bug 693.

I believe I now have charset checking working.
Comment 26 Michael[tm] Smith 2009-12-28 08:49:10 CET
http://krijnhoetmer.nl/irc-logs/whatwg/20091226#l-110
<zcorpan> <?xml-stylesheet href="" title="&#x0D;"?> should be valid 

I believe I have that fixed now.
Comment 27 Michael[tm] Smith 2009-12-28 08:52:42 CET
(In reply to comment #24)
> Invalid:
> 
> <?xml-stylesheet href="" title="&#X20;"?><x/>
> 
> (XML CharRef uses lowercase x.)

Fixed now, I think
Comment 28 Michael[tm] Smith 2009-12-28 15:00:44 CET
Created attachment 136 [details]
patch
Comment 29 Michael[tm] Smith 2009-12-30 02:38:23 CET
Created attachment 140 [details]
patch
Comment 30 Michael[tm] Smith 2009-12-30 02:46:49 CET
http://krijnhoetmer.nl/irc-logs/whatwg/20091229#l-621
<zcorpan> MikeSmith: maybe checking for valuelessness and lessthanness should be in the tokenizer instead of the switch block  
<zcorpan> a < will cause browsers to ignore the whole pi  
<zcorpan> same with valueless  

OK, I've moved checking for both of those into the tokenizer code (into the addAttributeWithValue and addAttributeWithoutValue methods)
Comment 31 Michael[tm] Smith 2009-12-30 02:51:51 CET
http://krijnhoetmer.nl/irc-logs/whatwg/20091229#l-658
<zcorpan> MikeSmith: i think there are some things that could be removed in the tokenizer, given that some characters are not allowed in xml  
<zcorpan> like case '\u000C': and case '\u0000':

OK, removed now
Comment 32 Michael[tm] Smith 2009-12-30 02:55:20 CET
http://krijnhoetmer.nl/irc-logs/whatwg/20091229#l-665
<zcorpan> '\r' also can't appear in pi data  
<zcorpan> remove, since it can't happen  
<zcorpan> \r is normalized to \n  
<zcorpan> by the xml parser  

OK, removed now
Comment 33 Michael[tm] Smith 2009-12-30 02:57:54 CET
<zcorpan> MikeSmith: maybe you can remove the stuff in the tokenizer that checks for semicolonless entities  
<zcorpan> the table doesn't contain semicolonless entities  
<zcorpan> so it's dead code i think  

OK, removed now
Comment 34 Michael[tm] Smith 2009-12-30 02:58:57 CET
<zcorpan> did you implement the CharRef Legal Character thing?  
<MikeSmith> zcorpan: you mean an explicit check for "Char    ::=    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]" ? 

added now
Comment 35 Michael[tm] Smith 2009-12-30 10:51:20 CET
Question: About the following chunk of code that I copied over from the existing htmlparser handleNcrValue method:

[[
if (value >= 0xFDD0 && value <= 0xFDEF) {
    errNcrUnassigned();
} else if ((value & 0xFFFE) == 0xFFFE) {
    ch = errNcrNonCharacter(ch);
} else if (value >= 0x007F && value <= 0x009F) {
    errNcrControlChar();
]]                 

The range "value >= 0xFDD0 && value <= 0xFDEF" is legal in XML, right? If so, I'll need to change that error to a warning.

Same question for the range "value >= 0x007F && value <= 0x009F".. also legal in XML, right?
Comment 36 Michael[tm] Smith 2009-12-30 13:45:09 CET
(In reply to comment #2)
> charset - is ignored in some browsers iirc
> 
> title, media, charset and alternate when type indicates XSLT (I think text/xsl,
> text/xml or application/xml), since browsers ignore those for XSLT

OK, I've now added warnings for both of those cases.
Comment 37 Michael[tm] Smith 2009-12-30 13:46:16 CET
(In reply to comment #35)
> The range "value >= 0xFDD0 && value <= 0xFDEF" is legal in XML, right? If so,
> I'll need to change that error to a warning.

I kept that one as an error since those code points are clearly not characters.

> Same question for the range "value >= 0x007F && value <= 0x009F".. also legal
> in XML, right?

I made that a warning.
Comment 38 Simon Pieters 2010-01-02 11:14:48 CET
Invalid:

<?xml-stylesheet href="" title="& "?><x/>
<?xml-stylesheet href="" title="&\n"?><x/>
<?xml-stylesheet href="" title="&\t"?><x/>
<?xml-stylesheet href="" title="&<"?><x/>
<?xml-stylesheet href="" title="&&amp;"?><x/>

(i.e. remove the first bunch of cases in CONSUME_CHARACTER_REFERENCE)
Comment 39 Simon Pieters 2010-01-02 11:15:36 CET
"^text/xsl(;.*)?$" could be "^text/xsl(;|$)" instead, i think
Comment 40 Simon Pieters 2010-01-02 11:21:13 CET
(for the record)

Invalid:

<?xml-stylesheet HREF=""?><x/>
Comment 41 Michael[tm] Smith 2010-01-03 05:41:11 CET
(In reply to comment #38)
> Invalid:
> 
> <?xml-stylesheet href="" title="& "?><x/>
> <?xml-stylesheet href="" title="&\n"?><x/>
> <?xml-stylesheet href="" title="&\t"?><x/>
> <?xml-stylesheet href="" title="&<"?><x/>
> <?xml-stylesheet href="" title="&&amp;"?><x/>
> 
> (i.e. remove the first bunch of cases in CONSUME_CHARACTER_REFERENCE)

OK, thanks, done
Comment 42 Michael[tm] Smith 2010-01-03 06:32:09 CET
(In reply to comment #39)
> "^text/xsl(;.*)?$" could be "^text/xsl(;|$)" instead, i think

I tested that but found that it fails to match against, e.g, <?xml-stylesheet href="" type="text/xslt; charset=utf-8"?><x/>

Reading up a little on the Java String.matches method, I am reminded that it always does an exact match against the entire string, from start to end, and the ^ and $ operators don't actually do anything with String.matches. So I changed those to just, e.g., "text/xsl(;.*)?" instead.
Comment 43 Michael[tm] Smith 2010-05-25 09:56:16 CEST
syntax r559
Comment 44 Michael[tm] Smith 2010-05-25 10:18:03 CEST
validator 359
Comment 45 Michael[tm] Smith 2010-05-25 10:20:18 CEST
Created attachment 165 [details]
validator patch
Comment 46 Michael[tm] Smith 2010-05-25 10:21:23 CEST
Created attachment 166 [details]
syntax patch