Bugzilla – Bug 14
Validate xml-stylesheet PIs
Last modified: 2010-05-25 10:32:25 CEST
[21:56] <zcorpan> hsivonen: consider it a feature request to validate xml-stylesheet PIs :) (i'll spec down conf requirements for it in my draft, though it already defines what is a parse error...)
Spec with (style sheet language generic) document conformance reqs: http://www.w3.org/XML/2009/12/xml-stylesheet/ Also relevant: http://dev.w3.org/csswg/cssom/#requirements-on-user-agents-implementing
Examples of invalid documents:

<?xml-stylesheet href="<"?><x/>
<!DOCTYPE x[<!ENTITY x "y">]><?xml-stylesheet href="&x;"?><x/>
<?xml-stylesheet href="�"?><x/>
<?xml-stylesheet href="x" href="y"?><x/>
<x><?xml-stylesheet href="x"?></x>
<x/><?xml-stylesheet href="x"?>
<?xml-stylesheet?><x/>
<?xml-stylesheet href="invalid url"?><x/>
<?xml-stylesheet href="x" type="y"?><x/>
<?xml-stylesheet href="x" media="y"?><x/>
<?xml-stylesheet href="x" charset="ascii"?><x/>
<?xml-stylesheet href="x" alternate="yes"?><x/>
<?xml-stylesheet href="x" alternate="yes" title=""?><x/>
<?xml-stylesheet href="x" alternate="YES" title="x"?><x/>
<?xml-stylesheet href="x" y=""?><x/>

Examples of valid documents:

<?xml-stylesheet href="common.css"?>
<?xml-stylesheet href="default.css" title="Default style"?>
<?xml-stylesheet alternate="yes" href="alt.css" title="Alternative style"?>
<?xml-stylesheet href="single-col.css" media="all and (max-width: 30em)"?>
<x/>

<?xml-stylesheet href="x" alternate="yes" title="x"?><x/>
<?xml-stylesheet href="x" type="TEXT/CSS"?><x/>

Things to consider warning about:

<!DOCTYPE x[<?xml-stylesheet href="x"?>]><x/>

likely-typoed or known problematic type values?

charset - is ignored in some browsers iirc

title, media, charset and alternate when type indicates XSLT (I think text/xsl, text/xml or application/xml), since browsers ignore those for XSLT

technically text/xsl is not registered, but the only supported value for type in IE, so probably not helpful to whine about it
Another thing to consider warning about is multiple PIs for XSLT, since browsers use just one PI for XSLT.
Some old browser tests at http://simon.html5.org/test/xml/xml-stylesheet/ might be useful also.
valid:
<?xml-stylesheet href="" title=">"?><x/>

invalid:
<?xml-stylesheet href="">?><x/>
<?xml-stylesheet href=""/><xml-stylesheet?><x/>
(In reply to comment #3)
> Another thing to consider warning about is multiple PIs for XSLT, since
> browsers use just one PI for XSLT.

OK, that does seem like a case that content authors should be made aware of. But is that fact (the fact that browsers use just one PI for XSLT) actually documented anywhere at all? Because if we have v.nu emit a warning about this case, it would be best to have some published text to refer to.

I'm wondering if there's any possibility that the XML Core WG could consider adding a SHOULD-level admonition about this to the "Associating Style Sheets with XML documents" draft.
(In reply to comment #2)
> charset - is ignored in some browsers iirc
>
> title, media, charset and alternate when type indicates XSLT (I think text/xsl,
> text/xml or application/xml), since browsers ignore those for XSLT

Is there documentation of any kind anywhere which notes that UAs ignore those pseudo-attributes for those cases? If not, maybe at least we can put something up on the WHATWG Wiki.

And is there any documentation anywhere that states that type=text/xml and type=application/xml indicate XSLT?
Do browsers correctly recognize type=application/xslt+xml? If so, it'd seem the complete list of xml-stylesheet type values that indicate XSLT is:

application/xml
application/xslt+xml
text/xml
text/xsl

Would we want to warn about any of those as being not preferred; that is, with a message like, "The type value foo/bar for indicating an XSLT stylesheet is not preferred. Consider using hoge/moge instead."?
(In reply to comment #8)
> application/xml
> application/xslt+xml
> text/xml
> text/xsl

I see from reading Anne's blog that the list needs to include "text/xslt" as well. And I do realize that the spec says the type attribute is merely advisory, so maybe it's not worth reporting a preferred value to users for the XSLT case.
The XML Core WG didn't want to have anything in the xml-stylesheet spec that only applies to a specific style sheet language. It would be something for the XSLT WG. Opera, WebKit and Mozilla don't support application/xslt+xml or text/xslt. Haven't tested IE. I'm not aware of any documentation. :-( I think the only cross-browser safe values are "text/css" and "text/xsl" (ascii-case-insensitive), so you could warn for anything else, saying that some browsers will ignore the pi.
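A rough sketch of the "warn for anything else" check suggested above. The class and method names are hypothetical and not taken from the validator source, and stripping parameters before comparing is an assumption made here for illustration:

```java
import java.util.Locale;

final class TypeSafetyCheck {
    // Hypothetical helper: true when the type/subtype part is one of the two
    // values described above as cross-browser safe ("text/css", "text/xsl"),
    // compared ASCII-case-insensitively.
    static boolean isCrossBrowserSafe(String type) {
        int semicolon = type.indexOf(';');
        // Assumption: any parameters after ';' are ignored for this check.
        String essence = (semicolon < 0 ? type : type.substring(0, semicolon))
                .trim()
                .toLowerCase(Locale.ROOT);
        return essence.equals("text/css") || essence.equals("text/xsl");
    }
}
```

A caller could emit a warning (for example, that some browsers will ignore the PI) whenever this returns false.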
Invalid: <?xml-stylesheet href='' xmlns=''?><x/> <?xml-stylesheet href='' xmlns:xml='http://www.w3.org/XML/1998/namespace'?><x/>
(In reply to comment #2)
> charset - is ignored in some browsers iirc

If you can find any data on that, please add a follow-up comment. Or, maybe we should just go ahead and set up a page on the whatwg wiki. Maybe http://wiki.whatwg.org/wiki/XmlStylesheetPi ..? Then I could have some of these warning messages include a link to that page.

> title, media, charset and alternate when type indicates XSLT (I think text/xsl,
> text/xml or application/xml), since browsers ignore those for XSLT

Again, if you have any data on that, it'd be nice to have it available online.
(In reply to comment #2)
> <?xml-stylesheet href="x" charset="ascii"?><x/>

This case is not going to get correctly reported as invalid until we make an update to the charset checker. See bug 693.
Support for this is now live on http://qa-dev.w3.org:8888/

Please test there and let me know if you find any problems. After making any necessary changes, I can then get the patch to Henri for review.

As far as I can tell, it currently behaves as expected for all the test cases provided so far, except the <?xml-stylesheet href="x" charset="ascii"?> case (for reasons I noted in comment #13).
Invalid:

<?xml-stylesheet href="" x?><x/>
<?xml-stylesheet href="" x ?><x/>
<?xml-stylesheet href="" x=?><x/>
<?xml-stylesheet href="" x= ?><x/>
<?xml-stylesheet href="" x="?><x/>
<?xml-stylesheet href="" x="y?><x/>
I've created the wiki page... but it doesn't cover much yet.
(In reply to comment #15)
> Invalid:
> <?xml-stylesheet href="" x?><x/>
> <?xml-stylesheet href="" x ?><x/>
> <?xml-stylesheet href="" x=?><x/>
> <?xml-stylesheet href="" x= ?><x/>
> <?xml-stylesheet href="" x="?><x/>
> <?xml-stylesheet href="" x="y?><x/>

Thanks -- I made some changes and I think you should now get expected behavior for all of those. Please test on http://qa-dev.w3.org:8888/ and let me know if you see any remaining problems (or regressions...)
Created attachment 135 [details] patch
about http://krijnhoetmer.nl/irc-logs/whatwg/20091226#l-120 <zcorpan> <?xml-stylesheet href="" title="&"?> should be invalid I made a change for that and it should not be getting reported as invalid. You can test on http://qa-dev.w3.org:8889/
about http://krijnhoetmer.nl/irc-logs/whatwg/20091226#l-120 <zcorpan> <?xml-stylesheet href="" title="&"?> should be invalid I made a change for that and it should now be getting reported as invalid. You can test on http://qa-dev.w3.org:8889/
(In reply to comment #19) > I made a change for that and it should not be getting reported as invalid. You meant to type "now" there, as in subsequent comment. So please ignore this one..
(In reply to comment #20)
> can test on http://qa-dev.w3.org:8889/

http://qa-dev.w3.org:8888/ instead, of course
about http://krijnhoetmer.nl/irc-logs/whatwg/20091226#l-134

<zcorpan> MikeSmith: type="TEXT/XML" ... should be case-insensitive

Fixed now

<zcorpan> my point is ideally it should be case-insensitive and support parameters
<zcorpan> so <?xml-stylesheet href="" type="TEXT/XML"?><?xml-stylesheet href="" type="application/xml; charset=utf-8"?> should give a message about multiple xslt pi

OK, that should work as expected now

<zcorpan> type="text/xslLOL"

That will (correctly) not be reported as an XSLT indicator. (The code now looks for matches for "^text/xsl(;.*)?$", "^application/xml(;.*)?$", etc., after first checking to make sure the entire value is RFC-compliant.)

(In reply to comment #10)
> I think the only cross-browser safe values are "text/css" and "text/xsl"
> (ascii-case-insensitive), so you could warn for anything else, saying that some
> browsers will ignore the pi.

OK, added that also (it checks the type/subtype part on any RFC-compliant values, including ones with parameters).
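For illustration, the case-insensitive, parameter-tolerant matching and the "multiple XSLT PIs" check described above could look roughly like this. It is only a sketch: the class and method names are made up, the list of XSLT-indicating types is taken from earlier comments rather than from the patch, and the RFC-compliance check that the real code performs first is omitted.

```java
import java.util.List;
import java.util.regex.Pattern;

final class XsltTypeMatcher {
    // Assumed set of XSLT-indicating types (text/xsl, text/xml,
    // application/xml), each optionally followed by parameters.
    // CASE_INSENSITIVE is ASCII-only by default in java.util.regex.
    private static final Pattern XSLT_TYPE = Pattern.compile(
            "(text/xsl|text/xml|application/xml)(;.*)?",
            Pattern.CASE_INSENSITIVE);

    // True when the whole type value matches one of the assumed XSLT types.
    static boolean indicatesXslt(String type) {
        return XSLT_TYPE.matcher(type.trim()).matches();
    }

    // True when more than one of the given type values indicates XSLT,
    // i.e. the document would carry multiple XSLT PIs.
    static boolean hasMultipleXsltPis(List<String> types) {
        int count = 0;
        for (String type : types) {
            if (indicatesXslt(type)) {
                count++;
            }
        }
        return count > 1;
    }
}
```

With this sketch, "TEXT/XML" and "application/xml; charset=utf-8" both count as XSLT indicators, while "text/xslLOL" does not.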
Invalid: <?xml-stylesheet href="" title=" "?><x/> (XML CharRef uses lowercase x.)
(In reply to comment #13)
> (In reply to comment #2)
> > <?xml-stylesheet href="x" charset="ascii"?><x/>
> This case is not going to get correctly reported as invalid until we make an
> update to the charset checker. See bug 693.

I believe I now have charset checking working.
http://krijnhoetmer.nl/irc-logs/whatwg/20091226#l-110 <zcorpan> <?xml-stylesheet href="" title="
"?> should be valid I believe I have that fixed now.
(In reply to comment #24) > Invalid: > > <?xml-stylesheet href="" title=" "?><x/> > > (XML CharRef uses lowercase x.) Fixed now, I think
Created attachment 136 [details] patch
Created attachment 140 [details] patch
http://krijnhoetmer.nl/irc-logs/whatwg/20091229#l-621

<zcorpan> MikeSmith: maybe checking for valuelessness and lessthanness should be in the tokenizer instead of the switch block
<zcorpan> a < will cause browsers to ignore the whole pi
<zcorpan> same with valueless

OK, I've moved checking for both of those into the tokenizer code (into the addAttributeWithValue and addAttributeWithoutValue methods)
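To make the malformed cases from this thread concrete (valueless pseudo-attributes, unterminated values, a "<" inside a value, duplicates), here is a much-simplified sketch. It is not the tokenizer from the patch: it uses a regular expression instead of a state machine, the class and method names are hypothetical, and it does not handle character or entity references or check which pseudo-attribute names are allowed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PseudoAttrSketch {
    // One pseudo-attribute: optional XML whitespace, a name, '=', and a
    // single- or double-quoted value that contains no '<'.
    private static final Pattern ATTR = Pattern.compile(
            "[ \\t\\r\\n]*([A-Za-z_:][-A-Za-z0-9_:.]*)[ \\t\\r\\n]*=[ \\t\\r\\n]*"
            + "(\"([^\"<]*)\"|'([^'<]*)')");

    /** Returns the pseudo-attributes, or null if the PI data is malformed. */
    static Map<String, String> parse(String data) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = ATTR.matcher(data);
        int pos = 0;
        while (true) {
            // Pseudo-attributes after the first must be separated by whitespace.
            if (pos > 0 && pos < data.length()
                    && " \t\r\n".indexOf(data.charAt(pos)) < 0) {
                return null;
            }
            m.region(pos, data.length());
            if (!m.lookingAt()) {
                break;
            }
            String name = m.group(1);
            String value = m.group(3) != null ? m.group(3) : m.group(4);
            if (attrs.containsKey(name)) {
                return null; // duplicate pseudo-attribute
            }
            attrs.put(name, value);
            pos = m.end();
        }
        // Anything left other than whitespace (valueless name, unterminated
        // value, '<' in a value, ...) makes the whole PI data malformed.
        return data.substring(pos).matches("[ \\t\\r\\n]*") ? attrs : null;
    }
}
```

Here `parse` takes the PI data only, that is, the text between the `xml-stylesheet` target and the closing `?>`.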
http://krijnhoetmer.nl/irc-logs/whatwg/20091229#l-658

<zcorpan> MikeSmith: i think there are some things that could be removed in the tokenizer, given that some characters are not allowed in xml
<zcorpan> like case '\u000C': and case '\u0000':

OK, removed now
http://krijnhoetmer.nl/irc-logs/whatwg/20091229#l-665

<zcorpan> '\r' also can't appear in pi data
<zcorpan> remove, since it can't happen
<zcorpan> \r is normalized to \n
<zcorpan> by the xml parser

OK, removed now
<zcorpan> MikeSmith: maybe you can remove the stuff in the tokenizer that checks for semicolonless entities
<zcorpan> the table doesn't contain semicolonless entities
<zcorpan> so it's dead code i think

OK, removed now
<zcorpan> did you implement the CharRef Legal Character thing?
<MikeSmith> zcorpan: you mean an explicit check for "Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]" ?

added now
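For reference, a direct transcription of that Char production into Java might look like the following. The class and method names are made up here, and this is not necessarily how the patch implements the check:

```java
final class XmlChars {
    // Mirrors the XML 1.0 production quoted above:
    // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int codePoint) {
        return codePoint == 0x9
                || codePoint == 0xA
                || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }
}
```

A character reference whose numeric value fails this test does not refer to a legal XML character.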
Question: About the following chunk of code that I copied over from the existing htmlparser handleNcrValue method:

[[
if (value >= 0xFDD0 && value <= 0xFDEF) {
    errNcrUnassigned();
} else if ((value & 0xFFFE) == 0xFFFE) {
    ch = errNcrNonCharacter(ch);
} else if (value >= 0x007F && value <= 0x009F) {
    errNcrControlChar();
]]

The range "value >= 0xFDD0 && value <= 0xFDEF" is legal in XML, right? If so, I'll need to change that error to a warning.

Same question for the range "value >= 0x007F && value <= 0x009F".. also legal in XML, right?
(In reply to comment #2) > charset - is ignored in some browsers iirc > > title, media, charset and alternate when type indicates XSLT (I think text/xsl, > text/xml or application/xml), since browsers ignore those for XSLT OK, I've now added warnings for both of those cases.
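A sketch of how the second of those warnings could be derived from already-parsed pseudo-attributes. Everything here is illustrative: the class name, the message wording, and the set of types treated as XSLT are assumptions, not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class XsltIgnoredAttrs {
    // Assumed XSLT-indicating type/subtype values (from earlier comments).
    private static final List<String> XSLT_TYPES =
            Arrays.asList("text/xsl", "text/xml", "application/xml");
    // Pseudo-attributes that browsers reportedly ignore for XSLT.
    private static final List<String> IGNORED =
            Arrays.asList("title", "media", "charset", "alternate");

    // Returns one (hypothetical) warning message per ignored pseudo-attribute.
    static List<String> warnings(Map<String, String> attrs) {
        List<String> out = new ArrayList<>();
        String type = attrs.get("type");
        if (type == null) {
            return out;
        }
        // Compare only the type/subtype part, ASCII-lowercased.
        String essence = type.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (!XSLT_TYPES.contains(essence)) {
            return out;
        }
        for (String name : IGNORED) {
            if (attrs.containsKey(name)) {
                out.add("Browsers ignore \"" + name + "\" for XSLT style sheets.");
            }
        }
        return out;
    }
}
```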
(In reply to comment #35)
> The range "value >= 0xFDD0 && value <= 0xFDEF" is legal in XML, right? If so,
> I'll need to change that error to a warning.

I kept that one as an error since those code points are clearly not characters.

> Same question for the range "value >= 0x007F && value <= 0x009F".. also legal
> in XML, right?

I made that a warning.
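The outcome described here, reduced to a sketch. The enum and method names are hypothetical, and only the two ranges discussed in this comment are covered; other checks (such as the Char production or the 0xFFFE test in the quoted code) are deliberately left out.

```java
final class CharRefClassifier {
    enum Severity { OK, WARNING, ERROR }

    // Classifies the numeric value of a character reference according to
    // the two decisions stated above.
    static Severity classify(int value) {
        if (value >= 0xFDD0 && value <= 0xFDEF) {
            // Kept as an error: these code points are not characters.
            return Severity.ERROR;
        }
        if (value >= 0x007F && value <= 0x009F) {
            // Downgraded to a warning: control characters, legal in XML.
            return Severity.WARNING;
        }
        return Severity.OK;
    }
}
```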
Invalid:

<?xml-stylesheet href="" title="& "?><x/>
<?xml-stylesheet href="" title="&\n"?><x/>
<?xml-stylesheet href="" title="&\t"?><x/>
<?xml-stylesheet href="" title="&<"?><x/>
<?xml-stylesheet href="" title="&&"?><x/>

(i.e. remove the first bunch of cases in CONSUME_CHARACTER_REFERENCE)
"^text/xsl(;.*)?$" could be "^text/xsl(;|$)" instead, i think
(for the record) Invalid: <?xml-stylesheet HREF=""?><x/>
(In reply to comment #38)
> Invalid:
>
> <?xml-stylesheet href="" title="& "?><x/>
> <?xml-stylesheet href="" title="&\n"?><x/>
> <?xml-stylesheet href="" title="&\t"?><x/>
> <?xml-stylesheet href="" title="&<"?><x/>
> <?xml-stylesheet href="" title="&&"?><x/>
>
> (i.e. remove the first bunch of cases in CONSUME_CHARACTER_REFERENCE)

OK, thanks, done
(In reply to comment #39)
> "^text/xsl(;.*)?$" could be "^text/xsl(;|$)" instead, i think

I tested that but found that it fails to match against, e.g.,

<?xml-stylesheet href="" type="text/xslt; charset=utf-8"?><x/>

Reading up a little on the Java String.matches method, I am reminded that it always matches against the entire string, from start to end, so the ^ and $ anchors are redundant with String.matches. So I changed those to just, e.g., "text/xsl(;.*)?" instead.
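The difference is easy to reproduce in isolation. In this small sketch (class and method names are made up for the demonstration), `String.matches` requires the pattern to cover the whole input, while `Matcher.find` accepts a match anywhere, which is the behavior the "(;|$)" form relies on:

```java
import java.util.regex.Pattern;

final class MatchesDemo {
    // Whole-string match, as String.matches does.
    static boolean wholeMatch(String regex, String input) {
        return input.matches(regex);
    }

    // Substring match, as Matcher.find does.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }
}
```

For the input "text/xsl; charset=utf-8", `wholeMatch("^text/xsl(;|$)", ...)` is false because the pattern stops after the semicolon, `partialMatch` with the same pattern is true, and `wholeMatch("text/xsl(;.*)?", ...)` is true with or without the anchors.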
syntax r559
validator 359
Created attachment 165 [details] validator patch
Created attachment 166 [details] syntax patch