NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 671 - add assertions checking for conformance constraints on microdata attributes
add assertions checking for conformance constraints on microdata attributes
Status: RESOLVED FIXED
Product: Validator.nu
Classification: Unclassified
Component: HTML5 schema
HEAD
All All
: P2 normal
Assigned To: Philip Jägenstedt
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2009-10-23 06:01 CEST by Michael[tm] Smith
Modified: 2011-08-01 07:07 CEST (History)
2 users (show)

See Also:


Attachments
syntax bits (2.43 KB, patch)
2011-07-23 23:25 CEST, Philip Jägenstedt
Details
validator bits (1.25 KB, patch)
2011-07-23 23:25 CEST, Philip Jägenstedt
Details
itemtype must be an absolute URL (370 bytes, patch)
2011-07-23 23:42 CEST, Philip Jägenstedt
Details
itemprop (6.01 KB, patch)
2011-07-24 01:04 CEST, Philip Jägenstedt
Details
basic attribute constraints (2.63 KB, patch)
2011-07-24 21:28 CEST, Philip Jägenstedt
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Michael[tm] Smith 2009-10-23 06:01:21 CEST
currently lacking the following constraints:

- The itemtype attribute must not be specified on elements that do not have an
itemscope attribute specified.
- The itemtype attribute, if specified, must have a value that is a valid URL
that is an absolute URL for which the string
"http://www.w3.org/1999/xhtml/microdata#" is not a prefix match.
- (ID values in itemref) for each one, the element's nearest ancestor element with an itemscope attribute specified, if any, must not be the element with the referencing itemref attribute specified.
- The itemref attribute must not be specified on elements that do not have an itemscope attribute specified.
- The itemid attribute must not be specified on elements that do not have both an itemscope attribute and an itemtype attribute specified, and must not be specified on elements with an itemscope attribute whose itemtype attribute specifies a vocabulary that does not support global identifiers for items, as defined by that vocabulary's specification.

...and probably some more
Comment 1 Philip Jägenstedt 2011-07-23 18:35:40 CEST
- (ID values in itemref) for each one, the element's nearest ancestor element
with an itemscope attribute specified, if any, must not be the element with the
referencing itemref attribute specified.

This is no longer in the spec, it's replaced by:

- All itemref attributes in a Document must be such that there are no cycles in the graph formed from representing each item in the Document as a node in the graph and each property of an item whose value is another item as an edge in the graph connecting those two items.

Here are some more:

- A document must not contain any items for which the algorithm to find the properties of an item finds any microdata errors. (possibly equivalent to the above)

- A document must not contain any elements that have an itemprop attribute that would not be found to be a property of any of the items in that document were their properties all to be determined.

- The itemid attribute, if specified, must have a value that is a valid URL potentially surrounded by spaces.

- The itemref attribute, if specified, must have a value that is an unordered set of unique space-separated tokens that are case-sensitive, consisting of IDs of elements in the same home subtree.

- The itemref attribute must not be specified on elements that do not have an itemscope attribute specified.

- The itemprop attribute, if specified, must have a value that is an unordered set of unique space-separated tokens that are case-sensitive, representing the names of the name-value pairs that it adds. The attribute's value must have at least one token.

- If a property's value is an absolute URL, the property must be specified using a URL property element.

- If a property's value represents a date, time, or global date and time, the property must be specified using the datetime attribute of a time element.
Comment 2 Philip Jägenstedt 2011-07-23 18:43:12 CEST
I'm interested in working on this and have a bit of experience with Java. I suppose that it's syntax/relaxng/microdata.rnc I should be looking at to begin with, but how should I implement the checks that require traversing the items and properties in order to find unused ones?

(Yes, I'm very aware of Utøya, but really need some distraction to keep it together right now.)
Comment 3 Philip Jägenstedt 2011-07-23 18:46:00 CEST
Found something that looks relevant in syntax/non-schema/java/src/org/whattf/checker/
Comment 4 Philip Jägenstedt 2011-07-23 23:25:40 CEST
Created attachment 194 [details]
syntax bits
Comment 5 Philip Jägenstedt 2011-07-23 23:25:56 CEST
Created attachment 195 [details]
validator bits
Comment 6 Philip Jägenstedt 2011-07-23 23:29:12 CEST
The submitted patches take care of the most basic requirements of the spec. Please be ruthless in code review, I can take it :)
Comment 7 Philip Jägenstedt 2011-07-23 23:42:08 CEST
Created attachment 196 [details]
itemtype must be an absolute URL

If there's something similar to git format-patch for hg which preserves my commit message and authorship, please let me know. Perhaps you have another preferred way of submitting patches?
Comment 8 Philip Jägenstedt 2011-07-24 01:04:57 CEST
Created attachment 197 [details]
itemprop

itemprop="" validation brought up to speed with the spec. It's been a long time since reverse DNS identifiers were valid.
Comment 9 Philip Jägenstedt 2011-07-24 01:20:03 CEST
AFAICT, the 4 patches I submitted now take care of everything that can be checked by only considering the current element. For the remaining, it will be necessary to implement a microdata parser using SAX (it seems the validator doesn't play with DOM, right?) with enough bookkeeping to be able to check the remaining constraints.

If I do implement a microdata parser, it would seem a waste to not also consider doing vocabulary validation on the result. Is this something that you think belongs in validator.nu, or should I just write it in JavaScript for http://foolip.org/microdatajs/live/ ?
Comment 10 Philip Jägenstedt 2011-07-24 11:14:43 CEST
jgraham pointed me to hg export, I'll use that for future patches.
Comment 11 Michael[tm] Smith 2011-07-24 17:46:56 CEST
(In reply to comment #4)
> Created an attachment (id=194) [details]
> syntax bits

These checks should instead be done in the startElement(uri, localName, name, atts) method in this source file:

https://bitbucket.org/validator/syntax/src/tip/non-schema/java/src/org/whattf/checker/schematronequiv/Assertions.java
Comment 12 Michael[tm] Smith 2011-07-24 17:48:25 CEST
(In reply to comment #5)
> Created an attachment (id=195) [details]
> validator bits

These changes are not needed if the checks are done instead in the Assertions.java code
Comment 13 Michael[tm] Smith 2011-07-24 18:14:13 CEST
(In reply to comment #9)
> For the remaining, it will be
> necessary to implement a microdata parser using SAX (it seems the validator
> doesn't play with DOM, right?) with enough bookkeeping to be able to check the
> remaining constraints.

Yeah, as far as I know, I think you'd need to implement it that way, with a tree built from SAX parse events.
Comment 14 Philip Jägenstedt 2011-07-24 21:28:22 CEST
Created attachment 198 [details]
basic attribute constraints

Thanks Michael, hopefully this new patch should be more to your liking.
Comment 15 Henri Sivonen 2011-07-25 15:15:06 CEST
(In reply to comment #9)
> AFAICT, the 4 patches I submitted now take care of everything that can be
> checked by only considering the current element.

Great thank you!

> For the remaining, it will be
> necessary to implement a microdata parser using SAX (it seems the validator
> doesn't play with DOM, right?) with enough bookkeeping to be able to check the
> remaining constraints.

Yeah, for the rest, it's necessary to have a SAX handler that builds a Microdata model with the appropriate LocatorImpls attached to different parts of the model in order to be able to report errors.

Unfortunately, itemref seems to make streaming validation not work, so the model would need to be a tree of items (though, luckily, it doesn't need to be a full HTML document tree).

> If I do implement a microdata parser, it would seem a waste to not also
> consider doing vocabulary validation on the result. Is this something that you
> think belongs in validator.nu, or should I just write it in JavaScript for
> http://foolip.org/microdatajs/live/ ?

When Microdata launched, I was thinking it belonged in validator.nu, though at present, it's not that clear which vocabularies should be supported. If some clarity around that point emerges, then, sure. (Maybe the answer at present is the vocabs in the spec plus schema.org.)

At one point in 2009, planned writing code in this direction (building an in-memory Microdata model) but then got swamped with Gecko work when I had more license boilerplate than code in the .java files.

microdata-assert.patch looks OK. r=hsivonen. (In principle, analogous changes to assertions.sch are needed to keep Assertions.java and assertions.sch in sync.)

The new MicrodataProperty.java file lacks a license header. Other than that, r=hsivonen on syntax-itemprop.patch.

As a general matter of routine bureaucracy, I'd like to see an affirmative statement that your contributions to Validator.nu are licensed under the license of the files being patched for pre-existing files and under the MIT license for new files and that if you don't add your copyright notice to the license headers, you are waiving notice. (Sorry about the legalese; I do this with all new contributors in order to keep my own legal trail in order.)

r=hsivonen on syntax-itemtype-absolute.patch.

Thank you!
Comment 16 Philip Jägenstedt 2011-07-25 15:37:26 CEST
(In reply to comment #15)
> (In reply to comment #9)
> > For the remaining, it will be
> > necessary to implement a microdata parser using SAX (it seems the validator
> > doesn't play with DOM, right?) with enough bookkeeping to be able to check the
> > remaining constraints.
> 
> Yeah, for the rest, it's necessary to have a SAX handler that builds a
> Microdata model with the appropriate LocatorImpls attached to different parts
> of the model in order to be able to report errors.
> 
> Unfortunately, itemref seems to make streaming validation not work, so the
> model would need to be a tree of items (though, luckily, it doesn't need to be
> a full HTML document tree).

My plan was to build a tree of elements (not DOM Element, just the minimum necessary) that have id or item* attributes, then proceed approximately as per the spec. I don't know how I'd make the warnings still apply to a specific line and element, I suppose that's where LocatorImpl comes into play? Do you have an example of a check that occurs after the whole document has been parsed that I could look at?

> > If I do implement a microdata parser, it would seem a waste to not also
> > consider doing vocabulary validation on the result. Is this something that you
> > think belongs in validator.nu, or should I just write it in JavaScript for
> > http://foolip.org/microdatajs/live/ ?
> 
> When Microdata launched, I was thinking it belonged in validator.nu, though at
> present, it's not that clear which vocabularies should be supported. If some
> clarity around that point emerges, then, sure. (Maybe the answer at present is
> the vocabs in the spec plus schema.org.)

I was mainly interested in schema.org, since the validity rules are simple and also because I think that just trying to write a validator for it would bring up a lot of problems. (Many examples in the schema.org documentation are invalid in various ways, and I don't want to check them all by hand.) I'll file a new bug for this if I decide to work on it.

> microdata-assert.patch looks OK. r=hsivonen. (In principle, analogous changes
> to assertions.sch are needed to keep Assertions.java and assertions.sch in
> sync.)
> 
> The new MicrodataProperty.java file lacks a license header. Other than that,
> r=hsivonen on syntax-itemprop.patch.
> 
> As a general matter of routine bureaucracy, I'd like to see an affirmative
> statement that your contributions to Validator.nu are licensed under the
> license of the files being patched for pre-existing files and under the MIT
> license for new files and that if you don't add your copyright notice to the
> license headers, you are waiving notice. (Sorry about the legalese; I do this
> with all new contributors in order to keep my own legal trail in order.)

I left it out because I assumed you would want to do it yourself to get it right (and because it had trailing whitespace which I didn't want to commit). I hereby license all of my past, present and future contributions to validator.nu under the licenses that validator.nu uses at the time of submission.

> r=hsivonen on syntax-itemtype-absolute.patch.
> 
> Thank you!

Thank you! This code was easier to get into than most.
Comment 17 Philip Jägenstedt 2011-07-25 15:46:10 CEST
(In reply to comment #15)
> microdata-assert.patch looks OK. r=hsivonen. (In principle, analogous changes
> to assertions.sch are needed to keep Assertions.java and assertions.sch in
> sync.)

I assume this is syntax/relaxng/assertions.sch? Is that complementary or is it for testing Assertions.java? I'm happy to update it if I know the expected results.
Comment 18 Philip Jägenstedt 2011-07-25 23:04:00 CEST
(In reply to comment #16)
> My plan was to build a tree of elements (not DOM Element, just the minimum
> necessary) that have id or item* attributes, then proceed approximately as per
> the spec. I don't know how I'd make the warnings still apply to a specific line
> and element, I suppose that's where LocatorImpl comes into play? Do you have an
> example of a check that occurs after the whole document has been parsed that I
> could look at?

It looks like UsemapChecker.java answers all my questions.
Comment 19 Philip Jägenstedt 2011-07-26 11:09:12 CEST
Let me know if there's anything more I need to do before the patches get applied and go live. The next patch is coming along nicely, but might require a wee bit more review :)
Comment 20 Henri Sivonen 2011-07-27 16:47:50 CEST
(In reply to comment #17)
> (In reply to comment #15)
> > microdata-assert.patch looks OK. r=hsivonen. (In principle, analogous changes
> > to assertions.sch are needed to keep Assertions.java and assertions.sch in
> > sync.)
> 
> I assume this is syntax/relaxng/assertions.sch? Is that complementary or is it
> for testing Assertions.java? I'm happy to update it if I know the expected
> results.

Assertions.java is a optimization. It's supposed to check for the same things as assertions.sch but in a more efficient way. assertions.sch is being kept alive for the benefit of schema users who aren't using the custom Java code. 

I don't know if such usage is theoretical or if there are actual users.

(In reply to comment #18)
> (In reply to comment #16)
> > My plan was to build a tree of elements (not DOM Element, just the minimum
> > necessary) that have id or item* attributes, then proceed approximately as per
> > the spec. I don't know how I'd make the warnings still apply to a specific line
> > and element, I suppose that's where LocatorImpl comes into play? Do you have an
> > example of a check that occurs after the whole document has been parsed that I
> > could look at?
> 
> It looks like UsemapChecker.java answers all my questions.

Yeah, that's a good example.

(In reply to comment #16)
> I was mainly interested in schema.org, since the validity rules are simple and
> also because I think that just trying to write a validator for it would bring
> up a lot of problems. (Many examples in the schema.org documentation are
> invalid in various ways, and I don't want to check them all by hand.) I'll file
> a new bug for this if I decide to work on it.

OK.

> I left it out because I assumed you would want to do it yourself to get it
> right (and because it had trailing whitespace which I didn't want to commit). I
> hereby license all of my past, present and future contributions to validator.nu
> under the licenses that validator.nu uses at the time of submission.

Could you please check that I interpreted your intent correctly when adding the license header:
https://bitbucket.org/validator/syntax/src/65e72972fe38/relaxng/datatype/java/src/org/whattf/datatype/MicrodataProperty.java

(In reply to comment #19)
> Let me know if there's anything more I need to do before the patches get
> applied and go live.

I landed the patches and deployed them on Validator.nu. Thank you.
Comment 21 Philip Jägenstedt 2011-07-27 18:21:28 CEST
(In reply to comment #20)
> (In reply to comment #17)
> > (In reply to comment #15)
> > > microdata-assert.patch looks OK. r=hsivonen. (In principle, analogous changes
> > > to assertions.sch are needed to keep Assertions.java and assertions.sch in
> > > sync.)
> > 
> > I assume this is syntax/relaxng/assertions.sch? Is that complementary or is it
> > for testing Assertions.java? I'm happy to update it if I know the expected
> > results.
> 
> Assertions.java is a optimization. It's supposed to check for the same things
> as assertions.sch but in a more efficient way. assertions.sch is being kept
> alive for the benefit of schema users who aren't using the custom Java code. 
> 
> I don't know if such usage is theoretical or if there are actual users.

OK. Given that the upcoming patch cannot (I guess) be expressed in this way, I'll just ignore assertions.sch. If you want to keep the two in perfect sync, we could always move microdata bits of Assertions.java over to the new MicrodataChecker.java.

> > I left it out because I assumed you would want to do it yourself to get it
> > right (and because it had trailing whitespace which I didn't want to commit). I
> > hereby license all of my past, present and future contributions to validator.nu
> > under the licenses that validator.nu uses at the time of submission.
> 
> Could you please check that I interpreted your intent correctly when adding the
> license header:
> https://bitbucket.org/validator/syntax/src/65e72972fe38/relaxng/datatype/java/src/org/whattf/datatype/MicrodataProperty.java

Yes, that looks fine.