Bug 866 - Make <table border> valid.
Status: RESOLVED FIXED
Product: Validator.nu
Classification: Unclassified
Component: General
Version: HEAD
Hardware: All All
Importance: P2 normal
Assigned To: Michael[tm] Smith
http://www.cs.tut.fi/~jkorpela/test/p...
Reported: 2011-10-15 21:08 CEST by Jukka K. Korpela
Modified: 2011-11-14 13:34 CET (History)
Description Jukka K. Korpela 2011-10-15 21:08:01 CEST
When <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> is used, it is OK to report an error (as this is currently not allowed in HTML5), but the validator then gets wild. It reports <meta charset=utf-8> as an error. While this is understandable if it purports to validate by HTML 4.01 rules, the construct
<p><table border><tr><td></table>
should pass whether HTML 4.01 or HTML5 rules are applied. Yet the validator reports the errors "Attribute value omitted for a non-boolean attribute" and "Element table not allowed as child of element p in this context", which are hardly understandable unless the validator switched to XHTML mode somehow.
Comment 1 Michael[tm] Smith 2011-10-19 06:59:25 CEST
(In reply to comment #0)
> When <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> is used,
> it is OK to report an error (as this is currently not allowed in HTML5),

Note that along with being a doctype that's not allowed, it also triggers quirks mode.

> but
> the validator then gets wild. It reports <meta charset=utf-8> as an error.
> While this is understandable if it purports to validate by HTML 4.01 rules,

Yeah, as far as I can see it's correct and expected in this case for that to be reported as an error -- because, based on that doctype, the checker assumes you want it to perform HTML 4.01 checking.

If you want it to do HTML5 checking instead, it seems like you'd want to manually set the Preset option to "HTML5".

> the
> construct
> <p><table border><tr><td></table>
> should pass whether HTML 4.01 or HTML5 rules are applied. Yet the validator
> reports the errors "Attribute value omitted for a non-boolean attribute"

That is definitely not an error for HTML5 documents -- that is, if you're both checking against the HTML5 conformance rules, with the Preset option set to "HTML5", and asking for the document to be parsed using the HTML5 parser, with the Parser option set to "HTML5".

If you try setting just the Preset option to "HTML5", you will still get that error. But if you also set the Parser option to "HTML5", the error should no longer appear.

But if you don't manually set either of those options, the checker defaults to checking the document against HTML4 conformance rules, and to having the parser be determined automatically by the doctype. In this case, that means the HTML 4.01 parser. (Which as far as the backend code goes is really just sort of a pseudo-parser/mode of the HTML5 parser. But the parser behavior in that mode does differ in a number of ways from the normal HTML5 parser.)

So I assume that there must be some general rule in HTML4 that attribute values can only be omitted for attributes whose sole allowed value is the same as the attribute name ("boolean" attributes in HTML5 terms). Otherwise I don't know why Henri would have added that check.

But even if so, it seems the HTML 4.01 spec intends for <table border> to be an exception to that general rule (if in fact there actually is such a rule in HTML 4.01). Because among other language, it has an example of <table border> and says that it's equivalent to <table frame="border" rules="all">.

So I guess the HTML 4.01 (pseudo)parser should probably either special-case <table border>, or eliminate the "Attribute value omitted for a non-boolean attribute" check entirely, if it's not based on actual conformance requirements in the HTML 4.01 spec.

We'll have to wait to hear from Henri about which of those he wants to do.

> and
> "Element table not allowed as child of element p in this context", which are
> hardly understandable unless the validator switched to XHTML mode somehow.

The cause of that is the <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> doctype. That doctype causes the parser to go into quirks mode, and in quirks mode, instead of the table start tag causing an implied p end tag to be generated, the table does actually end up becoming a child of the p. And so because the p element can't have table as a child, you get that error.
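
To illustrate the parsing difference described above, here is a minimal Python sketch. It is not the validator.nu parser: `Node`, `build_tree`, and `child_names` are hypothetical names, only start tags are handled, and the HTML5 "p element in button scope" check is simplified to "the current node is a p". It only shows that the same tag sequence yields a sibling table in no-quirks mode and a child table in quirks mode.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def build_tree(start_tags, quirks_mode):
    """Toy tree builder: handles start tags only (hypothetical helper)."""
    root = Node("body")
    stack = [root]
    for tag in start_tags:
        # Simplified HTML5 rule: a <table> start tag closes an open <p>,
        # but only when the document is NOT in quirks mode.
        if tag == "table" and not quirks_mode and stack[-1].name == "p":
            stack.pop()  # implied </p>
        node = Node(tag)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def child_names(node):
    return [child.name for child in node.children]


standards = build_tree(["p", "table"], quirks_mode=False)
quirks = build_tree(["p", "table"], quirks_mode=True)

print(child_names(standards))           # p and table are siblings
print(child_names(quirks.children[0]))  # table is a child of p
```

In the quirks-mode tree the table sits inside the p, which is what leads to the "Element table not allowed as child of element p" message.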
Comment 2 Michael[tm] Smith 2011-10-19 07:58:11 CEST
(In reply to comment #1)
> So I assume that there must be some general rule in HTML4 that attribute values
> can only be omitted for attributes whose sole allowed value is the same as the
> attribute name ("boolean" attributes in HTML5 terms). Otherwise I don't know
> why Henri would have added that check.
> ...
> So I guess the HTML 4.01 (pseudo)parser should probably either special-case
> <table border>, or eliminate the "Attribute value omitted for a non-boolean
> attribute" check entirely, if it's not based on actual conformance requirements
> in the HTML 4.01 spec.

So after looking back at the HTML 4.01 spec, I see that the "Boolean attributes" subsection of section "3.3.4 Attribute declarations" says:

http://www.w3.org/TR/html401/intro/sgmltut.html#h-3.3.4.2
"In HTML, boolean attributes may appear in minimized form -- the attribute's value appears alone in the element's start tag."

But it does not say anywhere that non-boolean attributes, in general, may appear in minimized form. So that would seem to imply that non-boolean attributes must not appear in minimized form; hence, the check that Henri added to the HTML4 mode of the parser code, which conforms to the general case.

So for the specific case of <table border> it seems like conformance for HTML4 checking requires that it not be flagged as an error. So it would need to be handled as a special-case exception in the parser code.
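
As a rough illustration of that kind of special-casing (this is not the actual patch, which is in the Java htmlparser code), a Python sketch might look like the following. The function name `minimized_attribute_error` and the `MINIMIZED_EXCEPTIONS` set are hypothetical; the boolean attribute names are the ones declared in the HTML 4.01 DTDs.

```python
# Attributes declared as boolean in the HTML 4.01 DTDs.
HTML4_BOOLEAN_ATTRIBUTES = frozenset({
    "checked", "compact", "declare", "defer", "disabled", "ismap",
    "multiple", "nohref", "noresize", "noshade", "nowrap", "readonly",
    "selected",
})

# Hypothetical exception list: (element, attribute) pairs that may be
# minimized even though the attribute is not boolean.
MINIMIZED_EXCEPTIONS = frozenset({("table", "border")})

ERROR = "Attribute value omitted for a non-boolean attribute."


def minimized_attribute_error(element, attribute):
    """Return an error message for a valueless attribute, or None if allowed."""
    element = element.lower()
    attribute = attribute.lower()
    if attribute in HTML4_BOOLEAN_ATTRIBUTES:
        return None
    if (element, attribute) in MINIMIZED_EXCEPTIONS:
        return None
    return ERROR


print(minimized_attribute_error("table", "border"))   # allowed by exception
print(minimized_attribute_error("input", "checked"))  # boolean attribute
print(minimized_attribute_error("p", "right"))        # still flagged
```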

I've gone ahead and written up a patch that does that, and sent it to Henri for review.

In the meantime, I've pushed the patch to my dev version of the service:

  http://www.w3.org/html/check

So you can test it there for now.
Comment 3 Jukka K. Korpela 2011-10-19 10:15:52 CEST
(In reply to comment #2)

> But it [the HTML 4.01 spec]
> does not say that anywhere that non-boolean attributes, in general, may
> appear in minimized form.

It does not need to. Normatively, HTML 4.01 is based on SGML using concrete syntax that allows an attribute specification to be abbreviated to a value, when the element has an attribute with allowed values defined by enumeration. Thus, for example, <p right> is valid (short for <p align=right>), though it hardly works in browsers (and it passes http://validator.w3.org - even without a warning!)

> So for the specific case of <table border> it seems like conformance for HTML4
> checking requires that not be flagged as an error. So it would need to be
> handled as a special-case exception in the parser code.

The <table border> issue is special only in the practical sense that browsers actually implement it. By normative specifications, it's just a special case of the above-mentioned rule (the frame attribute is declared with enumerated values, including border).

To correctly check conformance to HTML 4.01 specifications, you would also need to handle <p rtl> for example. It would be OK to issue a warning, but it is incorrect to issue an error message (as validator.nu now does, but validator.w3.org does not).
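
The SGML rule described above can be sketched in Python as a lookup from a lone value to the attribute whose enumerated values contain it. This is only an illustration, not code from any validator: `ENUMERATED` and `expand_minimized` are hypothetical names, and the table covers just a few attributes of p and table as declared in the HTML 4.01 Transitional DTD.

```python
# Enumerated attribute declarations for two elements (partial, illustrative).
ENUMERATED = {
    "p": {
        "align": ("left", "center", "right", "justify"),
        "dir": ("ltr", "rtl"),
    },
    "table": {
        "frame": ("void", "above", "below", "hsides", "lhs", "rhs",
                  "vsides", "box", "border"),
        "rules": ("none", "groups", "rows", "cols", "all"),
        "align": ("left", "center", "right"),
        "dir": ("ltr", "rtl"),
    },
}


def expand_minimized(element, token):
    """Map a lone value to (attribute, value), or None if nothing matches."""
    token = token.lower()
    for attribute, values in ENUMERATED.get(element.lower(), {}).items():
        if token in values:
            return attribute, token
    return None


print(expand_minimized("p", "right"))       # <p right>
print(expand_minimized("p", "rtl"))         # <p rtl>
print(expand_minimized("table", "border"))  # <table border>
```

Under this reading, <table border> resolves to the frame attribute with the value border, not to the border attribute.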

I don't see much point in trying to check HTML 4.01 conformance in validator.nu, as we already have the W3C validator (and the www.htmlhelp.com validator) for that. If such checking is to be performed, it should either be carried out correctly (in the formal sense) or at least a warning should be issued, in any report in that mode, about the kind of checking performed. (I cannot describe what it is now...)

I think it would be better to make validator.nu a pure HTML5 validator. Upon encountering an obsolete permitted doctype, it would of course issue a warning, and upon encountering a non-permitted doctype (legacy doctype or something else), it should issue an error message but could proceed as if the doctype were not there.
Comment 4 Jukka K. Korpela 2011-10-19 10:28:58 CEST
(In reply to comment #1)

> (In reply to comment #0)
> > When <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> is used,
> > it is OK to report an error (as this is currently not allowed in HTML5),
> 
> Note that along with being a doctype that's not allowed, it also triggers
> quirks mode.

It triggers quirks mode in some browsers, but in the HTML 4.01 specification, there is no such concept as quirks mode.

> in quirks mode, instead of the table start tag causing an implied p
> end tag to be generated, the table does actually end up becoming a child of the
> p.

That's a violation of HTML 4.01 rules, which unambiguously imply that

<p><table><tr><td></table>

is valid and equivalent to

<p></p><table><tr><td></table>

independently of doctype. It would be OK, and follow the spirit of HTML 4.01, to issue a warning - about an empty <p> element, which is valid but discouraged by the HTML 4.01 spec. But it is incorrect to issue an error message, and even more incorrect to parse the conforming document in a manner that deviates from the specification.

I strongly suggest that in the case of applying HTML 4.01-like validation, the diagnostics be accompanied with a statement saying what's happening: the document is being checked mostly by HTML 4.01 rules, but not according to the HTML 4.01 specification.
Comment 5 Michael[tm] Smith 2011-10-19 10:35:23 CEST
(In reply to comment #3)
> It does not need to. Normatively, HTML 4.01 is based on SGML using concrete
> syntax that allows an attribute specification to be abbreviated to a value,
> when the element has an attribute with allowed values defined by enumeration.
> Thus, for example, <p right> is valid (short for <p align=right>), though it
> hardly works in browsers (and it passes http://validator.w3.org - even without
> a warning!)

Wow, it seems crazy to have the existing W3C validator not emit an error for that.

Anyway, if browsers don't support it, it seems reasonable to warn about it.

> To correctly check conformance to HTML 4.01 specifications, you would also need
> to handle <p rtl> for example. It would be OK to issue a warning, but it is
> incorrect to issue an error message (as validator.nu now does, but
> validator.w3.org does not).

I would be happy to write up a patch for that one too, and for any other such cases.

> I don't see much point in trying to check HTML 4.01 conformance in
> validator.nu, as we already have the W3C validator (and the www.htmlhelp.com
> validator) for that.

Yeah, well, those validators have serious deficiencies -- such as the one you pointed out, about the W3C validator not emitting an error for <p right>. I don't think it's particularly relevant at this point in time what the SGML spec says about instances like that -- there is no practical reason for a user to use that markup on the Web, and it's useful to alert them about it.

> If such checking is to be performed, it should either be
> carried out correctly (in the formal sense)

The SGML spec(s) is full of all kinds of crazy stuff that has no relevance to the Web. Strict adherence to formal SGML rules when checking content intended for distribution on the Web is senseless and counterproductive. And as for the formal rules of HTML4, that spec is so poorly written that trying to implement proper checking of it requires reading between the lines and making some assumptions and judgement calls. That's not the fault of conformance-checker implementors, it's the fault of the writers of that spec, and the fact they tried to bind it to SGML formalisms that have no relevance to actual Web UA behavior.

> or at least a warning should be
> issued, in any report in that mode, about the kind of checking performed. (I
> cannot describe what it is now...)
> 
> I think it would be better to make validator.nu a pure HTML5 validator. Upon
> encountering an obsolete permitted doctype, it would of course issue a warning,
> and it upon encountering a non-permitted doctype (legacy doctype or something
> else), it should issue an error message but could proceed as if the doctype
> were not there.

I will defer to Henri on that. I think the HTML4 checking in validator.nu has some utility currently at least, and is actually already superior to the HTML4 checking that the existing W3C validator does. Someday people might quit caring about that HTML4 checking in validator.nu at all and it could just be turned off. But until then I think it has some value and if there are deficiencies in it like the ones you've identified, and those don't take great effort to fix, then we should fix them.
Comment 6 Michael[tm] Smith 2011-10-19 10:42:49 CEST
(In reply to comment #4)
> It triggers quirks mode in some browsers, but in the HTML 4.01 specification,
> there is no such concept as quirks mode.

Yeah, that's another large deficiency in the HTML4 spec that makes it incompatible with the actual realities of the Web.

> That's a violation of HTML 4.01 rules, which unambiguously imply that
> 
> <p><table><tr><td></table>
> 
> is valid and equivalent to
> 
> <p></p><table><tr><td></table>

Right. That's a bug in the HTML4 spec that ought to be fixed. But since nobody's actually ever going to fix that bug in the spec, we fix it instead by trying to provide information to users that's actually going to help them, instead of say, optimizing the service to cater to pedants.

The checking that validator.nu does is intended to be helpful to people who want to put their content up on the actual Web and have it be processed by real Web UAs as expected and without surprises.

> independently of doctype. It would be OK, and follow the spirit of HTML 4.01,
> to issue a warning - about an empty <p> element, which is valid but discouraged
> by the HTML 4.01 spec. But it is incorrect to issue an error message, and even
> more incorrect to parse the conforming document in a manner that deviates from
> the specification.

So, OK, maybe we can consider changing it to a warning instead of an error.

> I strongly suggest that in the case of applying HTML 4.01 -like validation, the
> diagnostics be accompanied with a statement saying what's happening: the
> document is being checked mostly by HTML 4.01 rules, but not according to the
> HTML 4.01 specification.

Yeah, that seems reasonable. I'll talk to Henri about it.
Comment 7 Jukka K. Korpela 2011-10-24 19:23:58 CEST
(In reply to comment #5)

> Anyway, if browsers don't support it, it seems reasonable to warn about it.

The construct <p right> has been valid (but unsupported) ever since HTML 3.2 and has passed validators without warning. The problem was not in the validators but in the fact that HTML had been specified as an SGML implementation but not implemented that way. One might ask whether validators were useful at all for HTML then, and the answer is that they were and are useful in many ways, but with inherent limitations and issues.

My main concern with validator.nu is that it is called a validator, blurring the difference between markup validation in the old sense and pragmatic, often heuristic and even subjective checking (which is _much_ more useful than markup validation if carried out on a reasonably sound basis). I guess it's a lost battle, because Henri prefers the word "validator". But here we have a special conflict, as validator.nu switches to HTML 4.01 mode and there _are_ validators for it.

> > I don't see much point in trying to check HTML 4.01 conformance in
> > validator.nu, as we already have the W3C validator (and the www.htmlhelp.com
> > validator) for that.
> 
> Yeah, well, those validators have a serious deficiencies

No, they are rather perfect. They are validators and should be judged on that basis, not for their not being something completely different.

> I don't think it's particularly relevant at this point in time what the SGML spec
> says about instances like that

Isn't it then better not to analyze a document by HTML 4.01 rules?

> I think the HTML4 checking in validator.nu has
> some utility currently at least, and is actually already superior to the HTML4
> checking that the existing W3C validator does.

It is pointless and technically all wrong to issue error messages as relating to HTML 4.01 when not actually applying HTML 4.01 rules. There might be a point in reporting things that don't work in current browsers, but there's no reason to claim this to be "HTML 4.01 validation".

By the way, the construct
<p><table border><tr><td></table>
is not just valid HTML 4.01. It also has predictable treatment in browsers, though the treatment differs from the one suggested in HTML 4.01 specs (ignoring the empty P element, as opposed to rendering it as an empty paragraph with normal default margins, i.e. effectively generating an empty line).
Comment 8 Michael[tm] Smith 2011-10-26 10:09:49 CEST
http://hg.mozilla.org/projects/htmlparser/rev/ea9f12bca0eb
Comment 9 Jukka K. Korpela 2011-10-26 10:41:11 CEST
Validator.nu still issues the bogus error messages
"Attribute value omitted for a non-boolean attribute." (for <table border>)
"Element table not allowed as child of element p in this context."
(the table element is never allowed as a child of p, and in this context it is not being used that way)

If the first message is issued just because the patch hasn't been applied in production,
a) I don't think a bug should be classified as fixed before it has verifiably been fixed in production
b) the fix is an ad hoc way of suppressing bogus messages in a particular case; it does not address the general problem of not handling attributes according to the HTML 4.01 spec, nor the problem of reporting an empty <p> element as an error.

For example, for the construct
<p right>
the error message
"Attribute value omitted for a non-boolean attribute. (HTML4-only error.)"
is fundamentally wrong. There is no HTML4-only error here; on the contrary, the construct is valid HTML 4 (though not allowed and not even defined in HTML 5 - so it's rather an HTML5-only error!). For "boolean" attributes (a sloppy term), as well as for any attribute with an enumerated set of values, HTML 4 allows the attribute _name_ and the equals sign, not the value, to be omitted.

An appropriate message would be a warning based on
http://www.w3.org/TR/html401/appendix/notes.html#h-B.3.3
and could say e.g.
"Minimized attribute syntax generally not supported by browsers."

For <p><table ...>, an appropriate warning would be based on
http://www.w3.org/TR/html401/struct/text.html#edef-P
and could say e.g.
"Empty P elements should not be used."
Comment 10 Michael[tm] Smith 2011-11-14 13:34:38 CET
(In reply to comment #9)
> a) I don't think a bug should be classified as fixed before it has verifiably
> been fixed in production

It's fixed in production now.

But the next time you feel like reopening a bug that I've resolved, note that for marking bugs as fixed, we follow the same convention that virtually every other open-source project in the universe follows, which is to say we mark them resolved/fixed when a code change has been checked in that fixes them, not necessarily when that code has been deployed in production.

If you want to independently verify the fix, you can do that by checking out the code and building it.

> b) the fix is an ad hoc way of suppressing bogus messages in a particular case,
> not the general problem with not dealing with attributes by HTML 4.01 spec, and
> not the problem of reporting an empty <p> element as an error.

So open a different bug for the general case.

> For example, for the construct
> <p right>
> the error message
> "Attribute value omitted for a non-boolean attribute. (HTML4-only error.)"
> is fundamentally wrong.

So please go ahead and open up a bug for that. We should probably just be suppressing that error message completely in this case, since the validator does actually also report "Attribute right not allowed on element p at this point." (which is the right message for users to get in this case, regardless of what the SGML spec says).

> There is no HTML4-only error here; on the contrary, the
> construct is valid HTML 4

Yeah, sure. In theory. In practice, the Web user agents that real-world users actually care about never implemented support for that misfeature of HTML4+SGML, so those users don't really care that the HTML4 spec claims it's valid.

> An appropriate message would be a warning based on
> http://www.w3.org/TR/html401/appendix/notes.html#h-B.3.3
> and could say e.g.
> "Minimized attribute syntax generally not supported by browsers."

We're not going to do that, however pedantically correct it might be.

> For <p><table ...>, an appropriate warning would be based on
> http://www.w3.org/TR/html401/struct/text.html#edef-P
> and could say e.g.
> "Empty P elements should not be used."

As you already know, when you use a Web-savvy parser with a doctype that triggers quirks mode, you're not going to get an empty p element there. If you instead want Web-ignorant parsing, just use the existing W3C validator and try to pretend that validator.nu doesn't exist.

Deploying any tool that just blindly adheres to all HTML4 conformance constraints is simple-minded. No real user publishing content on the Web wants a tool that does that. Do you think, for example, that actual users want to be told that id=FOO and id=foo are duplicate IDs (which as you know is how the HTML4 DTD requires them to be treated)? Is that actually helpful to users?
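
To make the id=FOO / id=foo point concrete, here is a small Python sketch contrasting the two ways of comparing IDs. The function `duplicate_ids` and its `fold_case` flag are hypothetical and only illustrate the comparison; they are not taken from any validator.

```python
from collections import defaultdict


def duplicate_ids(ids, fold_case):
    """Group ID values that collide; fold_case=True treats FOO and foo alike."""
    groups = defaultdict(list)
    for value in ids:
        key = value.lower() if fold_case else value
        groups[key].append(value)
    # Keep only the groups that contain more than one ID.
    return [values for values in groups.values() if len(values) > 1]


ids = ["FOO", "foo", "bar"]
print(duplicate_ids(ids, fold_case=True))   # case-folded comparison
print(duplicate_ids(ids, fold_case=False))  # case-sensitive comparison
```

With case folding, FOO and foo are reported as a duplicate pair; with case-sensitive comparison, nothing is reported.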