Bug 482 – consider including content-model, etc., informational stuff in GNU output (not just in HTML output)

NOTE: The current preferred location for bug reports is the GitHub issue tracker.


Summary:	consider including content-model, etc., informational stuff in GNU output (no...

Status:	RESOLVED INTENTIONAL

Product:	Validator.nu
Classification:	Unclassified
Component:	Web service formats
Version:	HEAD
Hardware:	All All

Importance:	P2 enhancement
Assigned To:	Nobody

URL:

Duplicates:	483 484
Depends on:
Blocks:

Reported:	2009-04-15 10:26 CEST by Michael[tm] Smith
Modified:	2013-07-12 05:26 CEST
CC List:	0 users

See Also:


Description Michael[tm] Smith 2009-04-15 10:26:55 CEST

About the content-model/contexts/attributes details that v.nu scrapes from the spec and includes in particular error messages -- it would be useful to have that information available in GNU output as well (though I recognize it'd be some significant work to implement).

Comment 1 Michael[tm] Smith 2009-04-15 11:59:52 CEST

*** Bug 483 has been marked as a duplicate of this bug. ***

Comment 2 Michael[tm] Smith 2009-04-15 12:00:11 CEST

*** Bug 484 has been marked as a duplicate of this bug. ***

Comment 3 Henri Sivonen 2009-04-16 16:03:37 CEST

Any suggestions on how to convert the HTML fragments into plain text and how to prefix the lines in the GNU format?

Comment 4 Michael[tm] Smith 2009-04-17 10:03:49 CEST

(In reply to comment #3)
> Any suggestions on how to convert the HTML fragments into plain text and how to
> prefix the lines in the GNU format?

As for how to convert the HTML fragments, I suppose the easiest and quickest thing to do would be to write a custom converter/serializer to handle just the subset of HTML element names that are used in the spec fragments -- which seems to be basically just <dl>, <dt>, <dd>, <a>, and <code>.

I think the GNU-format converter/serializer wouldn't need to do anything at all with <code> or <a>. For <dt> content, which ends with a colon character to introduce the list items, I think the colon would need to be converted to something else, because the colon is used as a field separator in GNU format. Maybe just a space and a dash?

I think the rest of it would come down to just normalizing the line breaks in the HTML fragments into single spaces, generating a comma character after the contents of each <dd>, and then generating a period after the contents of the <dl> (at its closing tag). And then, preferably, emitting it as a separate "info" message instead of as part of the error (see bug 485).
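
To make the steps above concrete, here is a rough sketch of such a converter. It is only an illustration under stated assumptions, not Validator.nu code: the names SpecFragmentFlattener, flatten_fragment, and gnu_info_line are hypothetical, the sample fragment is made up, and Python is used only for brevity. It puts commas between <dd> items and a period after the last one, which is one reading of the comma/period rule.

```python
import re
from html.parser import HTMLParser


class SpecFragmentFlattener(HTMLParser):
    """Hypothetical helper: collects <dt> labels and their <dd> items.

    <a> and <code> tags are ignored; only their text content is kept.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.groups = []     # list of (label, [item, ...]) pairs
        self.current = None  # "dt" or "dd" while inside one, else None
        self.buf = []        # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._close_item()  # tolerate omitted </dt> and </dd> end tags
            self.current = tag

    def handle_endtag(self, tag):
        if tag in ("dt", "dd", "dl"):
            self._close_item()

    def handle_data(self, data):
        if self.current is not None:
            self.buf.append(data)

    def _close_item(self):
        if self.current is None:
            return
        # Normalize line breaks and other whitespace runs to single spaces.
        text = re.sub(r"\s+", " ", "".join(self.buf)).strip()
        if self.current == "dt":
            # Drop the trailing colon; " -" is added when rendering.
            self.groups.append((text.rstrip(":").rstrip(), []))
        else:
            if not self.groups:
                self.groups.append(("", []))  # <dd> with no <dt> before it
            self.groups[-1][1].append(text)
        self.current, self.buf = None, []

    def text(self):
        self._close_item()
        parts = []
        for label, items in self.groups:
            body = ", ".join(items) + "."  # commas between items, then a period
            parts.append(f"{label} - {body}" if label else body)
        return " ".join(parts)


def flatten_fragment(html):
    """Return the fragment as a single line of plain text."""
    parser = SpecFragmentFlattener()
    parser.feed(html)
    parser.close()
    return parser.text()


def gnu_info_line(file_name, start, end, message):
    """Format a separate "info" line; start and end are (line, column) pairs."""
    (l1, c1), (l2, c2) = start, end
    return f'"{file_name}":{l1}.{c1}-{l2}.{c2}: info: {message}'


# Made-up fragment in the shape described above (assumption, not spec text).
sample = (
    '<dl><dt>Element-specific attributes for element '
    '<a href="#link"><code>link</code></a>:</dt>\n'
    '<dd><a href="#global">Global attributes</a></dd>\n'
    '<dd><code>href</code></dd>\n<dd><code>rel</code></dd></dl>'
)
print(gnu_info_line("link.html", (5, 1), (5, 44), flatten_fragment(sample)))
```

Only the trailing colon of each <dt> is replaced in this sketch; a colon elsewhere in the fragment text would still need handling if the GNU format treats it as a separator.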

So for the following case:

http://dev.w3.org/html5/tests/validation/full/invalid/unknown-attribute/link.html

...the output would be:

"link.html":5.1-5.44: error: Attribute “bar” not allowed on element “link” at this point.
"link.html":5.1-5.44: info: Element-specific attributes for element link - Global attributes, href, rel, media, hreflang, type, sizes. Also, the title attribute has special semantics on this element.