NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 65 - Show document outline
Status: RESOLVED FIXED
Product: Validator.nu
Classification: Unclassified
Component: Controller
Version: HEAD
Hardware: All All
Importance: P2 enhancement
Assigned To: Michael[tm] Smith
Reported: 2008-02-24 19:00 CET by Henri Sivonen
Modified: 2013-04-24 19:25 CEST

Description Henri Sivonen 2008-02-24 19:00:27 CET
Show the outline of (X)HTML documents so that a human can verify that it makes sense.

Pending Hixie's reformulation of the outline algorithm.
Comment 1 Michael[tm] Smith 2012-11-18 17:53:46 CET
Got a patch from a contributor. Working on integrating it now.
Comment 2 Henri Sivonen 2012-11-26 11:21:48 CET
I forgot to ask: If this is client-side JS, how does the client-side JS get a document tree to work with?
Comment 3 Michael[tm] Smith 2012-11-26 16:39:52 CET
(In reply to comment #2)
> I forgot to ask: If this is client-side JS, how does the client-side JS get a
> document tree to work with?

I did that by feeding the textarea value to DOMParser.parseFromString.

But I've abandoned the client-side thing. It was just a kludge. I have it implemented completely on the server side now. You can test it now at http://qa-dev.w3.org:8888/

There were a couple of bugs in the contributed code that, in a couple of cases, caused it to produce an outline that didn't conform to the spec (and that didn't match the output from other implementations), but I have already debugged and fixed those. So it now passes every test I've thrown at it.

The code contributions I received (MIT-licensed) were for a class that builds an outline object, and another class that emits an HTML representation from that outline object. I integrated the outline-builder code by putting it into an additional reader that gets called on parsing of the document source, and I integrated the HTML emitter into the servlet code so that it gets called only if the output format is HTML. I did that without touching any of the MessageEmitter code, because that's really not necessary for the HTML case.

I think it might be good to also make the outline available for JSON output and XML output, but I haven't yet written working code to do that. The JSON and XML outline emitters would need to be hooked into the MessageEmitter code. I messed around a bit with trying to create some and hook them in there, but I get failures from further down in the nu.validator.json.JsonHandler code and the XML serializer code. I guess I'd need to get more familiar with that code before I could fix the problems.

Anyway, I have a patch for the outline builder and HTML emitter ready for review, and can get it to you any time. Just didn't want to pile it on you.
Comment 4 Michael[tm] Smith 2012-12-03 06:57:45 CET
Another thing I should mention: The way I implemented this, it requires some (minor) changes to the HTML parser code and also to the XML parser code (Aelfred2 SAXDriver).

The reason is that it stores and retrieves the outline using the setProperty and getProperty methods of the reader. So those need to be made to recognize the property name for the outline (for which I just have the code using "http://validator.nu/properties/document-outline").

That's the only way -- without also turning the outline builder into an output-format-neutral outline emitter (which would not make sense) -- that I could see to pass the outline from the step when it's built (by the additional reader wrapper) to the step when it needs to be rendered. So the backend handling is different from that for the sorta similar "Show source" case -- but that's kind of expected, because the source emitter is not format-specific (that is, for HTML vs JSON vs XML output, the source output is all the same), while the outline emitter needs to be format-specific (different between HTML and JSON output especially).
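
For illustration only, here is a minimal sketch of a reader that recognizes the outline property name in setProperty and getProperty. It is not the actual patch: the comment above describes changes inside the HTML parser and the Aelfred2 SAXDriver, whereas this sketch assumes a wrapper built on the JDK's XMLFilterImpl. The class name OutlineAwareReader and the roundTrip helper are hypothetical.

```java
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical reader wrapper that accepts one extra property: the document outline. */
class OutlineAwareReader extends XMLFilterImpl {
    // Property name taken from the comment above.
    static final String OUTLINE_PROPERTY =
            "http://validator.nu/properties/document-outline";

    private Object outline;

    @Override
    public void setProperty(String name, Object value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        if (OUTLINE_PROPERTY.equals(name)) {
            outline = value; // handled here, never forwarded to the parent reader
        } else {
            super.setProperty(name, value);
        }
    }

    @Override
    public Object getProperty(String name)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        return OUTLINE_PROPERTY.equals(name) ? outline : super.getProperty(name);
    }

    /** Stores an outline on a fresh reader and reads it back (illustration only). */
    static Object roundTrip(Object outline) {
        try {
            OutlineAwareReader reader = new OutlineAwareReader();
            reader.setProperty(OUTLINE_PROPERTY, outline);
            return reader.getProperty(OUTLINE_PROPERTY);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Any other property name still falls through to the default handling, so existing callers of the reader would be unaffected under this assumption.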
Comment 5 Henri Sivonen 2012-12-04 14:22:47 CET
That’s weird. Why not have an outline builder as a ContentHandler outside the parsers?
Comment 6 Michael[tm] Smith 2012-12-04 15:00:02 CET
(In reply to comment #5)
> That’s weird. Why not have an outline builder as a ContentHandler outside the
> parsers?

Maybe just because I wasn't sure how to hook it into the existing VerifierServletTransaction code as a ContentHandler and have that actually read the document input. Is there an existing example of another filter class that's hooked into the VerifierServletTransaction code as a ContentHandler?
Comment 7 Henri Sivonen 2012-12-04 16:54:04 CET
See the way baseUriTracker is set up with CombineContentHandler for the pattern that doesn’t require parser changes and works with both parsers.

            if (baseUriTracker == null) {
                wiretap.setWiretapContentHander(recorder);
            } else {
                wiretap.setWiretapContentHander(new CombineContentHandler(
                        recorder, baseUriTracker));
            }
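
As a rough sketch of that pattern using only JDK SAX classes: a handler that sits outside the parser and receives the same events as another handler. TeeHandler is a hypothetical stand-in for CombineContentHandler (whose real implementation is not shown here), HeadingCollector is a hypothetical placeholder for an outline builder that records only heading text, and the handler is attached directly to an XMLReader rather than through the wiretap.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical placeholder for an outline builder: collects h1..h6 text. */
class HeadingCollector extends DefaultHandler {
    final List<String> headings = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a heading

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (local.matches("h[1-6]")) text = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (text != null && local.matches("h[1-6]")) {
            headings.add(local + ": " + text.toString().trim());
            text = null;
        }
    }
}

/** Hypothetical stand-in for CombineContentHandler: forwards events to two handlers. */
class TeeHandler extends DefaultHandler {
    private final ContentHandler first;
    private final ContentHandler second;

    TeeHandler(ContentHandler first, ContentHandler second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        first.startElement(uri, local, qName, atts);
        second.startElement(uri, local, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        first.characters(ch, start, length);
        second.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        first.endElement(uri, local, qName);
        second.endElement(uri, local, qName);
    }
}

class OutlineHandlerDemo {
    /** Parses an XHTML string and returns the collected headings. */
    static List<String> headingsOf(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so local names are populated
            XMLReader reader = factory.newSAXParser().getXMLReader();
            HeadingCollector collector = new HeadingCollector();
            // DefaultHandler plays the role of the pre-existing handler (e.g. a recorder).
            reader.setContentHandler(new TeeHandler(new DefaultHandler(), collector));
            reader.parse(new InputSource(new StringReader(xhtml)));
            return collector.headings;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the collector only sees SAX events, the same class would work behind any parser that emits them, which is the point of keeping it outside the parsers.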
Comment 8 Michael[tm] Smith 2012-12-08 07:01:43 CET
(In reply to comment #7)
> See the way baseUriTracker is set up with CombineContentHandler for the pattern
> that doesn’t require parser changes and works with both parsers.

So I've looked now at that code but it seems like more than we need for the outline case.

It has since occurred to me that the outline could be stored as a property of the request rather than the reader. So that's what I've switched it to: request.setAttribute("http://validator.nu/properties/document-outline", (Deque<Section>) currentOutlinee.outline) to store it, and then outline = (Deque<Section>) request.getAttribute("http://validator.nu/properties/document-outline") to get it back.

So no changes to the parsers needed.
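
A self-contained sketch of that store-and-retrieve step follows. To keep it runnable without the servlet API, FakeRequest is a hypothetical stand-in that mimics only the setAttribute and getAttribute methods of a servlet request, and Section is a hypothetical placeholder for the real outline section class.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical placeholder for the outline's section type. */
class Section {
    final String heading;

    Section(String heading) {
        this.heading = heading;
    }
}

/** Hypothetical stand-in for a servlet request's attribute store. */
class FakeRequest {
    private final Map<String, Object> attributes = new HashMap<>();

    void setAttribute(String name, Object value) {
        attributes.put(name, value);
    }

    Object getAttribute(String name) {
        return attributes.get(name);
    }
}

class RequestOutlineDemo {
    // Attribute name taken from the comment above.
    static final String OUTLINE_ATTRIBUTE =
            "http://validator.nu/properties/document-outline";

    /** Called where the outline is built. */
    static void storeOutline(FakeRequest request, Deque<Section> outline) {
        request.setAttribute(OUTLINE_ATTRIBUTE, outline);
    }

    /** Called where the outline is rendered; the cast is unchecked by nature. */
    @SuppressWarnings("unchecked")
    static Deque<Section> loadOutline(FakeRequest request) {
        return (Deque<Section>) request.getAttribute(OUTLINE_ATTRIBUTE);
    }

    /** Builds a one-section outline, stores it, and reads back the first heading. */
    static String demo() {
        FakeRequest request = new FakeRequest();
        Deque<Section> outline = new ArrayDeque<>();
        outline.add(new Section("Introduction"));
        storeOutline(request, outline);
        return loadOutline(request).peekFirst().heading;
    }
}
```

The request object is already shared by the building step and the rendering step, so nothing in either parser has to know about the outline.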