NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 1013 - JSON output contains all data twice, doesn't match documentation
Status: RESOLVED NOTREPRODUCIBLE
Product: Validator.nu
Classification: Unclassified
Component: General
Version: HEAD
Hardware: All All
Importance: P2 normal
Assigned To: Nobody
Depends on:
Blocks:
Reported: 2014-12-27 20:36 CET by Steve Steiner
Modified: 2014-12-28 15:20 CET (History)

Attachments
Screen shot of JSON schema (102.38 KB, image/png)
2014-12-27 20:36 CET, Steve Steiner
Description Steve Steiner 2014-12-27 20:36:16 CET
Created attachment 232
Screen shot of JSON schema

The JSON returned by the validator does not match the documentation. That's not too surprising since, other than some spam being squashed a little while ago, the documentation hasn't been updated in years.

The bigger issue is that there is a "content" field which contains a fully escaped copy of all of the data later included in the 'data' element.

This more than doubles the size of the reply and is completely unnecessary.

Also, the "source" element contains only 'encoding' and 'type', not the full source of the document as documented.

Just dropping the huge extra payload would be a huge bonus...

Steve
Comment 1 Michael[tm] Smith 2014-12-28 12:30:09 CET
Thanks for the report.

In the attachment at https://bugzilla.validator.nu/attachment.cgi?id=232 that you provided, I notice x-w3c-validator-* headers.

Those headers are never generated by the validator.nu backend, so it seems to me you're maybe not using http://validator.w3.org/nu/ or https://validator.nu/ or https://html5.validator.nu/ directly, but are instead using the legacy W3C validator service at http://validator.w3.org/

If so, the problems you describe are not caused by the validator.nu code but instead must be caused by the legacy W3C validator service.

So please instead just try using either https://validator.nu/ or https://html5.validator.nu/ or http://validator.w3.org/nu/ directly.

When I look at https://validator.nu/?showsource=yes&doc=http://newsite.websaucesoftware.com/&out=json and http://validator.w3.org/nu/?showsource=yes&doc=http://newsite.websaucesoftware.com/&out=json, I get the JSON output I'd expect from both services.

In particular, the JSON output does, as expected, have a "source" field that contains a "code" field with the HTML source of the document.

And, as expected, the JSON output does not have a "data" field at all, nor a "content" field.
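
For anyone who wants to repeat that check from a script, here is a minimal Python sketch. It only builds the same kind of query URL as above (with `showsource=yes` and `out=json`) and tests a parsed response for the shape just described. The function names and the two sample dictionaries are hypothetical and written by hand for illustration; they are not real validator output.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_check_url(doc_url, base="https://validator.nu/"):
    """Build a checker URL that asks for JSON output including the source."""
    # urlencode percent-encodes the doc URL; the service receives the same value.
    query = urlencode({"showsource": "yes", "doc": doc_url, "out": "json"})
    return f"{base}?{query}"


def looks_like_direct_output(response):
    """Return True if a parsed JSON response has a "source" object carrying
    "code", and no top-level "data" or "content" field."""
    source = response.get("source")
    has_code = isinstance(source, dict) and "code" in source
    has_extras = "data" in response or "content" in response
    return has_code and not has_extras


# Hypothetical, hand-written samples (assumptions, not captured responses).
direct_sample = {
    "messages": [],
    "source": {"type": "text/html", "encoding": "utf-8", "code": "<!doctype html>..."},
}
legacy_sample = {
    "content": "...",
    "data": {},
    "source": {"type": "text/html", "encoding": "utf-8"},
}

if __name__ == "__main__":
    print(build_check_url("http://newsite.websaucesoftware.com/"))
    print(looks_like_direct_output(direct_sample))
    print(looks_like_direct_output(legacy_sample))
```

A response shaped like the one in the attachment (with "content" and "data", and a "source" lacking "code") would fail this check, which is a quick hint that it did not come straight from the validator.nu backend.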
Comment 2 Steve Steiner 2014-12-28 15:11:21 CET
Yes, this was the return value from the w3c validator.  Since the document's tagged as <html> I assumed this would just be a passthrough to validator.nu.

I will be calling validator.nu directly in production, so I guess this isn't a validator.nu bug report, per se.

Can't imagine why the old validator would mess with the validator.nu output, though...

Steve
Comment 3 Michael[tm] Smith 2014-12-28 15:20:22 CET
(In reply to Steve Steiner from comment #2)
> Yes, this was the return value from the w3c validator.  Since the document's
> tagged as <html> I assumed this would just be a passthrough to validator.nu.

Yeah, unfortunately it monkeys with the output somehow. But fwiw, I'm working on a way to make the legacy W3C validator basically just redirect to http://validator.w3.org/nu/ any time you feed it a doc with a modern <!doctype html> doctype. I hope to have that deployed some time within the next few weeks. That would avoid problems like the one you ran into.

> I will be calling validator.nu directly in production, so I guess this isn't
> a validator.nu bug report, per se.
> 
> Can't imagine why the old validator would mess with the validator.nu output,
> though...

Me neither. I've never really worked on the legacy validator code and don't really know what it might be doing, except that basically it makes a POST to the http://validator.w3.org/nu/ backend through the REST API that backend provides, then gets back a response and does some post-processing on that.

But once I deploy what I described above, it wouldn't do the request/response thing through the REST API but would instead essentially redirect the whole request to http://validator.w3.org/nu/ directly, and so the response would then come back from http://validator.w3.org/nu/ directly, with no post-processing.
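
As a rough illustration of the kind of POST mentioned in this thread, here is a minimal Python sketch of sending a document body to the backend and asking for JSON. This is an assumption about what such a request could look like, not the legacy validator's actual code, which is not shown anywhere in this report. The function names, the `out=json` query string on the endpoint, and the `Content-Type` value are choices made for the example.

```python
import json
import urllib.request


def build_post_request(html, endpoint="http://validator.w3.org/nu/?out=json"):
    """Build (but do not send) a POST request carrying an HTML document."""
    body = html.encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        # Assumed content type for an HTML document encoded as UTF-8.
        headers={"Content-Type": "text/html; charset=utf-8"},
        method="POST",
    )


def check_document(html):
    """Send the request and return the parsed JSON response (needs network)."""
    with urllib.request.urlopen(build_post_request(html)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_post_request("<!doctype html><title>t</title>")
    print(req.get_method(), req.full_url)
```

Calling the backend this way returns its response untouched, which is the behavior the planned redirect is meant to give users of the legacy service as well.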