NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 876 - Incorrect validation of prefix subtags for extended language subtags and variant subtags
Status: REOPENED
Product: Validator.nu
Classification: Unclassified
Component: General
Version: HEAD
Hardware: All, OS: All
Importance: P2 normal
Assigned To: Nobody
URL: http://tools.ietf.org/html/rfc5646#se...
Depends on:
Blocks: 878
Reported: 2011-11-07 23:36 CET by Leif Halvard Silli
Modified: 2011-11-08 19:30 CET
CC List: 3 users

Description Leif Halvard Silli 2011-11-07 23:36:46 CET
ISSUE: BCP47 says that the preferred prefix subtags are only RECOMMENDED/SHOULDs; they are not MUSTs. Therefore it is an error for Validator.nu to report "Error: Bad value no-hognorsk for attribute lang on element p: Variant hognorsk lacks required prefix.".

EXAMPLE: For <p lang="no-hognorsk">, Validator.nu reports: "Variant hognorsk lacks required prefix."

The (recommended) prefix of "hognorsk" is "nn", and it seems correct to inform the author that he/she uses a prefix that has no "blessing". However, it is incorrect to state that the prefix is required.

You might also check Richard Ishida's language tag checking tool - it gives a message that is in line with what I think Validator.nu should do as well:

http://rishida.net/utils/subtags/index.php?check=no-hognorsk&submit=Check


This is BCP47's relevant section on the prefix field: http://tools.ietf.org/html/rfc5646#section-3.1.8

'''
3.1.8.  Prefix Field

   The field 'Prefix' contains a valid language tag that is RECOMMENDED
   as one possible prefix to this record's subtag, perhaps with other
   subtags.  That is, when including an extended language or a variant
   subtag that has at least one 'Prefix' in a language tag, the
   resulting tag SHOULD match at least one of the subtag's 'Prefix'
   fields using the "Extended Filtering" algorithm (see [RFC4647]), and
   each of the subtags in that 'Prefix' SHOULD appear before the subtag
   itself.
'''
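The prefix check quoted above can be sketched in Python. This is a minimal illustration under stated assumptions, not Validator.nu's code: `extended_filter_match` follows the Extended Filtering steps of RFC 4647 section 3.3.2, and `variant_has_recommended_prefix` (a hypothetical name) covers the second SHOULD by matching each Prefix only against the subtags that come before the variant. The one-entry `PREFIXES` table is a hand-written excerpt assumed from the "hognorsk" record (Prefix "nn"); a real checker would load the full IANA registry.

```python
# Hand-written registry excerpt (assumption): variant subtag -> its 'Prefix' values.
PREFIXES = {"hognorsk": ["nn"]}


def extended_filter_match(language_range: str, tag: str) -> bool:
    """RFC 4647 Extended Filtering for one extended language range and one tag."""
    r = language_range.lower().split("-")
    t = tag.lower().split("-")
    # The first subtags must match unless the range starts with the wildcard.
    if r[0] != "*" and r[0] != t[0]:
        return False
    ri, ti = 1, 1
    while ri < len(r):
        if r[ri] == "*":
            ri += 1          # wildcard: skip this range subtag
        elif ti >= len(t):
            return False     # tag ran out before the range did
        elif r[ri] == t[ti]:
            ri += 1          # subtags match: advance in both lists
            ti += 1
        elif len(t[ti]) == 1:
            return False     # a singleton in the tag cannot be skipped
        else:
            ti += 1          # skip an unmatched tag subtag
    return True


def variant_has_recommended_prefix(tag: str, variant: str) -> bool:
    """True if some registered Prefix matches the subtags before `variant`.

    A variant with no Prefix field is treated as trivially satisfied.
    """
    subtags = tag.lower().split("-")
    variant = variant.lower()
    prefixes = PREFIXES.get(variant, [])
    if not prefixes:
        return True
    if variant not in subtags:
        raise ValueError(f"{variant!r} is not a subtag of {tag!r}")
    # Only subtags before the variant count ("SHOULD appear before the subtag").
    head = "-".join(subtags[: subtags.index(variant)])
    return any(extended_filter_match(p, head) for p in prefixes)


print(variant_has_recommended_prefix("nn-hognorsk", "hognorsk"))  # True
print(variant_has_recommended_prefix("no-hognorsk", "hognorsk"))  # False
```

The sketch only answers whether the recommendation is met. How a validator words or colors a failed check is a separate decision, and that decision is what this bug is about.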
Comment 1 Henri Sivonen 2011-11-08 09:43:24 CET
When RFCs are referenced by the HTML spec, it is often unclear whether it makes sense for conformance to mean MUST compliance only or MUST and SHOULD compliance.

The definition of SHOULD implies that good reasons to violate SHOULDs are rare. That's why it makes sense to err on the side of reporting SHOULD violations. If spec writers intend their SHOULDs to be violated but sprinkle SHOULDs around anyway as a theoretical purity measure, I think the complaints should be addressed to the spec writers.
Comment 2 Leif Halvard Silli 2011-11-08 15:42:35 CET
(In reply to comment #1)
> When RFCs are referenced by the HTML spec, it is often unclear whether it makes
> sense for conformance to mean MUST compliance only or MUST and SHOULD
> compliance.

If @lang MUST comply with BCP47, then lang="no-hognorsk" breaks something that is RECOMMENDED, not something that is REQUIRED.

> The definition of SHOULD implies that good reasons to violate SHOULDs are rare.
> That's why it makes sense to err on the side of reporting SHOULD violations. If
> spec writers intend their SHOULDs to be violated but sprinkle SHOULDs around
> anyway as a theoretical purity measure, I think the complaints should be
> addressed to the spec writers.

It is those who object to BCP47 who should take that up with the BCP47 editors etc. Meanwhile, I don't quite understand what you mean by "theoretical purity". Purity against what? For example, there are many cases where "no-hognorsk" would probably work just fine.

FYI and FWIW: I took part in the discussion of the registration of "hognorsk". One of the reasons why I accepted "nn" as the sole prefix was because I was assured that "no-hognorsk" is permitted even if it has no recommendation.

If you mark it as an error, then at least use correct language for it. That is, if the error messages said

e.g.: "Error: Variant foo lacks the recommended prefix".
or: "Error: Variant foo lacks the/a preferred prefix". 
or: "Error: Variant foo lacks an intended prefix".

then I would find it acceptable. Richard Ishida's language tag checking tool already uses that approach: his tool uses the same symbol (a red circle with an x inside) regardless of whether the tag breaks a RECOMMENDED or a REQUIRED. However, it uses the correct language in each case: should or must. That the different language tag checking tools converge, ought to be a goal in itself.
Comment 3 Leif Halvard Silli 2011-11-08 15:47:08 CET
Actually, I would prefer that it be marked with the yellow color, since it only breaks a recommendation and thus is not an error.

However, I am willing to accept the same solution as Richard Ishida has: red color but correct language.
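The two options in Comments 2 and 3 differ only in display severity; both word the message after the requirement level that is actually broken. A minimal sketch of that separation follows. `Level`, `Diagnostic` and `missing_prefix` are hypothetical names, not Validator.nu's API, and the message wording is taken from the suggestions above.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """BCP 47 requirement level that the tag fails to satisfy."""
    MUST = "required"
    SHOULD = "recommended"


@dataclass
class Diagnostic:
    severity: str  # how the checker displays it: "error" (red) or "warning" (yellow)
    message: str


def missing_prefix(variant: str, prefixes: list[str], level: Level,
                   severity_for_should: str = "error") -> Diagnostic:
    """Word the message after the level actually violated.

    severity_for_should="error" is the "red color but correct language" option;
    "warning" is the yellow option preferred in Comment 3.
    """
    severity = "error" if level is Level.MUST else severity_for_should
    wanted = " or ".join(f'"{p}"' for p in prefixes)
    return Diagnostic(severity,
                      f"Variant {variant} lacks the {level.value} prefix {wanted}.")


d = missing_prefix("hognorsk", ["nn"], Level.SHOULD)
print(d.severity, "|", d.message)
```

Keeping severity as a separate parameter lets a validator choose red or yellow without ever claiming that a SHOULD is a MUST.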
Comment 4 Michael[tm] Smith 2011-11-08 15:50:02 CET
(In reply to comment #2)
> That the
> different language tag checking tools converge, ought to be a goal in itself.

The tools are for different purposes. Richard's tool isn't intended for the specific purpose of checking the value of lang attributes in HTML documents.
Comment 5 Leif Halvard Silli 2011-11-08 15:53:36 CET
(In reply to comment #4)
> (In reply to comment #2)
> > That the
> > different language tag checking tools converge, ought to be a goal in itself.
> 
> The tools are for different purposes. Richard's tool isn't intended for the
> specific purpose of checking the value of lang attributes in HTML documents.

Please explain how that is relevant. 

BCP47 is fully aware of XML and HTML. It mentions both in several places.
Comment 6 Leif Halvard Silli 2011-11-08 18:07:06 CET
(In reply to comment #4)

> The tools are for different purposes. Richard's tool isn't intended for the
> specific purpose of checking the value of lang attributes in HTML documents.

Regardless of intentions, for the tag "no-hognorsk", Richard's tool behaves just like Validator.nu does with respect to the display of red color, but with the important nuance that Richard's tool uses correct language: should, not must.

Validator.nu, however, is not consistent in how it treats errors that break a BCP47 RECOMMENDED/SHOULD.

For example, let us take the tag "zh-cmn". There are two ways of reading that tag:

 1) As language subtag "zh" + extlang subtag "cmn", which is equivalent to 'cmn'
 2) As the redundant *tag* "zh-cmn", in which case the RECOMMENDED value is simply "cmn"
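Reading 1) can be sketched as follows. This is an illustration under stated assumptions: the `EXTLANGS` table is a hand-written registry excerpt assumed from the extlang record for "cmn" (Prefix "zh", Preferred-Value "cmn"), and `drop_extlang_prefix` is a hypothetical name. As we read the canonicalization rules in RFC 5646 section 4.5, an extlang's Preferred-Value replaces both the extlang subtag and the primary language subtag it is prefixed with.

```python
# Hand-written registry excerpt (assumption): extlang -> (Prefix, Preferred-Value).
EXTLANGS = {"cmn": ("zh", "cmn")}


def drop_extlang_prefix(tag: str) -> str:
    """Rewrite 'prefix-extlang-...' as 'preferred-...' when the prefix matches."""
    subtags = tag.split("-")
    if len(subtags) >= 2:
        entry = EXTLANGS.get(subtags[1].lower())
        if entry and subtags[0].lower() == entry[0]:
            # The Preferred-Value replaces both the extlang subtag and
            # the primary language subtag that served as its prefix.
            return "-".join([entry[1]] + subtags[2:])
    return tag


print(drop_extlang_prefix("zh-cmn"))       # cmn
print(drop_extlang_prefix("zh-cmn-Hans"))  # cmn-Hans
print(drop_extlang_prefix("cmn"))          # cmn
```

Under this reading, "zh-cmn" is a valid tag, and the only useful note is that the "zh" prefix can be dropped.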

For some reason, Validator.nu has chosen to read it as 2). This is probably incorrect. It should rather read it as 1) and give the same message that Richard's tool gives. 

See: http://rishida.net/utils/subtags/index.php?check=zh-cmn&submit=Check

That is: no error message at all (possibly with some information that the 'zh' prefix can be dropped). Validator.nu, however, has chosen to interpret it as 2), and from that point of view it correctly reports that the tag "zh-cmn" is deprecated. But BCP47 says that the entry's preferred-value field (in this case 'cmn') is then RECOMMENDED. So why does Validator.nu not display a full error?

I think that this inconsistency, and the seemingly erroneous validation of "zh-cmn", indicate that it would serve the HTML5 validator well to be very accurate in its choice of error message language: do not step outside what BCP47 says.
Comment 7 Leif Halvard Silli 2011-11-08 19:30:43 CET
(In reply to comment #6)

> Validator.nu, however, is not consistent in how it treats errors that break a
> BCP47 RECOMMENDED/SHOULD.
> 
> For example, let us take the tag "zh-cmn". T

A better/simpler example is 'no-nyn'; see bug 878.