Bugzilla – Bug 876
Incorrect validation of prefix subtags for extended language subtags and variant subtags
Last modified: 2011-11-08 19:30:43 CET
ISSUE: BCP47 says that the preferred prefix subtags are only RECOMMENDED/SHOULDs; they are not MUSTs. Therefore it is an error for validator.nu to report "Error: Bad value no-hognorsk for attribute lang on element p: Variant hognorsk lacks required prefix."

EXAMPLE: For <p lang="no-hognorsk">, validator.nu reports: "Variant hognorsk lacks required prefix." The (recommended) prefix of "hognorsk" is "nn", and it seems correct to inform the author that he/she uses a prefix that has no "blessing". However, it is incorrect to state that the prefix is required.

You might also check Richard Ishida's language tag checking tool - it gives a message that is in line with what I think Validator.nu should do as well: http://rishida.net/utils/subtags/index.php?check=no-hognorsk&submit=Check

This is BCP47's relevant section on the prefix field: http://tools.ietf.org/html/rfc5646#section-3.1.8

'''
3.1.8. Prefix Field

The field 'Prefix' contains a valid language tag that is RECOMMENDED as one possible prefix to this record's subtag, perhaps with other subtags. That is, when including an extended language or a variant subtag that has at least one 'Prefix' in a language tag, the resulting tag SHOULD match at least one of the subtag's 'Prefix' fields using the "Extended Filtering" algorithm (see [RFC4647]), and each of the subtags in that 'Prefix' SHOULD appear before the subtag itself.
'''
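To make the quoted rule concrete, here is a minimal Python sketch of a prefix check that reports a SHOULD violation with "recommended" wording rather than "required". It is only an illustration, not how Validator.nu or Richard Ishida's tool is implemented. The names `VARIANT_PREFIXES`, `extended_filter_match` and `check_variant_prefix` are hypothetical, the registry excerpt is hard-coded, and the check is simplified to matching each Prefix against the subtags that precede the variant.

```python
# Hypothetical excerpt of the subtag registry: variant subtag -> its 'Prefix' fields.
VARIANT_PREFIXES = {"hognorsk": ["nn"]}


def extended_filter_match(language_range: str, tag: str) -> bool:
    """Extended Filtering as described in RFC 4647, section 3.3.2."""
    range_subtags = language_range.lower().split("-")
    tag_subtags = tag.lower().split("-")
    # The first subtags must match unless the range starts with a wildcard.
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False
    r, t = 1, 1
    while r < len(range_subtags):
        if range_subtags[r] == "*":
            r += 1                      # a wildcard matches zero or more subtags
        elif t >= len(tag_subtags):
            return False                # ran out of tag subtags
        elif range_subtags[r] == tag_subtags[t]:
            r += 1
            t += 1
        elif len(tag_subtags[t]) == 1:
            return False                # never skip over a singleton such as 'x'
        else:
            t += 1                      # skip an unmatched tag subtag
    return True


def check_variant_prefix(tag: str) -> list:
    """Return (severity, message) pairs for variants used without a registered Prefix."""
    subtags = tag.lower().split("-")
    messages = []
    for i in range(1, len(subtags)):
        prefixes = VARIANT_PREFIXES.get(subtags[i])
        if not prefixes:
            continue
        # Simplification: the Prefix subtags SHOULD appear before the variant,
        # so only the part of the tag preceding the variant is matched.
        preceding = "-".join(subtags[:i])
        if not any(extended_filter_match(p, preceding) for p in prefixes):
            messages.append((
                "warning",  # a SHOULD violation, so no "required" wording
                f"Variant {subtags[i]} lacks a recommended prefix ({', '.join(prefixes)}).",
            ))
    return messages
```

With this sketch, `check_variant_prefix("nn-hognorsk")` returns an empty list, while `check_variant_prefix("no-hognorsk")` returns a single message saying that the prefix is recommended. Whether that message is displayed as a warning or as an error is a separate policy decision.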
When RFCs are referenced by the HTML spec, it is often unclear whether it makes sense for conformance to mean MUST compliance only or MUST and SHOULD compliance. The definition of SHOULD implies that good reasons to violate SHOULDs are rare. That's why it makes sense to err on the side of reporting SHOULD violations. If spec writers intend their SHOULDs to be violated but sprinkle SHOULDs around anyway as a theoretical purity measure, I think the complaints should be addressed to the spec writers.
(In reply to comment #1)
> When RFCs are referenced by the HTML spec, it is often unclear whether it makes
> sense for conformance to mean MUST compliance only or MUST and SHOULD
> compliance.

If @lang MUST comply with BCP47, then lang="no-hognorsk" breaks a RECOMMENDATION - it does not break a REQUIRED.

> The definition of SHOULD implies that good reasons to violate SHOULDs are rare.
> That's why it makes sense to err on the side of reporting SHOULD violations. If
> spec writers intend their SHOULDs to be violated but sprinkle SHOULDs around
> anyway as a theoretical purity measure, I think the complaints should be
> addressed to the spec writers.

It is those who object to BCP47 who should take that up with the BCP47 editors etc. Meanwhile, I don't quite understand what you mean by "theoretical purity". Against what? E.g. there are many cases where "no-hognorsk" would probably work just fine.

FYI and FWIW: I took part in the discussion of the registration of "hognorsk". One of the reasons why I accepted "nn" as the sole prefix was that I was assured that "no-hognorsk" is permitted even if it has no recommendation.

If you mark it as an error, then at least use the correct language about it. That is, if the error message said e.g.:

"Error: Variant foo lacks the recommended prefix".
or: "Error: Variant foo lacks the/a preferred prefix".
or: "Error: Variant foo lacks an intended prefix".

then I would find it acceptable. Richard Ishida's language tag checking tool already uses that approach: his tool uses the same symbol - a red circle with an x inside - regardless of whether the tag breaks a RECOMMENDED or a REQUIRED. However, it uses the correct language in each case - should or must. That the different language tag checking tools converge ought to be a goal in itself.
Actually, I would prefer that it be marked with the yellow color, as it only breaks a recommendation and thus is not an error. However, I am willing to accept the same solution as Richard Ishida has: red color but correct language.
(In reply to comment #2)
> That the
> different language tag checking tools converge, ought to be a goal in itself.

The tools are for different purposes. Richard's tool isn't intended for the specific purpose of checking the value of lang attributes in HTML documents.
(In reply to comment #4)
> (In reply to comment #2)
> > That the
> > different language tag checking tools converge, ought to be a goal in itself.
>
> The tools are for different purposes. Richard's tool isn't intended for the
> specific purpose of checking the value of lang attributes in HTML documents.

Please explain how that is relevant. BCP47 is fully aware of XML and HTML. It mentions both in several places.
(In reply to comment #4)
> The tools are for different purposes. Richard's tool isn't intended for the
> specific purpose of checking the value of lang attributes in HTML documents.

Regardless of intentions, for the tag "no-hognorsk", Richard's tool behaves just like Validator.nu does w.r.t. the display of red color, with the important nuance that Richard's tool uses correct language - should and not must.

Validator.nu, however, is not consistent in how it treats errors that break a BCP47 RECOMMENDED/SHOULD.

For example, let us take the tag "zh-cmn". There are two ways of reading that tag:
1) As language subtag "zh" + extlang subtag "cmn" - which is equivalent to "cmn"
2) As the redundant *tag* "zh-cmn", in which case the RECOMMENDED value is simply "cmn"

For some reason, Validator.nu has chosen to read it as 2). This is probably incorrect. It should rather read it as 1) and give the same message that Richard's tool gives. See: http://rishida.net/utils/subtags/index.php?check=zh-cmn&submit=Check

That is: no error message at all (possibly with some information that it is possible to drop the "zh" prefix).

However, Validator.nu has chosen to interpret it as 2) and, from that point of view, correctly informs the author that the tag "zh-cmn" is deprecated. However, BCP47 says that the entry's Preferred-Value field (in this case "cmn") is then RECOMMENDED. So how come Validator.nu does not display a full error?

I think that this inconsistency and the seemingly erroneous validation of "zh-cmn" indicate that it would serve the HTML5 validator well to be very accurate with the choice of error message language: Do not step outside what BCP47 says.
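The two readings of "zh-cmn" can be sketched as follows. This is only an illustration under stated assumptions: `REDUNDANT`, `EXTLANG`, `read_as_redundant` and `read_as_extlang` are hypothetical names, the two hard-coded records are a simplified excerpt of the registry, and the message texts are invented for the example rather than taken from any tool.

```python
# Hypothetical, simplified registry excerpt covering only the example tag.
REDUNDANT = {"zh-cmn": {"preferred": "cmn", "deprecated": True}}
EXTLANG = {"cmn": {"prefix": "zh", "preferred": "cmn"}}


def read_as_redundant(tag: str):
    """Reading 2): look the whole tag up as a registered redundant tag."""
    record = REDUNDANT.get(tag.lower())
    if record is None:
        return None
    return (f'The tag "{tag}" is deprecated; '
            f'"{record["preferred"]}" is RECOMMENDED instead.')


def read_as_extlang(tag: str):
    """Reading 1): primary language subtag followed by an extlang subtag."""
    subtags = tag.lower().split("-")
    if len(subtags) < 2:
        return None
    record = EXTLANG.get(subtags[1])
    if record is None or record["prefix"] != subtags[0]:
        return None
    # Dropping the prefix yields the equivalent, shorter form.
    shorter = "-".join([record["preferred"]] + subtags[2:])
    return (f'"{tag}" is valid; the "{subtags[0]}" prefix may be dropped, '
            f'giving "{shorter}".')
```

Both readings point the author to "cmn", but only reading 2) calls the tag deprecated, which is why the choice of reading affects the wording of the message.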
(In reply to comment #6)
> Validator.nu however is not consistent w.r.t. to treating errors that breaks a
> BCP47 RECOMMENDED/SHOULD.
>
> For example, let us take the tag "zh-cmn". T

A better/simpler example is 'no-nyn', see bug 878