Bugzilla – Bug 877
Treatment of language tags/subtags of type Redundant
Last modified: 2011-11-08 20:15:44 CET
BCP47 section 2.2.8:

'''
A redundant tag is a grandfathered registration whose individual subtags appear with the same semantic meaning in the registry. For example, the tag "zh-Hant" (Traditional Chinese) can now be composed from the subtags 'zh' (Chinese) and 'Hant' (Han script traditional variant). These redundant tags are maintained in the registry as records of type 'redundant', mostly as a matter of historical curiosity.
'''

Example: validation of the tag "zh-cmn":
http://validator.nu/?doc=data%3Atext%2Fhtml%3Bbase64%2CPCFET0NUWVBFIGh0bWw%252BPGh0bWwgbGFuZz16aC1jbW4gPjxoZWFkPjx0aXRsZT5DaGluZXNlLCBNYW5kYXJpbjwvdGl0bGU%252BPC9oZWFkPjxib2R5PjwvYm9keT48L2h0bWw%252BDQo%253D&showsource=yes

Expected result: No error message at all. See Richard's tool:
http://rishida.net/utils/subtags/index.php?check=zh-cmn&submit=Check

Actual result: Validator.nu gives the message "The language tag zh-cmn is deprecated. Use cmn instead."

Comment: It does not make sense to assume that "zh-cmn" is meant to be read as a single unit. It has the form of two subtags, 'zh' and 'cmn'. As BCP47 says of redundant tags, entries such as 'zh-cmn' remain in the registry mostly as a matter of historical curiosity.

For the record, there are 3 entries in the subtag registry where 'cmn' appears:

1) Type: language
   Subtag: cmn
   [snip]

2) Type: extlang
   Subtag: cmn
   Description: Mandarin Chinese
   Added: 2009-07-29
   Preferred-Value: cmn
   Prefix: zh
   Macrolanguage: zh

3) Type: redundant
   Tag: zh-cmn
   [snip]

The validator should therefore read "zh-cmn" according to the second entry.
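The reading proposed above, 'zh' as a language subtag followed by 'cmn' as an extlang subtag whose registered Prefix is 'zh', can be sketched roughly as follows. This is a minimal illustration only, assuming a hand-copied toy excerpt of the registry restricted to the records quoted in this report; the names decompose, LANGUAGE and EXTLANG are hypothetical and are not taken from Validator.nu or from Richard's tool.

```python
# Hypothetical toy excerpt of the IANA Language Subtag Registry, limited to
# the records quoted in this bug report. A real validator would load the
# full registry file instead.
LANGUAGE = {"zh", "cmn"}
EXTLANG = {"cmn": {"Prefix": "zh", "Preferred-Value": "cmn"}}


def decompose(tag):
    """Read a two-part tag as language + extlang subtags, if possible.

    Returns (parts, canonical_form), or None if the tag does not
    decompose that way under the toy registry above.
    """
    subtags = tag.lower().split("-")  # tags are case-insensitive
    if len(subtags) != 2:
        return None
    lang, ext = subtags
    record = EXTLANG.get(ext)
    # The extlang must exist and must follow its registered Prefix.
    if lang not in LANGUAGE or record is None or record["Prefix"] != lang:
        return None
    parts = [("language", lang), ("extlang", ext)]
    # BCP47 canonicalization replaces a language+extlang pair with the
    # extlang record's Preferred-Value.
    return parts, record["Preferred-Value"]
```

With this sketch, decompose("zh-cmn") yields the two subtags plus the canonical form "cmn", taken from entry 2's Preferred-Value, while a bare "cmn" does not decompose at all.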
(In reply to comment #0)
> Expected result: No error message at all. See Richard's tool:
> http://rishida.net/utils/subtags/index.php?check=zh-cmn&submit=Check

Rather than "No error message at all", what I see at that URL is a message saying, "It is usually better to just use the cmn language subtag, rather than zh-cmn." That essentially means the same thing as the "The language tag zh-cmn is deprecated. Use cmn instead." message that validator.nu emits.
(In reply to comment #1)
> (In reply to comment #0)
> > Expected result: No error message at all. See Richard's tool:
> > http://rishida.net/utils/subtags/index.php?check=zh-cmn&submit=Check
>
> Rather than "No error message at all", what I see at that URL is a message
> saying, "It is usually better to just use the cmn language subtag, rather than
> zh-cmn." -- which essentially means the same thing as the "The language tag
> zh-cmn is deprecated. Use cmn instead." message that validator.nu emits.

Language tag validation is a complicated issue, as I hope you can see. I don't know what to say, except that you are wrong. I would have nothing against Validator.nu emitting exactly the same message that Richard's tool emits; if Validator.nu did that, the case would be closed. However, it needs to be clear that Richard's tool and Validator.nu do not validate the same thing; the results only look the same.

Remember the quote from BCP47 above, which says that redundant tags are actually a special kind of grandfathered tag. Grandfathered tags are not permitted to have further subtags. For instance, the grandfathered tag "no-nyn", even though it is equivalent to "nn", is not allowed to take e.g. a region subtag such as "NO". So, while you can write "nn-NO", you cannot write "no-nyn-NO". (Just try Richard's tool: http://rishida.net/utils/subtags/index.php?check=no-nyn-no&submit=Check)

In the same way, if you choose to see "zh-cmn" as a grandfathered, redundant tag, then you cannot add the region subtag "cn" to it, so "zh-cmn-cn" would be invalid. However, Validator.nu does in fact accept "zh-cmn-cn", despite treating "zh-cmn" as deprecated. How logical is that?
http://validator.nu/?doc=data%3Atext%2Fhtml%3Bcharset%3Dutf-8%3Bbase64%2CPCFET0NUWVBFIGh0bWw%252BPGh0bWwgbGFuZz0iemgtY21uLWNuIiA%252BPHRpdGxlPjwvdGl0bGU%252B&showsource=yes

(And Richard's tool agrees: http://rishida.net/utils/subtags/index.php?check=zh-cmn-cn&submit=Check)

In summary, Validator.nu issues a warning for "zh-cmn" but no warnings at all for "zh-cmn-cn". I hope that fact demonstrates how broken Validator.nu's language tag validation is.
(In reply to comment #2)
> Remember the quote from BCP47 above, which says that redundant tags are
> actually a special kind of grandfathered tags. Grandfathered tags are not
> permitted to have further subtags.

To back this up with a quote from BCP47:

'''
Prior to RFC 4646, whole language tags were registered according to the rules in RFC 1766 and/or RFC 3066. All of these registered tags remain valid as language tags.
'''

This goes back to the distinction between "subtag" and "tag". "no-nyn" and "zh-cmn" were once registered as _whole_ (that is: complete) language tags. Considered as complete language tags, they are not modular the way normal language tags are: you cannot add further subtags to them. For example, 'i-default' is a grandfathered, complete language tag that is not deprecated. If you try to add a region subtag to it, however, it does not validate: http://rishida.net/utils/subtags/index.php?check=i-default-de&submit=Check
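The whole-tag versus subtag distinction from comments #2 and #3 can be illustrated with a small validity check. This is a minimal sketch, assuming a toy excerpt of the registry containing only the tags and subtags mentioned in this bug; the names is_valid, GRANDFATHERED, LANGUAGE, EXTLANG and REGION are hypothetical. Grandfathered tags are accepted only as exact, complete matches, while other tags are parsed modularly as language, optional extlang, and optional region.

```python
# Hypothetical toy excerpt of the IANA Language Subtag Registry, limited to
# records mentioned in this bug. A real validator would load the full file.
GRANDFATHERED = {"no-nyn", "i-default"}  # whole tags registered before RFC 4646
LANGUAGE = {"zh", "nn", "no"}
EXTLANG = {"cmn": "zh"}                  # extlang subtag -> required Prefix
REGION = {"cn", "no", "de"}


def is_valid(tag):
    """Return True if `tag` is valid against the toy registry above."""
    t = tag.lower()  # tags are case-insensitive
    # A grandfathered tag is valid only as an exact, complete match;
    # it cannot be extended with further subtags.
    if t in GRANDFATHERED:
        return True
    subtags = t.split("-")
    if subtags[0] not in LANGUAGE:
        return False
    rest = subtags[1:]
    # Optional extlang, which must follow its registered Prefix.
    if rest and rest[0] in EXTLANG:
        if EXTLANG[rest[0]] != subtags[0]:
            return False
        rest = rest[1:]
    # Optional region; any other subtag is unsupported in this sketch.
    if rest and rest[0] in REGION:
        rest = rest[1:]
    return not rest
```

Under this model "no-nyn" and "i-default" pass while "no-nyn-NO" and "i-default-de" fail, and "zh-cmn-CN" passes only because "zh-cmn" is parsed as language plus extlang rather than as the redundant whole tag, which is the distinction this report asks Validator.nu to apply consistently.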