NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 877 - Treatment of language tags/subtags of type Redundant
Status: NEW
Product: Validator.nu
Classification: Unclassified
Component: General
Version: HEAD
Hardware: All All
Importance: P2 normal
Assigned To: Nobody
http://tools.ietf.org/html/rfc5646#se...
Reported: 2011-11-08 18:33 CET by Leif Halvard Silli
Modified: 2011-11-08 20:15 CET (History)
Description Leif Halvard Silli 2011-11-08 18:33:59 CET
BCP47 section 2.2.8:

'''
   A redundant tag is a grandfathered
   registration whose individual subtags appear with the same semantic
   meaning in the registry.  For example, the tag "zh-Hant" (Traditional
   Chinese) can now be composed from the subtags 'zh' (Chinese) and
   'Hant' (Han script traditional variant).  These redundant tags are
   maintained in the registry as records of type 'redundant', mostly as
   a matter of historical curiosity.
'''

Example - validation of the tag "zh-cmn":
http://validator.nu/?doc=data%3Atext%2Fhtml%3Bbase64%2CPCFET0NUWVBFIGh0bWw%252BPGh0bWwgbGFuZz16aC1jbW4gPjxoZWFkPjx0aXRsZT5DaGluZXNlLCBNYW5kYXJpbjwvdGl0bGU%252BPC9oZWFkPjxib2R5PjwvYm9keT48L2h0bWw%252BDQo%253D&showsource=yes

Expected result: No error message at all. See Richard's tool: 
http://rishida.net/utils/subtags/index.php?check=zh-cmn&submit=Check

Actual result: Validator.nu gives the message "The language tag zh-cmn is deprecated. Use cmn instead."

Comment: It does not make sense to assume that "zh-cmn" is meant to be read as a single subtag. It has the form of two subtags, 'zh' and 'cmn'. As BCP47 says, 'zh-cmn' remains in the subtag registry mostly as a matter of historical curiosity.

For the record, there are 3 entries in the subtag registry where 'cmn' appears:

1)
Type: language
Subtag: cmn
[snip]

2)
Type: extlang
Subtag: cmn
Description: Mandarin Chinese
Added: 2009-07-29
Preferred-Value: cmn
Prefix: zh
Macrolanguage: zh

3)
Type: redundant 
Tag: zh-cmn
[snip]

And the validator should assume that "zh-cmn" implies the second option.
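
A minimal sketch of the reading requested here, where "zh-cmn" is taken apart into a language subtag followed by an extlang subtag. The two tables are a hand-made excerpt of the records quoted above, not the full registry, and the function name `interpret` is hypothetical.

```python
# Hand-made excerpt of the registry records quoted above (assumption:
# only the fields needed for this illustration are kept).
LANGUAGES = {"zh", "cmn"}
EXTLANGS = {"cmn": {"prefix": "zh", "preferred_value": "cmn"}}


def interpret(tag):
    """Label the leading subtags of a tag as language and, if present, extlang."""
    subtags = tag.lower().split("-")
    labels = []
    if subtags[0] in LANGUAGES:
        labels.append((subtags[0], "language"))
        if len(subtags) > 1:
            record = EXTLANGS.get(subtags[1])
            # An extlang only counts when it follows its registered Prefix.
            if record is not None and record["prefix"] == subtags[0]:
                labels.append((subtags[1], "extlang"))
    return labels
```

Under this reading, `interpret("zh-cmn")` yields a language subtag plus an extlang subtag, so no whole-tag record of type redundant needs to be consulted.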
Comment 1 Michael[tm] Smith 2011-11-08 18:59:20 CET
(In reply to comment #0)
> Expected result: No error message at all. See Richard's tool: 
> http://rishida.net/utils/subtags/index.php?check=zh-cmn&submit=Check

Rather than "No error message at all", what I see at that URL is a message saying, "It is usually better to just use the cmn language subtag, rather than zh-cmn." -- which essentially means the same thing as the "The language tag zh-cmn is deprecated. Use cmn instead." message that validator.nu emits.
Comment 2 Leif Halvard Silli 2011-11-08 19:59:15 CET
(In reply to comment #1)
> (In reply to comment #0)
> > Expected result: No error message at all. See Richard's tool: 
> > http://rishida.net/utils/subtags/index.php?check=zh-cmn&submit=Check
> 
> Rather than "No error message at all", what I see at that URL is a message
> saying, "It is usually better to just use the cmn language subtag, rather than
> zh-cmn." -- which essentially means the same thing as the "The language tag
> zh-cmn is deprecated. Use cmn instead." message that validator.nu emits.

Language tag validation is a complicated issue - that is what you see, I hope.

I don't know what to say, except that you are wrong.

I would have nothing against Validator.nu emitting the exact same message that Richard's tool emits. If Validator.nu did that, the case would be closed.

However, it needs to be clear that Richard's tool and Validator.nu do not validate the same thing - it only looks the same.

Remember the quote from BCP47 above, which says that redundant tags are actually a special kind of grandfathered tag. Grandfathered tags are not permitted to have further subtags. So, for instance, the grandfathered tag "no-nyn", even though it is equivalent to "nn", is not allowed to have e.g. a region subtag such as "NO".

So, while you can do "nn-NO",
you cannot do "no-nyn-NO".

(Just try Richard's tool: http://rishida.net/utils/subtags/index.php?check=no-nyn-no&submit=Check)

In a similar way, if you choose to see "zh-cmn" as a grandfathered, redundant tag, then you can't add the region subtag "cn" - thus you would not be able to add "zh-cmn-cn". 

However, as a matter of fact, Validator.nu does accept "zh-cmn-cn", even though it treats "zh-cmn" as deprecated ... How logical is that?

http://validator.nu/?doc=data%3Atext%2Fhtml%3Bcharset%3Dutf-8%3Bbase64%2CPCFET0NUWVBFIGh0bWw%252BPGh0bWwgbGFuZz0iemgtY21uLWNuIiA%252BPHRpdGxlPjwvdGl0bGU%252B&showsource=yes

(And Richard's tool agrees: http://rishida.net/utils/subtags/index.php?check=zh-cmn-cn&submit=Check )

In summary, Validator.nu issues a warning for "zh-cmn" but issues zero warnings for "zh-cmn-cn" ... A fact that I hope demonstrates to you how broken Validator.nu's language tag validation is.
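
The asymmetry described above is what an exact whole-tag lookup would produce. The sketch below only illustrates that assumption. It is not taken from Validator.nu's source, and the table and the function name `whole_tag_warning` are hypothetical.

```python
# Deprecated whole-tag records and their preferred values, as mentioned
# in this report (hand-copied excerpt, keyed by lowercase tag).
DEPRECATED_WHOLE_TAGS = {"zh-cmn": "cmn", "no-nyn": "nn"}


def whole_tag_warning(tag):
    """Return a deprecation message only when the complete tag matches a record."""
    preferred = DEPRECATED_WHOLE_TAGS.get(tag.lower())
    if preferred is None:
        # Longer tags such as "zh-cmn-cn" never match an exact lookup.
        return None
    return f"The language tag {tag} is deprecated. Use {preferred} instead."
```

With an exact match, "zh-cmn" is flagged while "zh-cmn-cn" passes silently, which is the inconsistency reported in this comment.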
Comment 3 Leif Halvard Silli 2011-11-08 20:15:44 CET
(In reply to comment #2)

> Remember the quote from BCP47 above, which says that redundant tags are
> actually a special kind of grandfathered tag. Grandfathered tags are not
> permitted to have further subtags. 

Just to justify this with a quote from BCP47:

'''
   Prior to RFC 4646, whole language tags were registered according to
   the rules in RFC 1766 and/or RFC 3066.  All of these registered tags
   remain valid as language tags.
'''

This goes back to the distinction between "subtag" and "tag".

Thus, "no-nyn" and "zh-cmn" were once registered as _whole_ (that is: complete) language tags. Considered as complete language tags, they are not modular like normal language tags are: you thus cannot add additional subtags to them.

For example, 'i-default' is a grandfathered, but not deprecated, complete language tag. If you try to add a region subtag to it, however, then it doesn't validate:

http://rishida.net/utils/subtags/index.php?check=i-default-de&submit=Check
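
The non-modular reading argued for in this comment could be checked roughly as follows. This is a sketch under that reading only: the set holds just the grandfathered tags named in this thread, and the function name `extends_grandfathered` is hypothetical.

```python
# Grandfathered whole tags named in this thread (not the full registry list).
GRANDFATHERED = {"i-default", "no-nyn"}


def extends_grandfathered(tag):
    """Return the grandfathered tag that `tag` appends subtags to, or None."""
    lowered = tag.lower()
    for whole in GRANDFATHERED:
        # A whole tag followed by "-" means further subtags were appended.
        if lowered.startswith(whole + "-"):
            return whole
    return None
```

A validator following this reading would reject any tag for which the function returns a value, such as "i-default-de" or "no-nyn-NO", while leaving "i-default" and "nn-NO" alone.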