[wp-polyglots] Seeking Maintainers

Morgan Doocy morgan at doocy.net
Mon Feb 21 23:10:57 GMT 2005


On Feb 21, 2005, at 2:09 PM, Ryan Boren wrote:
> Something becoming more common is to append the script code to the
> language and country codes.  This is often done with Serbian to
> distinguish between Cyrillic and Latin scripts.  sr_CS is Cyrillic and
> sr_CS@Latn is Latin.

Good to know. The IANA language tags page lists sr-Cyrl and sr-Latn for
that example. [1] It seems to me, however, that sr_CS@Cyrl and
sr_CS@Latn are more granular, since they specify the country (and
therefore dialect?) as well.

[1] http://www.iana.org/assignments/language-tags
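
To make the comparison concrete, here is a minimal sketch of splitting a
locale name of the form sr_CS@Latn (language, country, then an
@-prefixed script modifier) into its parts. The function name
parse_locale and the exact pattern are my own assumptions for
illustration, not anything WordPress or gettext defines, and the pattern
deliberately accepts only this one shape.

```python
import re

# Hypothetical pattern: 2-3 letter language, optional 2-letter country,
# optional "@" + four-letter script code (e.g. Latn, Cyrl, Hans, Hant).
_LOCALE_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:[_-](?P<country>[A-Z]{2}))?"
    r"(?:@(?P<script>[A-Z][a-z]{3}))?$"
)


def parse_locale(name):
    """Split a locale name into language, country and script.

    Parts that are absent from the name come back as None.
    """
    match = _LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized locale name: {name!r}")
    return match.groupdict()


print(parse_locale("sr_CS@Latn"))
print(parse_locale("zh_TW"))
```

With this shape the country and the script are both recoverable from
sr_CS@Latn, whereas a tag like sr-Latn carries the script only.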

> I haven't seen this used with Chinese locales, however.  zh_CN, zh_TW,
> and zh_HK are still widely used.  Perhaps dialect implies script in
> these cases?  zh_CN is usually simplified Han (Hans) and zh_TW is
> usually traditional Han (Hant), yes?  Is zh_CN@Hant, for example, a
> real world situation or merely theoretical?

I did a test a while back with Firefox and Google, and found that zh 
and zh-CN came back Simplified; and zh-TW, zh-HK and zh-SG all came 
back Traditional. I don't know how well this coincides with either 
official designations or colloquial use, but it was a good test at the 
time. My guess would be that CN is the only country actively using 
Simplified right now, and that zh-TW, zh-HK and zh-SG can all be 
"mapped" to Traditional. It would be nice if Jeffrey could confirm this 
though.
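
If that mapping holds up, it could be written down as a simple lookup.
This is only a sketch: the table records what my Firefox/Google test
returned, not any official designation, and the names OBSERVED_SCRIPT
and default_script are made up for the example. An explicit script, if
one is given, always wins over the guess.

```python
# Script each tag "came back" as in the informal test described above.
# These are observations, not standardized defaults.
OBSERVED_SCRIPT = {
    "zh": "Hans",
    "zh-CN": "Hans",
    "zh-TW": "Hant",
    "zh-HK": "Hant",
    "zh-SG": "Hant",
}


def default_script(tag, explicit_script=None):
    """Return the script for a zh tag, preferring an explicit one.

    Accepts either zh-TW or zh_TW spelling; returns None for tags
    that are not in the observed table.
    """
    if explicit_script:
        return explicit_script
    return OBSERVED_SCRIPT.get(tag.replace("_", "-"))


print(default_script("zh_TW"))
print(default_script("zh_CN", explicit_script="Hant"))
```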

There's something else to consider though, I think: different dialects, 
like Cantonese, Mandarin and Jin, are all grouped together in zh, but 
their usage demographics don't coincide with country borders. This 
makes me think that, for better or for worse, the limit of granularity 
we can achieve with zh is country code (and, if necessary, script). 
Which is unfortunate, because that means there are variations in the 
specificity amongst language code–country code combinations.

Actually, technically each of what I've been referring to as "dialects"
is a full-blown Sinitic language, with somewhere between 5 and 500
dialects of its own. So maybe this is too big to hope to achieve accurate
granularity with, and we should be content with just using language & 
country codes. :-)

> For reference, four letter script codes:
>
> http://www.unicode.org/iso15924/iso15924-codes.html

Oooooh, you don't know how useful that page is to me. Thank you so 
much. (I knew what Hans and Hant meant, but wasn't sure where those 
tags came from or if they were standardized in any other place but the 
IANA language tags list. Now I know.)

Morgan


