[wp-polyglots] Seeking Maintainers
Morgan Doocy
morgan at doocy.net
Mon Feb 21 23:10:57 GMT 2005
On Feb 21, 2005, at 2:09 PM, Ryan Boren wrote:
> Something becoming more common is to append the script code to the
> language and country codes. This is often done with Serbian to
> distinguish between Cyrillic and Latin scripts. sr_CS is Cyrillic and
> sr_CS@Latn is Latin.
Good to know. The IANA language tags page lists sr-Cyrl and sr-Latn for
that example. [1] It seems to me, however, that using sr_CS@Cyrl and
sr_CS@Latn gives finer granularity, since it specifies the country (and
therefore dialect?) as well.
[1] http://www.iana.org/assignments/language-tags
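To make the comparison between the two notations concrete, here is a minimal sketch of splitting a locale name of the form language_COUNTRY@Script (for example sr_CS@Latn) into its parts and deriving the shorter language-script form (sr-Latn) from it. The names parse_locale and to_script_tag are hypothetical helpers written for this illustration, not part of any library, and the pattern only covers the simple shapes discussed in this thread.

```python
import re

# Hypothetical pattern for this illustration: a 2-3 letter language code,
# an optional _COUNTRY part, and an optional @Script modifier.
LOCALE_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:_(?P<country>[A-Z]{2}))?"
    r"(?:@(?P<script>[A-Za-z]{4}))?$"
)


def parse_locale(name):
    """Split a name such as 'sr_CS@Latn' into (language, country, script).

    Missing parts are returned as None.
    """
    match = LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized locale name: {name!r}")
    script = match.group("script")
    return (
        match.group("language"),
        match.group("country"),
        # Script codes are conventionally written with an initial capital.
        script.title() if script else None,
    )


def to_script_tag(name):
    """Build the language-script form (e.g. 'sr-Latn'), dropping the country."""
    language, _country, script = parse_locale(name)
    return f"{language}-{script}" if script else language


print(parse_locale("sr_CS@Latn"))
print(to_script_tag("sr_CS@Cyrl"))
```

Note that to_script_tag discards the country, which is exactly the loss of granularity described above.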
> I haven't seen this used with Chinese locales, however. zh_CN, zh_TW,
> and zh_HK are still widely used. Perhaps dialect implies script in
> these cases? zh_CN is usually simplified Han (Hans) and zh_TW is
> usually traditional Han (Hant), yes? Is zh_CN@Hant, for example, a
> real-world situation or merely theoretical?
I did a test a while back with Firefox and Google, and found that zh
and zh-CN came back Simplified; and zh-TW, zh-HK and zh-SG all came
back Traditional. I don't know how well this coincides with either
official designations or colloquial use, but it was a good test at the
time. My guess would be that CN is the only country actively using
Simplified right now, and that zh-TW, zh-HK and zh-SG can all be
"mapped" to Traditional. It would be nice if Jeffrey could confirm this
though.
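As a rough sketch, the result of that test could be written down as a lookup table from tag to script code. The table below only records what the Firefox/Google test returned; it is an assumption, not an official designation, and every entry (zh-SG in particular) would need confirming before being relied on. The names OBSERVED_SCRIPT and guess_script are hypothetical.

```python
# Assumption: these entries mirror the informal browser test described
# above. They are unconfirmed and not taken from any standard.
OBSERVED_SCRIPT = {
    "zh": "Hans",
    "zh-CN": "Hans",
    "zh-TW": "Hant",
    "zh-HK": "Hant",
    "zh-SG": "Hant",
}


def guess_script(tag):
    """Return the observed script code for a zh tag, or None if unknown.

    Accepts either 'zh-TW' or 'zh_TW' spelling, in any letter case.
    """
    normalized = tag.replace("_", "-")
    language, _, region = normalized.partition("-")
    key = language.lower()
    if region:
        key += "-" + region.upper()
    return OBSERVED_SCRIPT.get(key)


for tag in ("zh", "zh_CN", "zh-TW", "sr-CS"):
    print(tag, guess_script(tag))
```

Returning None for anything outside the table keeps the guess from being applied to tags the test never covered.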
There's something else to consider though, I think: different dialects,
like Cantonese, Mandarin and Jin, are all grouped together in zh, but
their usage demographics don't coincide with country borders. This
makes me think that, for better or for worse, the limit of granularity
we can achieve with zh is country code (and, if necessary, script).
Which is unfortunate, because that means there are variations in the
specificity amongst language code–country code combinations.
Actually, technically each of what I've been referring to as "dialects"
is a full-blown Sinitic language, with somewhere between 5 and 500
dialects of its own. So maybe this is too big to hope to achieve accurate
granularity with, and we should be content with just using language &
country codes. :-)
> For reference, four letter script codes:
>
> http://www.unicode.org/iso15924/iso15924-codes.html
Oooooh, you don't know how useful that page is to me. Thank you so
much. (I knew what Hans and Hant meant, but wasn't sure where those
tags came from or if they were standardized in any other place but the
IANA language tags list. Now I know.)
Morgan