[wp-polyglots] Seeking Maintainers

Jeffrey Tam jeffreytam at gmail.com
Tue Feb 22 01:36:42 GMT 2005


Hi Ryan and Morgan,

Glad to hear from you. Here's my opinion on the Chinese locale issues:

Different Chinese-speaking countries (districts) have their own
written language systems, for example Simplified Chinese
(P.R. China, Singapore) and Traditional Chinese (Hong Kong, Taiwan,
Macau). Even within Simplified or Traditional, these standards vary
in how they express some phrases and proper nouns. For that reason,
Microsoft separated them in Windows XP (I work in Windows XP tech
support), and most open source software separates them into three:
zh_CN, zh_HK and zh_TW. As far as I know, other countries that use
Simplified Chinese all tend to follow the standards used by the PRC
government, such as Singapore and Korea. zh_HK and zh_TW may use
different expressions for computer hardware, so I suggest
separating them too.
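
As a rough illustration of keeping the three locales under one naming
convention, the sketch below normalizes mixed spellings such as zh_cn
and zh-tw to a single language_REGION form. The function name
normalize_locale is hypothetical and not part of any existing tool.

```python
def normalize_locale(tag: str) -> str:
    """Normalize a two-part locale tag to the language_REGION form.

    Accepts either "-" or "_" as the separator and any letter case.
    """
    parts = tag.replace("-", "_").split("_")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected language and region, got {tag!r}")
    language, region = parts
    return f"{language.lower()}_{region.upper()}"


print([normalize_locale(t) for t in ("zh_cn", "zh_hk", "zh-tw")])
# prints ['zh_CN', 'zh_HK', 'zh_TW']
```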

As for the IANA standards, I believe they are better suited to the
spoken varieties of Chinese. In China we have more than 50
different dialects, but we all follow the Simplified Chinese system
set by the government when reading and writing. Some dialects have
no written form at all. Each country (district), however, uses only
one of the five written standards I mentioned above.

So, my suggestion is to define the translations by administrative
district, split into either 3 (PRC, HK, TW) or 5 (PRC, SG, HK, TW,
Macau).
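
A minimal sketch of how the 3-way and 5-way splits could coexist is
given below. The region-to-script table follows the grouping described
in this message; the names SCRIPT_BY_REGION, FALLBACK_REGION and
pick_translation are hypothetical, and the fallbacks (SG to CN, MO to
HK) are an assumption made only for illustration.

```python
# Written standard per administrative district, following the grouping
# in this message (an illustration, not an official registry).
SCRIPT_BY_REGION = {
    "CN": "Hans",  # Simplified
    "SG": "Hans",
    "HK": "Hant",  # Traditional
    "TW": "Hant",
    "MO": "Hant",
}

# Assumed fallbacks used when only three translations (CN, HK, TW) exist.
FALLBACK_REGION = {"SG": "CN", "MO": "HK"}


def pick_translation(region: str, available: set) -> str:
    """Return the best available zh_* translation for a region code."""
    region = region.upper()
    if region not in SCRIPT_BY_REGION:
        raise ValueError(f"unknown region: {region!r}")
    candidate = f"zh_{region}"
    if candidate in available:
        return candidate
    fallback = FALLBACK_REGION.get(region)
    if fallback is not None and f"zh_{fallback}" in available:
        return f"zh_{fallback}"
    raise LookupError(f"no translation available for {candidate}")


three = {"zh_CN", "zh_HK", "zh_TW"}
five = three | {"zh_SG", "zh_MO"}
print(pick_translation("SG", three))  # prints zh_CN
print(pick_translation("SG", five))   # prints zh_SG
```

With this shape, starting from three translations and adding the other
two later would not require changing any lookups.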

Regards,
Jeffrey Tam

On Mon, 21 Feb 2005 16:58:55 -0800, Morgan Doocy <morgan at doocy.net> wrote:
> Update on the Chinese locales: I didn't notice before, but the IANA
> language tags list has the following:
> 
> zh-gan       Kan or Gan
> zh-guoyu     Mandarin or Standard Chinese
> zh-hakka     Hakka
> zh-min       Min, Fuzhou, Hokkien, Amoy or Taiwanese
> zh-min-nan   Minnan, Hokkien, Amoy, Taiwanese, Southern Min,
>              Southern Fujian, Hoklo, Southern Fukien, Ho-lo
> zh-wuu       Shanghaiese or Wu
> zh-xiang     Xiang or Hunanese
> zh-yue       Cantonese
> 
> So perhaps we should use the IANA tags. I'm not sure how we'd specify
> the script in addition to the "dialect," but at least it's more
> granular, and we could still probably map the dialects to scripts.
> 
> Morgan
> 
> On Feb 21, 2005, at 3:10 PM, Morgan Doocy wrote:
> 
> > On Feb 21, 2005, at 2:09 PM, Ryan Boren wrote:
> >> Something becoming more common is to append the script code to the
> >> language and country codes.  This is often done with Serbian to
> >> distinguish between Cyrillic and Latin scripts.  sr_CS is Cyrillic and
> >> sr_CS@Latn is Latin.
> >
> > Good to know. The IANA language tags page lists sr-Cyrl and sr-Latn
> > for that example. [1] It seems to me, however, that using sr_CS@Cyrl
> > and sr_CS@Latn has more granularity, since it specifies the country
> > (and therefore dialect?) as well.
> >
> > [1] http://www.iana.org/assignments/language-tags
> >
> >> I haven't seen this used with Chinese locales, however.  zh_CN, zh_TW,
> >> and zh_HK are still widely used.  Perhaps dialect implies script in
> >> these cases?  zh_CN is usually simplified Han (Hans) and zh_TW is
> >> usually traditional Han (Hant), yes?  Is zh_CN@Hant, for example, a
> >> real-world situation or merely theoretical?
> >
> > I did a test a while back with Firefox and Google, and found that zh
> > and zh-CN came back Simplified; and zh-TW, zh-HK and zh-SG all came
> > back Traditional. I don't know how well this coincides with either
> > official designations or colloquial use, but it was a good test at the
> > time. My guess would be that CN is the only country actively using
> > Simplified right now, and that zh-TW, zh-HK and zh-SG can all be
> > "mapped" to Traditional. It would be nice if Jeffrey could confirm
> > this though.
> >
> > There's something else to consider though, I think: different
> > dialects, like Cantonese, Mandarin and Jin, are all grouped together
> > in zh, but their usage demographics don't coincide with country
> > borders. This makes me think that, for better or for worse, the limit
> > of granularity we can achieve with zh is country code (and, if
> > necessary, script). Which is unfortunate, because that means there are
> > variations in the specificity amongst language code–country code
> > combinations.
> >
> > Actually, technically each of what I've been referring to as
> > "dialects" is a full-blown Sinitic language, each with somewhere
> > between 5 and 500 dialects. So maybe this is too big to hope to
> > achieve accurate granularity with, and we should be content with just
> > using language & country codes. :-)
> >
> >> For reference, four letter script codes:
> >>
> >> http://www.unicode.org/iso15924/iso15924-codes.html
> >
> > Oooooh, you don't know how useful that page is to me. Thank you so
> > much. (I knew what Hans and Hant meant, but wasn't sure where those
> > tags came from or if they were standardized in any other place but the
> > IANA language tags list. Now I know.)
> >
> > Morgan
> >
> > _______________________________________________
> > wp-polyglots mailing list
> > wp-polyglots at lists.automattic.com
> > http://lists.automattic.com/mailman/listinfo/wp-polyglots
> 
> 


-- 
Best regards,
Jeffrey Tam

