[wp-polyglots] Chinese Locales (was re: Seeking Maintainers)

Morgan Doocy morgan at doocy.net
Tue Feb 22 02:15:24 GMT 2005


--- Oops, meant to send this to the list. ---

Thanks Jeffrey, that helps a lot. Just to clarify, though: are you
saying that even though the spoken language varies, the written
language stays the same? For example, if a Mandarin speaker and a Jin
speaker each sat down and wrote a story (let's assume they both use
Simplified Han), would they use the same written vocabulary and
grammar, even though they would sound different telling the story
aloud? If so, then we wouldn't need to distinguish between dialects in
writing, and ISO's standardization seems to reflect this.
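
To make the region-implies-script idea concrete, here is a rough Python
sketch of how a locale string might be split into language, region and
script, falling back to a per-region default for zh when no script is
given. The parse_locale function, the regex and the DEFAULT_ZH_SCRIPT
table are all made up for illustration; the table just encodes Jeffrey's
description below (plus MO, the ISO 3166 code for Macau) and is an
assumption, not an official mapping.

```python
import re

# Assumed defaults, taken from Jeffrey's description in this thread:
# PRC and Singapore write Simplified (Hans); Taiwan, Hong Kong and
# Macau write Traditional (Hant). Not an official mapping.
DEFAULT_ZH_SCRIPT = {
    "CN": "Hans",
    "SG": "Hans",
    "TW": "Hant",
    "HK": "Hant",
    "MO": "Hant",
}

# Accepts forms like "zh", "zh_CN", "zh-tw" and "sr_CS@Latn".
LOCALE_RE = re.compile(
    r"^(?P<lang>[A-Za-z]{2,3})"
    r"(?:[_-](?P<region>[A-Za-z]{2}))?"
    r"(?:@(?P<script>[A-Za-z]{4}))?$"
)


def parse_locale(tag):
    """Return (language, region, script); region and script may be None."""
    match = LOCALE_RE.match(tag)
    if match is None:
        raise ValueError("unrecognized locale string: %r" % tag)

    lang = match.group("lang").lower()
    region = match.group("region")
    region = region.upper() if region else None
    script = match.group("script")
    script = script.title() if script else None

    # Only guess a script for Chinese, and only when none was given.
    if script is None and lang == "zh" and region is not None:
        script = DEFAULT_ZH_SCRIPT.get(region)

    return lang, region, script
```

With this, zh_CN and zh-tw resolve to Hans and Hant respectively, while
an explicit modifier such as sr_CS@Latn or zh_CN@Hant is kept as written.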

Morgan

On Feb 21, 2005, at 5:36 PM, Jeffrey Tam wrote:

> Hi Ryan and Morgan,
>
> Glad to hear from you. Here's my opinion on the Chinese locale issues:
>
> Different Chinese-speaking countries (districts) have their own
> written standards, for example Simplified Chinese (P.R. China,
> Singapore) and Traditional Chinese (Hong Kong, Taiwan, Macau). Even
> within Simplified or Traditional, these standards vary in how they
> express some phrases and proper nouns. That is why Microsoft separated
> them in Windows XP (I work in Windows XP tech support), and most open
> source software separates them into three: zh_CN, zh_HK and zh_TW. As
> far as I know, the other countries that use Simplified Chinese, such
> as Singapore and Korea, tend to follow the standards used by the PRC
> government. zh_HK and zh_TW may use different expressions for computer
> hardware, so I suggest separating them too.
>
> As for the IANA standards, I believe they are better suited to the
> spoken varieties of Chinese. In China we have more than 50 different
> dialects, but we all follow the Simplified Chinese system set by the
> government when reading and writing. Some dialects have no written
> form. However, each country (district) uses only one of the five
> standards I mentioned above.
>
> So my suggestion is to define the translations by administrative
> district, splitting them either into 3 (PRC, HK, TW) or into 5 (PRC,
> SG, HK, TW, Macau).
>
> Regards,
> Jeffrey Tam
>
> On Mon, 21 Feb 2005 16:58:55 -0800, Morgan Doocy <morgan at doocy.net> wrote:
>> Update on the Chinese locales: I didn't notice before, but the IANA
>> language tags list has the following:
>>
>> zh-gan       Kan or Gan
>> zh-guoyu     Mandarin or Standard Chinese
>> zh-hakka     Hakka
>> zh-min       Min, Fuzhou, Hokkien, Amoy or Taiwanese
>> zh-min-nan   Minnan, Hokkien, Amoy, Taiwanese, Southern Min,
>>              Southern Fujian, Hoklo, Southern Fukien, Ho-lo
>> zh-wuu       Shanghaiese or Wu
>> zh-xiang     Xiang or Hunanese
>> zh-yue       Cantonese
>>
>> So perhaps we should use the IANA tags. I'm not sure how we'd specify
>> the script in addition to the "dialect," but at least it's more
>> granular, and we could still probably map the dialects to scripts.
>>
>> Morgan
>>
>> On Feb 21, 2005, at 3:10 PM, Morgan Doocy wrote:
>>
>>> On Feb 21, 2005, at 2:09 PM, Ryan Boren wrote:
>>>> Something becoming more common is to append the script code to the
>>>> language and country codes.  This is often done with Serbian to
>>>> distinguish between Cyrillic and Latin scripts.  sr_CS is Cyrillic
>>>> and sr_CS@Latn is Latin.
>>>
>>> Good to know. The IANA language tags page lists sr-Cyrl and sr-Latn
>>> for that example. [1] It seems to me, however, that using sr_CS@Cyrl
>>> and sr_CS@Latn has more granularity, since it specifies the country
>>> (and therefore dialect?) as well.
>>>
>>> [1] http://www.iana.org/assignments/language-tags
>>>
>>>> I haven't seen this used with Chinese locales, however.  zh_CN,
>>>> zh_TW, and zh_HK are still widely used.  Perhaps dialect implies
>>>> script in these cases?  zh_CN is usually simplified Han (Hans) and
>>>> zh_TW is usually traditional Han (Hant), yes?  Is zh_CN@Hant, for
>>>> example, a real-world situation or merely theoretical?
>>>
>>> I did a test a while back with Firefox and Google, and found that zh
>>> and zh-CN came back Simplified; and zh-TW, zh-HK and zh-SG all came
>>> back Traditional. I don't know how well this coincides with either
>>> official designations or colloquial use, but it was a good test at
>>> the time. My guess would be that CN is the only country actively using
>>> Simplified right now, and that zh-TW, zh-HK and zh-SG can all be
>>> "mapped" to Traditional. It would be nice if Jeffrey could confirm
>>> this though.
>>>
>>> There's something else to consider though, I think: different
>>> dialects, like Cantonese, Mandarin and Jin, are all grouped together
>>> in zh, but their usage demographics don't coincide with country
>>> borders. This makes me think that, for better or for worse, the limit
>>> of granularity we can achieve with zh is country code (and, if
>>> necessary, script). Which is unfortunate, because that means there
>>> are variations in the specificity amongst language code–country code
>>> combinations.
>>>
>>> Actually, technically each of what I've been referring to as
>>> "dialects" is a full-blown Sinitic language, each with somewhere
>>> between 5 and 500 dialects. So maybe this is too big to hope to
>>> achieve accurate granularity with, and we should be content with just
>>> using language & country codes. :-)
>>>
>>>> For reference, four letter script codes:
>>>>
>>>> http://www.unicode.org/iso15924/iso15924-codes.html
>>>
>>> Oooooh, you don't know how useful that page is to me. Thank you so
>>> much. (I knew what Hans and Hant meant, but wasn't sure where those
>>> tags came from or if they were standardized in any other place but
>>> the IANA language tags list. Now I know.)
>>>
>>> Morgan
>>>
>>> _______________________________________________
>>> wp-polyglots mailing list
>>> wp-polyglots at lists.automattic.com
>>> http://lists.automattic.com/mailman/listinfo/wp-polyglots
>>
>
>
> -- 
> Best regards,
> Jeffrey Tam


