[wp-polyglots] Chinese Locales (was re: Seeking Maintainers)

Morgan Doocy morgan at doocy.net
Tue Feb 22 03:24:46 GMT 2005


Great. Thanks for the very helpful information, Jeffrey.

I suppose, then, we would only need two versions of the Chinese 
localizations: Simplified and Traditional. Otherwise we'd be 
maintaining multiple identical copies of the same translation for 
different countries (e.g. zh_CN and zh_SG would be identical). The 
obvious choice would be to use zh_Hans and zh_Hant instead.
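
As a rough illustration of that idea, the country-based locales could be
collapsed onto the two script-based ones with a simple lookup. This is only
a sketch under assumptions: the grouping follows the one given in this
thread (PRC and Singapore as Simplified; Taiwan, Hong Kong and Macau as
Traditional), zh_MO is assumed as the code for Macau, and the names
SCRIPT_LOCALE and to_script_locale are hypothetical, not part of WordPress.

```python
# Hypothetical lookup: country-based Chinese locales -> script-based locales.
# The grouping mirrors the one described in this thread; it is an assumption,
# not an official mapping.
SCRIPT_LOCALE = {
    "zh_CN": "zh_Hans",
    "zh_SG": "zh_Hans",
    "zh_TW": "zh_Hant",
    "zh_HK": "zh_Hant",
    "zh_MO": "zh_Hant",
}


def to_script_locale(locale: str) -> str:
    """Return the script-based locale for a known Chinese locale.

    Accepts either "-" or "_" as the separator and any letter case.
    Locales that are not in the table are returned unchanged.
    """
    normalized = locale.replace("-", "_")
    lang, _, region = normalized.partition("_")
    key = f"{lang.lower()}_{region.upper()}" if region else lang.lower()
    return SCRIPT_LOCALE.get(key, locale)
```

With a table like this, only two translation files would need maintaining,
and a request for zh_SG or zh-hk would resolve to one of them.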

I'm not sure if naming them like this would break WP's locale 
functions, but from what I've seen it doesn't look like it would. Ryan, 
thoughts?
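
One way to check whether such names would survive locale handling is to
see whether a parser can tell a four-letter script code from a two-letter
country code. The sketch below is a hypothetical standalone parser, not
WP's actual locale code; parse_locale, LocaleParts and the accepted grammar
(language, optional script, optional country, optional @modifier as in
sr_CS@Latn) are all assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional


class LocaleParts(NamedTuple):
    language: str
    script: Optional[str]
    territory: Optional[str]
    modifier: Optional[str]


# Assumed grammar: ll[l][_Ssss][_CC][@modifier]
_LOCALE_RE = re.compile(
    r"([a-z]{2,3})"           # language code, e.g. zh
    r"(?:_([A-Z][a-z]{3}))?"  # optional four-letter script code, e.g. Hans
    r"(?:_([A-Z]{2}))?"       # optional two-letter country code, e.g. CN
    r"(?:@([A-Za-z]+))?"      # optional modifier, e.g. Latn
)


def parse_locale(name: str) -> Optional[LocaleParts]:
    """Split a locale name into its parts, or return None if it does not fit."""
    match = _LOCALE_RE.fullmatch(name)
    if match is None:
        return None
    return LocaleParts(*match.groups())
```

Under this grammar zh_Hans and zh_CN are both accepted and cannot be
confused, because the script slot requires four letters in title case and
the country slot requires two capitals.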

Morgan

On Feb 21, 2005, at 7:11 PM, Morgan Doocy wrote:

> --- Jeffrey's reply. ---
>
> That's true, Morgan. An example here:
>
> I am from Shanghai, China. We usually speak Mandarin (Putonghua) or
> Shanghainese (Wu) in daily life; however, the newspapers and all reading
> material here are written in one language: Simplified Chinese, with a
> domestic standard derived from Mandarin (since 1949, when the PRC
> government was established). People don't know how to write
> in Shanghainese, even though it is still a form of Chinese.
>
> I hope you get my point from this example.
>
> Regards,
> Jeffrey Tam
>
> On Mon, 21 Feb 2005 17:58:42 -0800, Morgan Doocy <morgan at doocy.net> 
> wrote:
>> Thanks Jeffrey, that helps a lot. Just to clarify though, are you
>> saying that even though there are variations in spoken language, the
>> written language remains the same? For example, if a Mandarin speaker
>> and a Jin speaker were to sit down and write a story (let's assume they
>> both use Simplified Han), they would use the same written vocabulary
>> and grammar, despite the fact that they would speak differently if they
>> were to tell the story orally? If so, then we wouldn't need to
>> distinguish between dialects in writing, and ISO's standardization
>> seems to reflect this.
>>
>> Morgan
>>
>> On Feb 21, 2005, at 5:36 PM, Jeffrey Tam wrote:
>>
>>> Hi Ryan and Morgan,
>>>
>>> Glad to hear from you. Here's my opinion on the Chinese locale issues:
>>>
>>> Different Chinese-speaking countries (districts) have their own
>>> written-language systems, for example Simplified Chinese
>>> (P.R. China, Singapore) and Traditional Chinese (Hong Kong, Taiwan,
>>> Macau). Even within Simplified or Traditional, these standards have some
>>> variants in how they express certain phrases or proper nouns. Therefore,
>>> Microsoft separated them in Windows XP (I work in Windows XP tech
>>> support), and most open source software separates them into three:
>>> zh_cn, zh_hk and zh-tw. As far as I know, other countries that use
>>> Simplified Chinese all tend to follow the standards used by the PRC
>>> government, such as Singapore and Korea. zh_hk and zh-tw may
>>> have different expressions for computer hardware, so I suggest
>>> separating them too.
>>>
>>> As for the IANA standards, I believe that they are more suitable for
>>> the spoken varieties of Chinese. In China, we have more than 50
>>> different dialects, but we all follow the Simplified Chinese system
>>> established by the government when reading and writing. Some dialects
>>> don't have written forms. However, only one of the five standards I
>>> mentioned above is used in any one country (district).
>>>
>>> So, my suggestion is to define the translations based on the
>>> administrative districts, either into 3 (PRC, HK, TW) or 5 (PRC, SG,
>>> HK, TW, Macau).
>>>
>>> Regards,
>>> Jeffrey Tam
>>>
>>> On Mon, 21 Feb 2005 16:58:55 -0800, Morgan Doocy <morgan at doocy.net>
>>> wrote:
>>>> Update on the Chinese locales: I didn't notice before, but the IANA
>>>> language tags list has the following:
>>>>
>>>> zh-gan       Kan or Gan
>>>> zh-guoyu     Mandarin or Standard Chinese
>>>> zh-hakka     Hakka
>>>> zh-min       Min, Fuzhou, Hokkien, Amoy or Taiwanese
>>>> zh-min-nan   Minnan, Hokkien, Amoy, Taiwanese, Southern Min,
>>>>              Southern Fujian, Hoklo, Southern Fukien, Ho-lo
>>>> zh-wuu       Shanghaiese or Wu
>>>> zh-xiang     Xiang or Hunanese
>>>> zh-yue       Cantonese
>>>>
>>>> So perhaps we should use the IANA tags. I'm not sure how we'd specify
>>>> the script in addition to the "dialect," but at least it's more
>>>> granular, and we could still probably map the dialects to scripts.
>>>>
>>>> Morgan
>>>>
>>>> On Feb 21, 2005, at 3:10 PM, Morgan Doocy wrote:
>>>>
>>>>> On Feb 21, 2005, at 2:09 PM, Ryan Boren wrote:
>>>>>> Something becoming more common is to append the script code to the
>>>>>> language and country codes.  This is often done with Serbian to
>>>>>> distinguish between Cyrillic and Latin scripts.  sr_CS is Cyrillic and
>>>>>> sr_CS@Latn is Latin.
>>>>>
>>>>> Good to know. The IANA language tags page lists sr-Cyrl and sr-Latn
>>>>> for that example. [1] It seems to me, however, that using sr_CS@Cyrl
>>>>> and sr_CS@Latn has more granularity, since it specifies the country
>>>>> (and therefore dialect?) as well.
>>>>>
>>>>> [1] http://www.iana.org/assignments/language-tags
>>>>>
>>>>>> I haven't seen this used with Chinese locales, however.  zh_CN, zh_TW,
>>>>>> and zh_HK are still widely used.  Perhaps dialect implies script in
>>>>>> these cases?  zh_CN is usually simplified Han (Hans) and zh_TW is
>>>>>> usually traditional Han (Hant), yes?  Is zh_CN@Hant, for example, a
>>>>>> real-world situation or merely theoretical?
>>>>>
>>>>> I did a test a while back with Firefox and Google, and found that zh
>>>>> and zh-CN came back Simplified; and zh-TW, zh-HK and zh-SG all came
>>>>> back Traditional. I don't know how well this coincides with either
>>>>> official designations or colloquial use, but it was a good test at the
>>>>> time. My guess would be that CN is the only country actively using
>>>>> Simplified right now, and that zh-TW, zh-HK and zh-SG can all be
>>>>> "mapped" to Traditional. It would be nice if Jeffrey could confirm
>>>>> this though.
>>>>>
>>>>> There's something else to consider though, I think: different
>>>>> dialects, like Cantonese, Mandarin and Jin, are all grouped together
>>>>> in zh, but their usage demographics don't coincide with country
>>>>> borders. This makes me think that, for better or for worse, the limit
>>>>> of granularity we can achieve with zh is country code (and, if
>>>>> necessary, script). Which is unfortunate, because that means there are
>>>>> variations in the specificity amongst language code–country code
>>>>> combinations.
>>>>>
>>>>> Actually, technically each of what I've been referring to as
>>>>> "dialects" is a full-blown Sinitic language, each with somewhere
>>>>> between 5 and 500 dialects. So maybe this is too big to hope to
>>>>> achieve accurate granularity with, and we should be content with just
>>>>> using language & country codes. :-)
>>>>>
>>>>>> For reference, four letter script codes:
>>>>>>
>>>>>> http://www.unicode.org/iso15924/iso15924-codes.html
>>>>>
>>>>> Oooooh, you don't know how useful that page is to me. Thank you so
>>>>> much. (I knew what Hans and Hant meant, but wasn't sure where those
>>>>> tags came from or if they were standardized in any other place but the
>>>>> IANA language tags list. Now I know.)
>>>>>
>>>>> Morgan
>>>>>
>>>>> _______________________________________________
>>>>> wp-polyglots mailing list
>>>>> wp-polyglots at lists.automattic.com
>>>>> http://lists.automattic.com/mailman/listinfo/wp-polyglots
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Jeffrey Tam
>>
>>
>
>
> -- 
> Best regards,
> Jeffrey Tam


