[wp-polyglots] Chinese Locales (was re: Seeking Maintainers)

Morgan Doocy morgan at doocy.net
Tue Feb 22 03:11:27 GMT 2005


--- Jeffrey's reply. ---

That's true, Morgan. Here's an example:

I am from Shanghai, China. We usually speak Mandarin (Putonghua) or
Shanghainese (Wu) in daily life; however, newspapers and all reading
material here are written in one language, Simplified Chinese, with a
domestic standard derived from Mandarin (since 1949, when the PRC
government was established). People don't know how to write in
Shanghainese, even though it is still a form of Chinese.

I hope you get my point from this example.

Regards,
Jeffrey Tam

On Mon, 21 Feb 2005 17:58:42 -0800, Morgan Doocy <morgan at doocy.net> 
wrote:
> Thanks Jeffrey, that helps a lot. Just to clarify though, are you
> saying that even though there are variations in spoken language, the
> written language remains the same? For example, if a Mandarin speaker
> and a Jin speaker were to sit down and write a story (let's assume they
> both use Simplified Han), they would use the same written vocabulary
> and grammar, despite the fact that they would speak differently if they
> were to tell the story orally? If so, then we wouldn't need to
> distinguish between dialects in writing, and ISO's standardization
> seems to reflect this.
>
> Morgan
>
> On Feb 21, 2005, at 5:36 PM, Jeffrey Tam wrote:
>
>> Hi Ryan and Morgan,
>>
>> Glad to hear from you. Here's my opinion on the Chinese locale issues:
>>
>> Different Chinese-speaking countries (districts) have their own
>> systems of written language, for example Simplified Chinese
>> (P.R. China, Singapore) and Traditional Chinese (Hong Kong, Taiwan,
>> Macau). Even within Simplified or Traditional, these standards have
>> variant expressions for some phrases and proper nouns. Therefore,
>> Microsoft separated them in Windows XP (I work in Windows XP tech
>> support), and most open source software separates them into three:
>> zh_CN, zh_HK and zh_TW. As far as I know, other countries that use
>> Simplified Chinese all tend to follow the standards used by the PRC
>> government, such as Singapore and Korea. zh_HK and zh_TW may also
>> have different expressions for computer hardware, so I suggest
>> separating them too.
>>
>> As for the IANA standards, I believe they are more suitable for the
>> spoken varieties of Chinese. In China we have more than 50 different
>> dialects, but we all follow the Simplified Chinese system set by the
>> government when reading and writing. Some dialects don't even have
>> written forms. However, only one of the five standards I mentioned
>> before is used in each country (district).
>>
>> So, my suggestion is to define the translations based on
>> administrative districts, either 3 (PRC, HK, TW) or 5 (PRC, SG,
>> HK, TW, Macau).
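A minimal sketch of the district-based proposal above, assuming translations are keyed by ISO 3166 region codes. Folding Singapore into zh_CN under the 3-way scheme follows Jeffrey's note that Singapore follows PRC standards; folding Macau into zh_HK, the zh_SG/zh_MO names, and the function name translation_locale are hypothetical choices not stated in the thread.

```python
# Hypothetical region -> translation-locale tables for the two proposed schemes.
FIVE_WAY = {"CN": "zh_CN", "SG": "zh_SG", "HK": "zh_HK", "TW": "zh_TW", "MO": "zh_MO"}
THREE_WAY = {
    "CN": "zh_CN",
    "SG": "zh_CN",  # Singapore follows the PRC standard (per the thread)
    "HK": "zh_HK",
    "TW": "zh_TW",
    "MO": "zh_HK",  # assumption: Macau shares the Hong Kong translation
}


def translation_locale(region: str, scheme: str = "three") -> str:
    """Pick a translation locale for a Chinese-speaking district."""
    table = THREE_WAY if scheme == "three" else FIVE_WAY
    try:
        return table[region.upper()]
    except KeyError:
        # Unknown districts are rejected rather than guessed at.
        raise ValueError(f"no Chinese translation defined for {region!r}")
```

For example, translation_locale("sg") gives zh_CN, while translation_locale("sg", "five") gives zh_SG.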
>>
>> Regards,
>> Jeffrey Tam
>>
>> On Mon, 21 Feb 2005 16:58:55 -0800, Morgan Doocy <morgan at doocy.net>
>> wrote:
>>> Update on the Chinese locales: I didn't notice before, but the IANA
>>> language tags list has the following:
>>>
>>> zh-gan          Kan or Gan
>>> zh-guoyu        Mandarin or Standard Chinese
>>> zh-hakka        Hakka
>>> zh-min          Min, Fuzhou, Hokkien, Amoy or Taiwanese
>>> zh-min-nan      Minnan, Hokkien, Amoy, Taiwanese, Southern Min,
>>>                 Southern Fujian, Hoklo, Southern Fukien, Ho-lo
>>> zh-wuu          Shanghaiese or Wu
>>> zh-xiang        Xiang or Hunanese
>>> zh-yue          Cantonese
>>>
>>> So perhaps we should use the IANA tags. I'm not sure how we'd specify
>>> the script in addition to the "dialect," but at least it's more
>>> granular, and we could still probably map the dialects to scripts.
>>>
>>> Morgan
>>>
>>> On Feb 21, 2005, at 3:10 PM, Morgan Doocy wrote:
>>>
>>>> On Feb 21, 2005, at 2:09 PM, Ryan Boren wrote:
>>>>> Something becoming more common is to append the script code to the
>>>>> language and country codes. This is often done with Serbian to
>>>>> distinguish between Cyrillic and Latin scripts: sr_CS is Cyrillic
>>>>> and sr_CS@Latn is Latin.
>>>>
>>>> Good to know. The IANA language tags page lists sr-Cyrl and sr-Latn
>>>> for that example. [1] It seems to me, however, that using sr_CS@Cyrl
>>>> and sr_CS@Latn has more granularity, since it specifies the country
>>>> (and therefore dialect?) as well.
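A minimal sketch showing that the two notations carry the same information, assuming the @modifier holds an ISO 15924 script code. The hyphenated output uses the language-Script-REGION order later standardized in BCP 47 (RFC 4646 and RFC 5646), which postdates this thread; the names Locale, parse_posix and to_iana are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Locale(NamedTuple):
    language: str
    territory: Optional[str]
    script: Optional[str]


# language, optional _TERRITORY, optional @Script (assumed to be a 4-letter code)
_POSIX = re.compile(r"^([a-z]{2,3})(?:_([A-Z]{2}))?(?:@([A-Za-z]{4}))?$")


def parse_posix(tag: str) -> Locale:
    """Split a locale such as sr_CS@Latn into its parts."""
    m = _POSIX.match(tag)
    if m is None:
        raise ValueError(f"unrecognized locale: {tag!r}")
    lang, territory, script = m.groups()
    # ISO 15924 codes are written in title case, e.g. "latn" -> "Latn".
    return Locale(lang, territory, script.title() if script else None)


def to_iana(loc: Locale) -> str:
    """Join the parts in language-Script-REGION order."""
    parts = [loc.language]
    if loc.script:
        parts.append(loc.script)
    if loc.territory:
        parts.append(loc.territory)
    return "-".join(parts)
```

With this sketch, sr_CS@Latn becomes sr-Latn-CS, so the country code survives the conversion rather than being lost.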
>>>>
>>>> [1] http://www.iana.org/assignments/language-tags
>>>>
>>>>> I haven't seen this used with Chinese locales, however. zh_CN, zh_TW,
>>>>> and zh_HK are still widely used. Perhaps dialect implies script in
>>>>> these cases? zh_CN is usually simplified Han (Hans) and zh_TW is
>>>>> usually traditional Han (Hant), yes? Is zh_CN@Hant, for example, a
>>>>> real-world situation or merely theoretical?
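A minimal sketch of the "territory implies script" idea raised above, assuming an explicit @Script modifier always overrides the default. Only the zh_CN to Hans and zh_TW to Hant defaults come from the message; other territories deliberately return None instead of a guess, and the names DEFAULT_ZH_SCRIPT and zh_script are hypothetical.

```python
from typing import Optional

# Defaults stated in the message above; other territories are left unmapped.
DEFAULT_ZH_SCRIPT = {"CN": "Hans", "TW": "Hant"}


def zh_script(locale: str) -> Optional[str]:
    """Return the script for a zh locale such as zh_CN or zh_CN@Hant."""
    base, _, modifier = locale.partition("@")
    if modifier:
        return modifier  # an explicit modifier like zh_CN@Hant wins
    _, _, territory = base.partition("_")
    return DEFAULT_ZH_SCRIPT.get(territory.upper())
```

Under this sketch, zh_CN@Hant is representable but only ever produced when someone asks for it explicitly.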
>>>>
>>>> I did a test a while back with Firefox and Google, and found that zh
>>>> and zh-CN came back Simplified, and zh-TW, zh-HK and zh-SG all came
>>>> back Traditional. I don't know how well this coincides with either
>>>> official designations or colloquial use, but it was a good test at
>>>> the time. My guess would be that CN is the only country actively
>>>> using Simplified right now, and that zh-TW, zh-HK and zh-SG can all
>>>> be "mapped" to Traditional. It would be nice if Jeffrey could confirm
>>>> this, though.
>>>>
>>>> There's something else to consider, though, I think: different
>>>> dialects, like Cantonese, Mandarin and Jin, are all grouped together
>>>> under zh, but their usage demographics don't coincide with country
>>>> borders. This makes me think that, for better or for worse, the limit
>>>> of granularity we can achieve with zh is the country code (and, if
>>>> necessary, script). That is unfortunate, because it means the
>>>> specificity varies among language code–country code combinations.
>>>>
>>>> Actually, technically each of what I've been referring to as
>>>> "dialects" is a full-blown Sinitic language, each with somewhere
>>>> between 5 and 500 dialects. So maybe accurate granularity is too much
>>>> to hope for here, and we should be content with just using language
>>>> and country codes. :-)
>>>>
>>>>> For reference, four letter script codes:
>>>>>
>>>>> http://www.unicode.org/iso15924/iso15924-codes.html
>>>>
>>>> Oooooh, you don't know how useful that page is to me. Thank you so
>>>> much. (I knew what Hans and Hant meant, but wasn't sure where those
>>>> tags came from or whether they were standardized anywhere other than
>>>> the IANA language tags list. Now I know.)
>>>>
>>>> Morgan
>>>>
>>>> _______________________________________________
>>>> wp-polyglots mailing list
>>>> wp-polyglots at lists.automattic.com
>>>> http://lists.automattic.com/mailman/listinfo/wp-polyglots
>>>
>>
>>
>> --
>> Best regards,
>> Jeffrey Tam
>
>


-- 
Best regards,
Jeffrey Tam


