[wp-polyglots] Seeking Maintainers

Tue Feb 22 00:58:55 GMT 2005

Update on the Chinese locales: I didn't notice before, but the IANA 
language tags list has the following:

zh-gan		Kan or Gan
zh-guoyu		Mandarin or Standard Chinese
zh-hakka		Hakka
zh-min		Min, Fuzhou, Hokkien, Amoy or Taiwanese
zh-min-nan	Minnan, Hokkien, Amoy, Taiwanese, Southern Min,
			Southern Fujian, Hoklo, Southern Fukien, Ho-lo
zh-wuu		Shanghaiese or Wu
zh-xiang		Xiang or Hunanese
zh-yue		Cantonese

So perhaps we should use the IANA tags. I'm not sure how we'd specify 
the script in addition to the "dialect," but at least it's more 
granular, and we could still probably map the dialects to scripts.
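
To make the mapping idea concrete, here is a rough Python sketch of how a
locale string could be split into language, country and script, falling back
to a default script when none is given. The function name, the regular
expression and the contents of the defaults table are my own assumptions for
illustration (the table just reflects the guesses made in the quoted thread
below), not anything WordPress or gettext actually defines.

```python
import re

# Hypothetical defaults, based only on the guesses in this thread.
# This is not an official or complete table.
DEFAULT_SCRIPT = {
    ("zh", "CN"): "Hans",  # Simplified Han
    ("zh", "TW"): "Hant",  # Traditional Han
    ("zh", "HK"): "Hant",
    ("sr", "CS"): "Cyrl",
}

# language, optional _COUNTRY (or -COUNTRY), optional @Script modifier
LOCALE_RE = re.compile(r"^([a-z]{2,3})(?:[_-]([A-Z]{2}))?(?:@([A-Z][a-z]{3}))?$")


def parse_locale(locale):
    """Return (language, country, script); country and script may be None."""
    match = LOCALE_RE.match(locale)
    if match is None:
        raise ValueError("unrecognized locale: %r" % locale)
    language, country, script = match.groups()
    if script is None:
        # No explicit @Script modifier, so fall back to the assumed default.
        script = DEFAULT_SCRIPT.get((language, country))
    return language, country, script
```

With this, an explicit modifier such as zh_CN@Hant always wins over the
default, and a bare "zh" simply comes back with no script at all, which
leaves the caller to decide what to do.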

Morgan

On Feb 21, 2005, at 3:10 PM, Morgan Doocy wrote:

> On Feb 21, 2005, at 2:09 PM, Ryan Boren wrote:
>> Something becoming more common is to append the script code to the
>> language and country codes.  This is often done with Serbian to
>> distinguish between Cyrillic and Latin scripts.  sr_CS is Cyrillic and
>> sr_CS@Latn is Latin.
>
> Good to know. The IANA language tags page lists sr-Cyrl and sr-Latn 
> for that example. [1] It seems to me, however, that using sr_CS@Cyrl 
> and sr_CS@Latn has more granularity, since it specifies the country 
> (and therefore dialect?) as well.
>
> [1] http://www.iana.org/assignments/language-tags
>
>> I haven't seen this used with Chinese locales, however.  zh_CN, zh_TW,
>> and zh_HK are still widely used.  Perhaps dialect implies script in
>> these cases?  zh_CN is usually simplified Han (Hans) and zh_TW is
>> usually traditional Han (Hant), yes?  Is zh_CN@Hant, for example, a
>> real-world situation or merely theoretical?
>
> I did a test a while back with Firefox and Google, and found that zh 
> and zh-CN came back Simplified; and zh-TW, zh-HK and zh-SG all came 
> back Traditional. I don't know how well this coincides with either 
> official designations or colloquial use, but it was a good test at the 
> time. My guess would be that CN is the only country actively using 
> Simplified right now, and that zh-TW, zh-HK and zh-SG can all be 
> "mapped" to Traditional. It would be nice if Jeffrey could confirm 
> this though.
>
> There's something else to consider though, I think: different 
> dialects, like Cantonese, Mandarin and Jin, are all grouped together 
> in zh, but their usage demographics don't coincide with country 
> borders. This makes me think that, for better or for worse, the limit 
> of granularity we can achieve with zh is country code (and, if 
> necessary, script). Which is unfortunate, because that means there are 
> variations in the specificity amongst language code–country code 
> combinations.
>
> Actually, technically each of what I've been referring to as 
> "dialects" is a full-blown Sinitic language, each with somewhere 
> between 5 and 500 dialects. So maybe this is too big to hope to 
> achieve accurate granularity with, and we should be content with just 
> using language & country codes. :-)
>
>> For reference, four letter script codes:
>>
>> http://www.unicode.org/iso15924/iso15924-codes.html
>
> Oooooh, you don't know how useful that page is to me. Thank you so 
> much. (I knew what Hans and Hant meant, but wasn't sure where those 
> tags came from or if they were standardized in any other place but the 
> IANA language tags list. Now I know.)
>
> Morgan