[glotpress-updates] [GlotPress] #342: Cleanup to locales.php

Tue Jul 8 22:37:48 UTC 2014

#342: Cleanup to locales.php
---------------------------------+-----------------
  Reporter:  stuwest             |      Owner:
      Type:  enhancement         |     Status:  new
  Priority:  normal              |  Milestone:
 Component:  locale information  |    Version:
Resolution:                      |   Keywords:
---------------------------------+-----------------

Comment (by stuwest):

 Thanks for feedback everyone. Replies:

 >  * We need all of the ISO codes as they are the best reference we have
 for matching an Accept-Language browser header to a corresponding WP
 translation. (I've been working on this problem as recently as this
 morning.)

 Interesting. OK. Do we really need all the duplicate codes for that, or
 would the two-letter and the three-letter codes be enough? Also, it bugs
 me that there are some errors in those (I just noticed one, for mi, but
 didn't look at all of them), so if we're going to leave them in, perhaps
 we should clean those up.
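
 To make the matching question concrete, here is a rough sketch of how an
 Accept-Language header could be matched using only a two-letter and a
 three-letter code per locale. It is Python purely for illustration; the
 LOCALES table, its field names, and both function names are made up for
 this example and are not taken from locales.php.

```python
# Hypothetical mini-table keyed by slug; real data would come from locales.php.
LOCALES = {
    "pt":    {"iso_639_1": "pt", "iso_639_2": "por", "wp_locale": "pt_PT"},
    "pt-br": {"iso_639_1": "pt", "iso_639_2": "por", "wp_locale": "pt_BR"},
    "nl":    {"iso_639_1": "nl", "iso_639_2": "nld", "wp_locale": "nl_NL"},
    "nl-be": {"iso_639_1": "nl", "iso_639_2": "nld", "wp_locale": "nl_BE"},
    "de":    {"iso_639_1": "de", "iso_639_2": "deu", "wp_locale": "de_DE"},
}


def parse_accept_language(header):
    """Return lowercased language tags from the header, most preferred first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        params = params.strip()
        quality = 1.0  # a tag without a q-value counts as q=1
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        # Skip empty tags, the "*" wildcard, and tags refused with q=0.
        if tag and tag != "*" and quality > 0:
            entries.append((-quality, position, tag))
    entries.sort()  # highest q first; header order breaks ties
    return [tag for _, _, tag in entries]


def match_locale(header, locales=LOCALES):
    """Return the WP locale for the best matching tag, or None."""
    for tag in parse_accept_language(header):
        # 1) Exact slug match, e.g. "pt-br".
        if tag in locales:
            return locales[tag]["wp_locale"]
        # 2) Language part only, compared against both ISO codes.
        language = tag.partition("-")[0]
        for slug, info in locales.items():
            if "-" not in slug and language in (info["iso_639_1"], info["iso_639_2"]):
                return info["wp_locale"]
    return None
```

 With this table, match_locale("pt-BR,pt;q=0.8,en;q=0.5") picks pt_BR via
 the slug, while a bare "por" still resolves to pt_PT through the
 three-letter code.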

 >As to the name of a language, both in English and native, that really
 should be a matter for each translation team to decide. To give you an
 idea of the kinds of issues we're facing (just examples, not exhaustive);
 the pt_PT community, caught in the middle of a very polemic and artificial
 spelling reform, refuses to adhere to it (like most of the country) and
 still capitalizes the language name (i.e. "Português" and not
 "português"). zh-cn is a whole other discussion, as "Chinese" is really a
 macro-language and not a specific variant.
 >  * As Zé has also pointed out, a lot of the native and English names
 have history, as in were requested/determined by translators. I'm fine
 with trying to move to more accepted representations, particularly in
 cases where there was no deliberate decision to deviate from that.

 Yeah it's tough to know whether the mishmash of inconsistent naming was a)
 carefully thought out following in-depth review of each name, or b) the
 result of on-again, off-again focus by volunteers. :)

 On Chinese zh, what caught my eye is that zh-cn is fully translated while
 zh isn't even in GlotPress (if
 http://translate.wordpress.org/projects/wp/dev/zh/default is where I
 should look). So yes, it's a macro-language, but not one that we use
 consistently, so it seems a distraction to include it in locales.php.

 >  * Fallback is a loaded term. CLDR's approach is highly complicated and
 it looks like it is simplified significantly here. See
 http://www.unicode.org/reports/tr35/#Locale_Inheritance,
 http://www.unicode.org/reports/tr35/#LanguageMatching, etc. We also have a
 need to introduce variants, such as sr_Latn and concepts like an
 "informal" German translation. See also
 http://www.unicode.org/reports/tr35/#Likely_Subtags.

 Yeah, CLDR's approach IMHO makes sense when there's a detailed, clean
 structure for locale codes, so you can follow their hierarchical model.
 But we've got a ton of locales that aren't even in CLDR, so a simple
 one-dimensional fallback seemed a better fit. (I've played with
 MediaWiki's similar implementation; see
 https://www.mediawiki.org/wiki/Manual:Language#mediaviewer/File:MediaWiki_fallback_chains.svg
 for a pretty chart.)
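
 By "one-dimensional" I mean each locale names at most one parent, and you
 just walk the chain until it runs out. A minimal sketch, in Python for
 illustration only; the FALLBACK entries, the function names, and the
 choice of "en" as the final stop are assumptions for this example, not
 what the patch contains.

```python
# Hypothetical single-parent fallback table: slug -> next slug to try.
FALLBACK = {
    "nl-be": "nl",
    "pt-br": "pt",
    "de-ch": "de",
}


def fallback_chain(slug, fallback=FALLBACK, final="en"):
    """Return the slugs to try in order, ending with the final default."""
    chain = [slug]
    seen = {slug}
    current = slug
    while current in fallback:
        current = fallback[current]
        if current in seen:  # guard against accidental cycles in the table
            break
        chain.append(current)
        seen.add(current)
    if final not in seen:
        chain.append(final)
    return chain


def first_available(slug, available, fallback=FALLBACK, final="en"):
    """Return the first slug in the chain that has a translation, else None."""
    for candidate in fallback_chain(slug, fallback, final):
        if candidate in available:
            return candidate
    return None
```

 So fallback_chain("nl-be") gives nl-be, then nl, then en. There is no
 inheritance of individual fields and no script or region inference,
 which is exactly what makes it simpler than the CLDR model.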

 > This patch also has syntax errors. $nl-be is not a valid name for a
 variable.
 >  * Aside from the variable syntax issues johnbillion points out, the
 local variables ("object names") *are* actually used outside the class.
 See
 https://glotpress.trac.wordpress.org/browser/trunk/locales/locales.php?rev=931#L1260.
 I didn't code this, talk to Nikolay. :-)

 Ah, OK. I should have caught the dash in the object name; I was too
 excited about consistency with the slug. On being used outside the class,
 *cough* Yoav *cough*.
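
 For reference, PHP reads $nl-be as $nl minus the constant be, so a
 generator has to map the slug to a legal name before emitting it. A
 small sketch of that mapping, in Python for illustration; the function
 name is hypothetical, and it deliberately restricts names to ASCII
 letters, digits, and underscores, which is narrower than what PHP
 accepts.

```python
import re


def slug_to_identifier(slug):
    """Map a locale slug such as "nl-be" to a name usable after "$" in PHP."""
    # Replace every character outside [A-Za-z0-9_] with an underscore.
    name = re.sub(r"[^A-Za-z0-9_]", "_", slug)
    # A variable name may not start with a digit (or be empty).
    if not re.match(r"[A-Za-z_]", name):
        name = "_" + name
    return name
```

 The slug itself can stay "nl-be"; only the variable name changes, to
 nl_be.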

 > In general, smaller changes will definitely be easier to review. As in,
 tackling all ISO changes in one go, all fallbacks in another pass, all
 reordering at once, etc.
 >
 > Can I ask if there was some kind of impetus for this?

 On impetus, it started a month ago when I wanted to make a small change
 to allow CLDR country names in some stats reports. A month later, I have
 a monster patch that fixes about a dozen different inconsistencies that
 caught my eye. You know how it goes. :)

 Seriously though, it feels like CLDR could be helpful, esp. with 4.0, and
 mostly it's been a great chance for me to explore the code and get caught
 up on i18n stuff; I last paid a lot of attention to it 4-5 years ago, for
 MediaWiki. I want to help.

 > Which furthermore illustrates another issue: people in Belgium would
 probably rather refer to nl_BE (dutch-as-spoken-in-Belgium) as vls
 (Vlaams). Welcome to the languages can of worms :D

 Interesting. One of the theoretical benefits of CLDR is that it's
 relatively standardized on typical usage. So there still might be
 differences of opinion, but for the most part that's up to Unicode to
 sort out. Do you buy that argument?

 > I'm not sure why $locale->rtl was changed from true to '1'. This should
 remain as a boolean.

 That's a bug. It was on my list of things to fix in the script that
 generates this, but I told myself I'd fix it manually and then forgot.
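
 The general shape of the fix in a generator is to render values by type
 rather than stringifying everything. A minimal sketch, in Python for
 illustration; php_literal and property_line are hypothetical helpers,
 not the actual script.

```python
def php_literal(value):
    """Render a Python value as PHP source text, keeping booleans as booleans."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return repr(value)
    # Everything else becomes a single-quoted PHP string.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def property_line(variable, prop, value):
    """Build one assignment line, e.g. "$he->rtl = true;"."""
    return "${}->{} = {};".format(variable, prop, php_literal(value))
```

 With that, property_line("he", "rtl", True) comes out as a real boolean
 assignment instead of the string '1'.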

 > Some country codes got changed to be uppercase, but a lot stayed the
 same. Probably best to keep it as is for compatibility reasons.

 I'd thought it was a bit of a standard that a) language codes are
 lowercase and country codes are uppercase, and b) just in case, you
 should always use case-insensitive comparisons. Should I not think that?
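
 For what it's worth, BCP 47 treats tags as case-insensitive and only
 recommends the casing (lowercase language, uppercase two-letter region,
 title-case four-letter script). A sketch of normalizing before comparing,
 in Python for illustration; both function names and the underscore
 output form are choices made for this example.

```python
def normalize_locale(code):
    """Return the code with conventional casing, joined by underscores."""
    parts = code.replace("-", "_").split("_")
    normalized = [parts[0].lower()]  # language subtag: lowercase
    for part in parts[1:]:
        if len(part) == 2 and part.isalpha():
            normalized.append(part.upper())   # region, e.g. BR
        elif len(part) == 4 and part.isalpha():
            normalized.append(part.title())   # script, e.g. Latn
        else:
            normalized.append(part.lower())   # anything else: lowercase
    return "_".join(normalized)


def same_locale(a, b):
    """Compare two locale codes, ignoring case and the "-" / "_" difference."""
    return normalize_locale(a) == normalize_locale(b)
```

 Code that compares this way keeps working whether a stored code is
 pt_br, pt_BR, or pt-BR, which would take most of the compatibility risk
 out of changing the case.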

--
Ticket URL: <https://glotpress.trac.wordpress.org/ticket/342#comment:8>
GlotPress <https://glotpress.trac.wordpress.org>
Easy comin', easy goin'