[wp-polyglots] Locale Directory Structure

Ryan Boren ryan at boren.nu
Mon Feb 28 06:25:37 GMT 2005


On Mon, 2005-02-28 at 00:07 -0500, K Suominen wrote:
>On Sun, 27 Feb 2005 22:34:06 -0600, Ryan Boren <ryan at boren.nu> wrote:
>> In order to make automated packaging easier, let's define the locale
>> directory structure more precisely.
>
>The fi_FI locale currently already has fully automated packaging
>implemented.  It can also be easily extended to cover more locales, by
>calling make(1) recursively, e.g. from an upper-level directory.

That's good, although I hesitate to ask people to maintain Makefiles.
One simple script could handle the packaging chores for all locales.
Makefiles allow each locale to be flexible in how directories and files
are organized, but that flexibility comes at a bit of a maintenance
cost.

>The only definitive version of the GPL is the one in English -- it is
>probably not a good idea to replace the English version in any
>distribution.

Locales that provide a translation should follow these guidelines:

http://www.gnu.org/licenses/translations.html

>I think at least a couple other translations in addition to Finnish
>are providing a separate file with a translated GPL for assistance in
>interpreting the definitive version.  That seems to be what other
>projects have done as well.  (Maybe it is a GNU guideline, but I
>haven't checked.)
>

Sounds like a good policy.

>When I was automating the packaging, I thought it would have been much
>easier to place all files in a single tree, reflecting their proper
>location in the final distribution.  In other words:
>
>readme.html
>wp-config-sample.php
>wp-content/themes/default/...
>wp-includes/languages/fr_FR.po  (not that the .po file needs to be distributed)
>wp-includes/languages/fr_FR.mo
>
>What's the benefit from placing the themes or the message file elsewhere?

There's no need to replicate that ugly directory structure in the
repository.  Makefiles/scripts can put things where they need to be in
the final packaging.  Let's use automation to hide the annoying details.

>It is not just themes that need this -- all the other translated files
>need it, too.

True.  I guess dist would need it too.

>For Finnish, though, I found it easiest to automate the conversion,
>eliminating the need to commit different converted versions of the
>same content into the repository.

I prefer automated conversion too.  Use UTF-8 for source files; use
iconv to generate other encodings.  Ideally, we would have one UTF-8 po
for each locale with all other po and mo files being generated.
Generated files need not be committed, although committing them is
convenient for those who just want to pick up a ready-to-go mo from the
repository without getting a full package.  Committing derived objects
allows us to provide versioned downloads of the end-user shippables
without having to mess with a separate staging area.  That convenience
may not justify the repository clutter, however.
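
As a rough illustration of the conversion step, here is a minimal Python
sketch that does what an iconv call would do for a single po file.  The
function name transcode_po is hypothetical, and updating the charset in
the po header is an assumption about what the generated file should
declare; nothing here is an existing script.

```python
import re


def transcode_po(utf8_bytes: bytes, target: str = "ISO-8859-1") -> bytes:
    """Re-encode a UTF-8 .po file and update its declared charset.

    Hypothetical helper: a stand-in for running iconv on the file and
    then fixing the Content-Type line in the po header by hand.
    """
    text = utf8_bytes.decode("utf-8")
    # Keep the header's charset in sync with the new encoding.
    text = re.sub(
        r"(Content-Type: text/plain; charset=)[-\w]+",
        lambda m: m.group(1) + target,
        text,
        count=1,
    )
    # Strict encoding: fail loudly if a character cannot be represented.
    return text.encode(target)
```

A character with no equivalent in the target encoding raises
UnicodeEncodeError instead of being silently dropped, which is probably
the behavior a packaging script wants.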

>I also consider the files in the repository to be "source" files.  The
>automated packaging procedure (in the Makefiles and the top-level
>build.sh) converts the source files to the correct character set or
>entity encoding as required.  This minimizes the maintenance.

Yes, agreed.

>However, it also has a second impact: the "source" files should not be
>distributed "as is."  They should be processed as part of the
>packaging, as implemented.  For example, the "source" files are in ISO
>8859-1, because UTF-8 support still falls short on the UNIX systems I
>use.  I have set the svn:mime-type property accordingly.

Older UNIX systems do indeed suck.  Newer Linux ones are usually UTF-8
out-of-the-box.  Regardless, UTF-8 makes the best source because it can
be converted to just about every other character set.  It is a universal
donor.

Summary (Ryan's ideal world that he will never have):

I prefer having only one UTF-8 po file, one UTF-8 theme, and one UTF-8
set of dist files with all other encodings being generated by iconv.
Translators would work only with the UTF-8 "source" files since
everything else is generated.  If we want to commit generated files,
they should go in a separate directory structure.  We could have a
source hierarchy and a derived encoding hierarchy.

Source hierarchy (everything in UTF-8):

trunk/messages/
trunk/theme/
trunk/dist/

Generated encoding hierarchy:

trunk/ISO-8859-1/messages/
trunk/ISO-8859-1/theme/
trunk/ISO-8859-1/dist/

Translators would work only with the source files.  The files under the
ISO-8859-1 directory would be generated from the source files.  They are
committed to the repository merely as a convenience to users wanting to
selectively retrieve specific files in their preferred encoding.
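
To make the two hierarchies concrete, the following Python sketch
mirrors the source directories into an encoding directory.  It is only
an illustration: the function name generate_encoding_tree is
hypothetical, and it assumes every file under messages/, theme/, and
dist/ is UTF-8 text.  Binary files such as compiled mo files or images
would need separate handling.

```python
from pathlib import Path
from typing import List

# Source directories named in the layout above.
SOURCE_DIRS = ("messages", "theme", "dist")


def generate_encoding_tree(trunk: Path, encoding: str = "ISO-8859-1") -> List[Path]:
    """Mirror the UTF-8 source tree into trunk/<encoding>/ (hypothetical helper)."""
    written = []
    for name in SOURCE_DIRS:
        src_root = trunk / name
        if not src_root.is_dir():
            continue  # a locale may not provide every directory
        for src in sorted(src_root.rglob("*")):
            if not src.is_file():
                continue
            dst = trunk / encoding / name / src.relative_to(src_root)
            dst.parent.mkdir(parents=True, exist_ok=True)
            # Assumes the source file is UTF-8 text; re-encode it strictly.
            dst.write_bytes(src.read_bytes().decode("utf-8").encode(encoding))
            written.append(dst)
    return written
```

Because the generated tree is computed entirely from the source tree, it
can be deleted and rebuilt at any time, whether or not it is committed.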

Locale Makefiles are nice, but I think getting everyone to buy into them
will be difficult.  Makefiles are not pretty things.

Automation proposal:  Allow locale Makefiles for those who want to
maintain them.  For those who don't, we'll provide a universal script
that will work with any properly structured locale.  When packaging
locales, the Makefile will be used if present.  Otherwise, the universal
script is used.
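
The selection rule could look something like the Python sketch below.
The script name package-locale.sh and the function name
packaging_command are placeholders; no such script exists yet.

```python
from pathlib import Path
from typing import List

# Placeholder name for the proposed universal packaging script.
UNIVERSAL_SCRIPT = "package-locale.sh"


def packaging_command(locale_dir: Path) -> List[str]:
    """Return the command that would package one locale directory."""
    if (locale_dir / "Makefile").is_file():
        # The locale maintains its own Makefile, so defer to it.
        return ["make", "-C", str(locale_dir)]
    # Otherwise fall back to the universal script.
    return ["sh", UNIVERSAL_SCRIPT, str(locale_dir)]
```

The function only chooses the command.  A driver could pass the result
to subprocess.run for each locale directory in turn.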

Ryan


