[wp-hackers] no-code-duplication i18n for WordPress

Lionel Elie Mamane lionel at mamane.lu
Mon Mar 3 08:00:36 GMT 2008


Hello,

I'm Lionel Elie Mamane. I recently got involved in the Debian
packaging of WordPress, and I'd like to propose an enhancement to
your full-i18n procedures, so that all languages are guaranteed to
use the same code and only the strings differ. I come bearing "proof
of concept" patches, available from
http://people.debian.org/~lmamane/wordpress/ .

HISTORY
=======

Originally, all I wanted was to have French WordPress packaged in
Debian, but then I saw there were code differences (in version 2.3.2)
between the English and French versions that might (or might not)
have been security issues. So I decided to add all the French support
I could to Debian's WordPress package: the ll_CC.js files, the .mo
files, and the few code differences that would not break the English
version.

Then, Nikolay Bachiyski and I started discussing why things were as
they were in WordPress and started brainstorming for a way to improve
them that would not get in the way of English version development too
much. That discussion is at http://bugs.debian.org/461617, but its
result is summarised below, so you don't need to go read it all unless
you want to.

I18N PROBLEMS IN CURRENT WORDPRESS
==================================

As you probably all know, the main problem is that some strings
(mainly error messages) are output before gettext is loaded / ready to
be used, so these strings cannot be translated by gettext and are
currently hard-coded in the source. Translating WordPress into a new
language therefore entails forking the code and translating these
hard-coded strings, and that work has to be redone for every new
release.

THE PROPOSAL
============

I absolutely want all language versions of WordPress to share the same
code. That makes it less work, Debian-wise, to ship them all, and it
means a security update only has to touch one copy of the code.
Only one copy -> less work, and none is forgotten.

To achieve that, the idea is that fully-localised versions are
generated automatically from the English version by replacing the
hard-coded strings in a build stage. You can still ship
already-translated source tarballs for your users. This allows Debian
to easily ship all languages (not 15 tarballs to download and package,
only one and then a build stage), and it allows you to easily get
security updates out for all languages at once (fix the English
source, run the build stage for every language, tar the 15 resulting
trees, upload, announce). To make implementation easier, the strings
to be statically translated are tagged with special comments.
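
To make the build stage concrete, here is a minimal sketch in Python
of the kind of replacement it performs. It is not the script from the
patch: the tag comment "/* i18n-static */", the function names and the
example string are all made up for illustration, and the catalogue is
a plain dict rather than a parsed .po file. It only handles
single-quoted PHP literals without embedded quotes.

```python
import re

# Hypothetical marker; the comment syntax used by the real patch may differ.
TAG = "/* i18n-static */"

# A tagged literal: the marker, optional whitespace, then a single-quoted
# PHP string containing no quotes or backslashes (kept deliberately simple).
TAGGED = re.compile(re.escape(TAG) + r"\s*'([^'\\]*)'")


def php_quote(text):
    """Return text as a single-quoted PHP string literal."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def localise(source, catalogue):
    """Replace every tagged literal in source by its translation, if any."""
    def swap(match):
        translated = catalogue.get(match.group(1))
        if translated is None:
            return match.group(0)  # no translation: keep the English as-is
        return php_quote(translated)
    return TAGGED.sub(swap, source)


# Illustrative input only; not a string taken from fr.po.
english = "die(/* i18n-static */ 'Error connecting to the server');"
french = {"Error connecting to the server": "Erreur de connexion au serveur"}
print(localise(english, french))
```

Running the same function once per language over the one English tree
gives one localised tree per catalogue, which is the whole point:
untagged code is never touched, so it cannot diverge between
languages.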

I implemented a script to do this build stage; the patch (against
trunk as of yesterday) is at
http://people.debian.org/~lmamane/wordpress/no-code-dup-i18n-poc.patch

It adds an "i18n-tools" directory to the WordPress source tree, but we
can also put it elsewhere (the code is location-independent), no
problem. It creates a README file in that directory that explains how
it works.

There is a (partial) example .po file for French for this static
translation at http://people.debian.org/~lmamane/wordpress/fr.po .

The patch also tags two strings in the code (those that are in the
fr.po), for tests and such. For now, they are only tagged as-is,
meaning that they are cut at the places where there are variable
references in the "-delimited string. That is bad: translators get
fragments instead of whole sentences. The strings should be converted
to sprintf (as the README says, like for gettext-translated strings);
I'll do it soon, but I wanted to get this email out sooner rather than
later.
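
As an illustration of that conversion, here is a rough Python sketch
that turns the body of a "-delimited PHP string into a sprintf format
plus its argument list. The function name and the example string are
hypothetical, and it only understands plain $name variables (no
$obj->prop, no {$a['b']}). It uses PHP's positional %1$s form so that
a translator can reorder the arguments.

```python
import re

# Plain PHP variable reference such as $host (nothing fancier).
VAR = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def to_sprintf(literal_body):
    """Return (format, args) for the body of a double-quoted PHP string."""
    args = []

    def placeholder(match):
        name = "$" + match.group(1)
        if name not in args:
            args.append(name)
        # Positional placeholder, numbered from 1 as PHP's sprintf expects.
        return "%{}$s".format(args.index(name) + 1)

    # Escape literal percent signs first so sprintf leaves them alone.
    fmt = VAR.sub(placeholder, literal_body.replace("%", "%%"))
    return fmt, args


# Illustrative input only; not one of the strings tagged by the patch.
fmt, args = to_sprintf("Cannot connect to $host as $user")
# In PHP the format must go in single quotes, or $s would be interpolated.
print("sprintf('{}', {})".format(fmt, ", ".join(args)))
```

The translator then sees one whole sentence with numbered
placeholders, rather than three separate fragments.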

Let me know what you think, whether I should finish that thing up and
tag all remaining strings, etc.


REMAINING ISSUES
================

The list of languages supported by the spellchecker in TinyMCE is
hardcoded; it should be built dynamically from what is actually
installed on the particular server.


Best Regards,

-- 
Lionel

