[wp-polyglots] no-code-duplication i18n for WordPress

Fri Mar 7 09:52:51 GMT 2008

On Fri, Mar 07, 2008 at 10:18:35AM +0200, Nikolay Bachiyski wrote:
> 2008/3/7, Lionel Elie Mamane <lionel at mamane.lu>:
>> On Thu, Mar 06, 2008 at 06:44:28PM +0200, Nikolay Bachiyski wrote:

>>> Apart from not spending time on this issue, here is some
>>> proof-of-concept tokenizer code, which will make the extracting and
>>> replacing work very easy and clean:

>>> http://nb.niichavo.org/files/code/token_comments.phps

>> From reading this code, it seems to me it will miss any strings
>> containing variable references, while mine catches them (albeit in
>> the suboptimal way of cutting the string at the variable reference,
>> which is why I suggest to replace them by format strings in sprintf
>> calls);

> Recognising only literal strings was my goal. Letting translators
> translate arbitrary PHP code can and will be disastrous.

My code, as currently written, does not let them translate arbitrary
PHP code; it takes the pieces of the string that are literal and
translates them separately. E.g. when it finds
 "the dog named $foo is cute",
it translates "the dog named " and " is cute".
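
A minimal sketch of that splitting, for illustration only: this is not
the code discussed in this thread, and literal_pieces() is a
hypothetical helper name. It assumes PHP's tokenizer extension and
relies on token_get_all() reporting the literal parts of an
interpolated string as T_ENCAPSED_AND_WHITESPACE tokens.

```php
<?php
// Hypothetical helper, not the code from this thread.
// Returns the literal pieces of a double-quoted PHP string literal
// that contains variable references.
function literal_pieces($source)
{
    $pieces = array();
    // token_get_all() only tokenizes as PHP after an open tag.
    foreach (token_get_all('<?php ' . $source . ';') as $token) {
        if (!is_array($token)) {
            continue; // single-character tokens such as '"' or ';'
        }
        if ($token[0] === T_ENCAPSED_AND_WHITESPACE) {
            $pieces[] = $token[1]; // literal text between variables
        }
    }
    return $pieces;
}

// The single quotes keep $foo from being interpolated here.
$pieces = literal_pieces('"the dog named $foo is cute"');
// $pieces holds 'the dog named ' and ' is cute'
```

A string without any variable reference comes back from the tokenizer
as one T_CONSTANT_ENCAPSED_STRING token (quotes included), so this
sketch returns an empty array for it.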

Assuming we decide that to-be-translated strings must be literal
strings without variable references (which is what I would
recommend), there are several ways the translation code can react
when it encounters a to-be-translated string that does contain
variable references:

 - fail noisily; e.g. return an error and a big fat error message of
   the form "I refuse to translate a non-literal string; change it to
   a format string in a sprintf()".

 - warn; don't translate it and don't return an error, just print a
   warning like "WARNING - not translating non-literal string".

 - fail silently; don't translate it and act as if nothing out of the
   ordinary happened.

 - fail gracefully; that is, do the best it can with the
   situation. Can (and now that I think about it, should) be combined
   with printing a warning.

Of these, three have good points in their favour, but I consider
"fail silently" the worst and see no advantage to it. My current code
takes the "fail gracefully without warning" route (mainly because I
had not thought about the question until now). I'd be OK with any of
fail gracefully with warning, warning, or fail noisily, but I'm
significantly uncomfortable with fail silently and recommend against
that route.
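
To make the three acceptable reactions concrete, here is a rough
sketch. The function names, policy strings and return values are all
made up for illustration; only the two messages are taken from the
descriptions above. "Fail silently" is left out on purpose. The
literal-string test assumes that the tokenizer reports a string
without variable references as a single T_CONSTANT_ENCAPSED_STRING
token.

```php
<?php
// Hypothetical names throughout; a sketch, not an agreed design.

// True if $source is one PHP string literal with no variable references.
function is_literal_string($source)
{
    $tokens = token_get_all('<?php ' . $source . ';');
    // Expected tokens: T_OPEN_TAG, T_CONSTANT_ENCAPSED_STRING, ';'
    return count($tokens) === 3
        && is_array($tokens[1])
        && $tokens[1][0] === T_CONSTANT_ENCAPSED_STRING;
}

// Decide what to do with a to-be-translated string under a policy.
function decide_translation($source, $policy)
{
    if (is_literal_string($source)) {
        return 'translate';
    }
    switch ($policy) {
        case 'fail_noisily':
            throw new InvalidArgumentException(
                'I refuse to translate a non-literal string; '
                . 'change it to a format string in a sprintf()'
            );
        case 'warn':
            error_log('WARNING - not translating non-literal string');
            return 'skip';
        case 'graceful_with_warning':
            error_log('WARNING - translating only the literal pieces');
            return 'translate_pieces';
        default:
            throw new InvalidArgumentException("unknown policy: $policy");
    }
}
```

Whatever the policy, the fix on the source side is the same: turn the
string into a literal format string, e.g.
sprintf(__('the dog named %s is cute'), $foo).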

>>  compared to mine, it also does not allow to tag a whole function call
>>  (or code block) with several strings to be translated in it, it really
>>  needs a separate tag for each. OTOH, it doesn't use an end tag, this
>>  might balance it. Being less strict in the special-tag match is
>>  probably better, like allowing arbitrary amount of whitespace between
>>  "/*" and "WP_I18N"; should do that. As the tokenizer-method allows us
>>  to gettext-lookup the string itself (albeit in PHP-escaped form), I
>>  don't think recognising _all_ tags of the form WP_I18N_.* is
>>  desirable; just _one_ start tag and _one_ end tag. This allows to
>>  reserve the other ones for eventual future different use.
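
A sketch of what such a looser match could look like, accepting any
amount of whitespace inside the comment but exactly one start tag and
one end tag. The tag names WP_I18N_START and WP_I18N_END and the
function name are assumptions made for this example; the thread only
fixes the WP_I18N prefix.

```php
<?php
// Hypothetical tag names and helper; only the WP_I18N prefix is given.
// Classifies the text of a comment token as 'start', 'end' or null.
function classify_i18n_comment($comment)
{
    // \s* allows arbitrary whitespace after "/*" and before "*/".
    if (preg_match('~^/\*\s*WP_I18N_START\s*\*/$~', $comment)) {
        return 'start';
    }
    if (preg_match('~^/\*\s*WP_I18N_END\s*\*/$~', $comment)) {
        return 'end';
    }
    // Any other WP_I18N_* comment is ignored, i.e. kept in reserve.
    return null;
}
```

It would be applied to the text of each T_COMMENT token the tokenizer
returns.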

> Modifying the code, so that it doesn't impose so rigorous rules is
> very easy

Yes, it is mostly overwriting your code with mine, and writing mine
was very easy, so I fully agree with that assessment.

> -- this was just a proof-of-concept program. The main point of the
> code was to show that the tokenizer is a good friend of ours.

Oh, I thought the "good friend of ours" part was already well
established (I was completely convinced of it by writing my code),
that's why I started looking into the details of yours; I thought you
suggested different detailed workings. If that's not the case, then,
well, nothing to say.

-- 
Lionel