[wp-polyglots] Translation Guidelines / HTML Character Entities

Francesc Hervada-Sala francesc at hervada.org
Tue Sep 4 06:03:11 GMT 2007


Hi all,

as a newcomer I've spent some time reading the wp-polyglots archives and 
found many interesting discussions about encoding .po files as UTF-8 and 
the use of HTML character entities.

It seems to me that it is common practice to use HTML character 
entities for all special characters in translated messages. On the 
other hand, the translation guidelines say that one should avoid using 
HTML character entities:

    With a few exceptions (noted below), all translations should be
    written literally, rather than escaping accented and special
    characters with HTML character entities.

Source: 
http://codex.wordpress.org/Translating_WordPress#Guidelines_and_requirements

To sum up:

   1. .mo files without HTML entities do not work for blogs using
      character encodings other than UTF-8 (the latter being the
      default and recommended in WP).
   2. .mo files with HTML entities do not work for e-mail messages sent
      by WordPress.
   3. .po files with HTML entities are less translator-friendly and thus
      more error-prone.

As Kim Suominen pointed out on March 7th, 2005, the best solution would 
be for the WP core to convert UTF-8 into the blog's character encoding 
at runtime (both when generating HTML and when sending e-mails). See 
http://comox.textdrive.com/pipermail/wp-polyglots/2005-March/000449.html
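
To illustrate the idea only, here is a minimal sketch in Python (not 
WordPress's PHP, and not anything the WP core actually does). The 
function names for_html and for_email are hypothetical, and the fallback 
behaviour for characters the target encoding cannot represent is my own 
assumption: numeric character references in HTML, "?" in plain-text 
e-mail.

```python
def for_html(text, charset):
    """Encode a translated string for an HTML page in the blog's charset.

    Characters the charset cannot represent become numeric character
    references, which browsers render correctly.
    """
    return text.encode(charset, errors="xmlcharrefreplace")


def for_email(text, charset):
    """Encode a translated string for a plain-text e-mail body.

    Entities are not interpreted in plain text, so characters the
    charset cannot represent are replaced with "?" instead.
    """
    return text.encode(charset, errors="replace")
```

With something like this, the .po files could stay in plain UTF-8 and 
the choice between entities and literal characters would be made per 
output channel rather than per translation file.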

In the translation files I've worked on (Catalan for WP 2.2, 2.2.1 and 
2.2.2) I've followed this approach:

    * translated strings in .po files contain no HTML character entities
      (original strings are obviously left with entities untouched)
    * a Perl script I wrote generates an equivalent .po file with HTML
      character entities in translated strings
    * there are 2 deployed versions of the WP Catalan translation: the
      "normal version" (just for UTF-8 blogs, works fine with e-mail)
      and the "html version" (works with all blog character encodings,
      produces "ugly" error messages)
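
The conversion step in the second item can be sketched as follows. This 
is not the Perl script mentioned above but a rough Python equivalent 
written for illustration; the function names to_entities and entitize_po 
are hypothetical, and the .po parsing is deliberately simplified (it only 
tracks whether the current line belongs to a msgstr, and it does not 
treat the header entry specially).

```python
from html.entities import codepoint2name


def to_entities(text):
    """Replace every non-ASCII character with an HTML character entity.

    Named entities are used where HTML defines one; anything else
    falls back to a numeric character reference.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")
        else:
            out.append("&#%d;" % cp)
    return "".join(out)


def entitize_po(po_text):
    """Return po_text with entities in translated strings only.

    Original strings (msgid), comments and blank lines are left
    untouched, as described in the list above.
    """
    in_msgstr = False
    result = []
    for line in po_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("msgstr"):
            in_msgstr = True
        elif stripped.startswith('"'):
            pass  # continuation line: belongs to the current field
        else:
            in_msgstr = False  # msgid, msgctxt, comment or blank line
        result.append(to_entities(line) if in_msgstr else line)
    return "\n".join(result)
```

The output would then be compiled to a .mo file in the usual way (for 
example with msgfmt) to produce the "html version".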

Do you think this approach could be generalised for all WP localizations?

By the way, I think the common practice today does not follow the 
guidelines - we should change one of the two so that they agree.

Cheers,

Francesc Hervada-Sala




