[wp-polyglots] Translation Guidelines / HTML Character Entities

Nikolay Bachiyski nbachiyski at developer.bg
Tue Sep 4 07:16:59 GMT 2007


2007/9/4, Francesc Hervada-Sala <francesc at hervada.org>:
>
>  Hi all,

Hello Francesc,

>  It seems to me that it is common practice to use HTML character entities
> for all special characters in translated messages. On the other hand, the
> translation guidelines say that one should avoid using HTML character
> entities:
>
> With a few exceptions (noted below), all translations should be written
> literally, rather than escaping accented and special characters with HTML
> character entities.
>  Source:
> http://codex.wordpress.org/Translating_WordPress#Guidelines_and_requirements
>
>  Let me sum up:
>
> - .mo files without HTML entities do not work for blogs using character
>   encodings other than UTF-8 (the latter being the default and recommended
>   in WP).
> - .mo files with HTML entities do not work for e-mail messages sent by
>   WordPress.
> - .po files with HTML entities are less translator-friendly and thus more
>   error-prone.
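Both failure modes in the summary can be reproduced in a few lines. This is a minimal Python sketch, not WordPress code: the Catalan string is a hypothetical example, and decoding UTF-8 bytes as ISO-8859-1 stands in for a blog that declares a Latin-1 charset.

```python
import html

# A Catalan translation as stored in a UTF-8 .mo file (hypothetical example).
translated = "Següent pàgina"

# Failure 1: UTF-8 bytes served on a blog declared as ISO-8859-1 are read
# one byte per character, so each accented letter turns into two characters.
utf8_bytes = translated.encode("utf-8")
seen_on_latin1_blog = utf8_bytes.decode("iso-8859-1")
print(seen_on_latin1_blog)

# Failure 2: an entity-escaped translation renders correctly in HTML, but a
# plain-text e-mail body shows the entities literally.
escaped = "Seg&uuml;ent p&agrave;gina"
print(escaped)                  # what the e-mail recipient sees
print(html.unescape(escaped))   # what a browser renders
```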
>  As Kim Suominen pointed out on March 7th, 2005, the best solution would be
> for the WP core to convert UTF-8 into the blog's character encoding at
> runtime (both when generating HTML and e-mails). See
> http://comox.textdrive.com/pipermail/wp-polyglots/2005-March/000449.html
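The runtime conversion suggested above can be sketched as follows. This is an illustration in Python, not the WP core (which is PHP); the function name `to_blog_charset` and the choice of fallbacks are assumptions. Characters the target charset cannot hold become numeric references in HTML, and `?` in e-mail, where references would show up literally.

```python
def to_blog_charset(text: str, charset: str, for_html: bool) -> bytes:
    """Hypothetical helper: convert a UTF-8 translation (already decoded to
    str) into the blog's charset at output time.

    HTML output: unrepresentable characters become &#NNN; references.
    E-mail output: unrepresentable characters become '?'.
    """
    errors = "xmlcharrefreplace" if for_html else "replace"
    return text.encode(charset, errors=errors)


msg = "Següent pàgina €"   # the euro sign is not in ISO-8859-1
print(to_blog_charset(msg, "iso-8859-1", for_html=True))
print(to_blog_charset(msg, "iso-8859-1", for_html=False))
```

With this approach the .po/.mo files stay entity-free UTF-8, and only characters missing from the blog's charset are ever escaped.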
>
>  In the translation files I've worked on (Catalan for WP 2.2, 2.2.1 and
> 2.2.2) I've followed this approach:
>
> - Translated strings in .po files contain no HTML character entities
>   (original strings are, of course, left untouched with their entities).
> - A Perl script I wrote generates an equivalent .po file with HTML character
>   entities in the translated strings.
> - There are 2 deployed versions of the WP Catalan translation: the "normal
>   version" (for UTF-8 blogs only; works fine with e-mail) and the "html
>   version" (works with all blog character encodings, but produces "ugly"
>   error messages).
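The script itself is not included in the post, so the following is only a rough Python re-sketch of the step it is described as doing, not Francesc's Perl code. The assumptions: only `msgstr` lines and their continuation lines are touched, ASCII (including markup such as `<a href="...">`) is kept as-is, and non-ASCII characters become named entities where HTML defines one, numeric references otherwise.

```python
from html.entities import codepoint2name


def to_entities(s: str) -> str:
    """Replace each non-ASCII character with a named HTML entity if one
    exists, otherwise with a numeric reference; ASCII is left untouched."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


def entitize_po(po_text: str) -> str:
    """Escape translated strings only (msgstr lines and the quoted
    continuation lines after them); msgid lines and comments are copied."""
    result = []
    in_msgstr = False
    for line in po_text.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith("msgstr"):
            in_msgstr = True
        elif not stripped.startswith('"'):
            # msgid, msgid_plural, msgctxt, comments and blank lines
            # end the current msgstr block
            in_msgstr = False
        result.append(to_entities(line) if in_msgstr else line)
    return "".join(result)


sample = 'msgid "Next page"\nmsgstr "Següent pàgina"\n'
print(entitize_po(sample))
```

Because the output is pure ASCII, the `charset=UTF-8` header of the generated file remains truthful.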
>  Do you think this approach could be generalised to all WP localizations?
>
>  By the way, I think the current common practice does not meet the
> guidelines; we should change one or the other so that they agree.

Does anybody really need an encoding other than UTF-8 these days? If
someone really does, converting the .po file and recompiling the .mo
seems the best solution.
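A minimal sketch of the conversion step, assuming the .po file is UTF-8 and its header declares `charset=UTF-8`; the function name `convert_po` is hypothetical. It rewrites the declared charset and re-encodes the file, failing loudly if a translation uses a character the target encoding lacks.

```python
import re


def convert_po(po_text: str, target: str) -> bytes:
    """Hypothetical helper: re-encode a UTF-8 .po file for a blog that uses
    the `target` charset.

    Rewrites the charset declared in the header entry (the first match),
    then encodes the whole file; raises UnicodeEncodeError if a translation
    contains a character that `target` cannot represent.
    """
    updated = re.sub(r"charset=[\w.-]+", f"charset={target}", po_text, count=1)
    return updated.encode(target)
```

GNU gettext's `msgconv` performs the same kind of conversion on real files, and `msgfmt -o` compiles the converted .po into a new .mo.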

Converting the UTF-8 characters to entities is a big overhead, either for
the translator or for the person who prepares the .pot file. It also
roughly triples the size of the strings.
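To check the size cost for a given translation, one can count bytes both ways. This Python sketch assumes every non-ASCII character is written as a named entity where HTML defines one (numeric reference otherwise). A lone accented letter grows from 2 bytes in UTF-8 to 6 to 8 bytes as an entity, so the overall ratio depends on how many accented characters a string contains.

```python
from html.entities import codepoint2name


def entity_size(s: str) -> int:
    """Bytes needed when every non-ASCII character becomes a named entity,
    or a numeric reference if HTML defines no name for it."""
    size = 0
    for ch in s:
        cp = ord(ch)
        if cp < 128:
            size += 1
        elif cp in codepoint2name:
            size += len(codepoint2name[cp]) + 2   # "&" + name + ";"
        else:
            size += len(str(cp)) + 3              # "&#" + digits + ";"
    return size


for text in ["à", "Següent pàgina", "Next page"]:
    utf8 = len(text.encode("utf-8"))
    print(text, utf8, entity_size(text), round(entity_size(text) / utf8, 2))
```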

Anyway, if you really need to use other encodings, you can do it on
the fly with Kimmo's plugin: http://kimmo.suominen.com/sw/charsets/

Happy translating,
Nikolay.

