[wp-polyglots] Unicode characters instead of entities in POT

Samuel Murray (Groenkloof) samuel at translate.org.za
Tue Apr 28 14:10:25 GMT 2009


Thomas Scholz wrote:

> This depends on the MIME type...

I think it depends on the browser :-)

> Any other entity might be shown literal.

Is there a list somewhere on the web of which browsers show them literally?

> Some older user agents (Opera 7...) do exactly that.

Earlier versions of Opera implemented the standards very strictly, and 
as a result, many web sites did not work correctly in Opera.  Then the 
Opera people got smart and started implementing what they call "street 
HTML", which Opera applies if it detects that a page is possibly 
non-compliant.

> So: If you don’t use real UTF-8, use numeric character references, eg. 
> &#8230; not &hellip;.

I think a reason why hellip may be used is that it is easy to "read" 
what the character is.  It is an ellipsis.  If numbered codes were used, 
translators would not know what the code means unless they used a 
look-up table, and volunteer translators tend not to use look-up tables 
-- they prefer educated guesswork, which in the case of numbered 
entities can be dangerous.
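
To illustrate the look-up problem, here is a rough sketch of a helper 
that lists each named entity in a string next to its numeric reference 
and its Unicode name.  It assumes Python 3 and uses only the standard 
library; the function names describe_entities and to_numeric are made 
up for this example and are not part of any PO tool.

```python
import re
import unicodedata
from html.entities import name2codepoint

# Matches named references such as &hellip; or &raquo; (not numeric ones).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def describe_entities(text):
    """Return (entity, numeric reference, Unicode name) for each known entity."""
    rows = []
    for match in ENTITY_RE.finditer(text):
        codepoint = name2codepoint.get(match.group(1))
        if codepoint is None:
            continue  # unknown name: skip it rather than guess
        rows.append((
            match.group(0),
            "&#%d;" % codepoint,
            unicodedata.name(chr(codepoint), "UNKNOWN"),
        ))
    return rows


def to_numeric(text):
    """Replace known named entities with numeric character references."""
    def replace(match):
        codepoint = name2codepoint.get(match.group(1))
        # Leave unknown names untouched so nothing is silently corrupted.
        return match.group(0) if codepoint is None else "&#%d;" % codepoint
    return ENTITY_RE.sub(replace, text)


if __name__ == "__main__":
    for row in describe_entities("Read more &hellip; &raquo;"):
        print(row)
```

Something like this could sit beside the numeric form as a translator 
comment, so that nobody has to guess what a numbered code stands for.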

My own opinion is to reduce the "fancy" characters to a minimum.

>> A translator needs a working knowledge of HTML anyway. Replacing or
>> adding verbose descriptions of entities isn't worth it.

> Antithesis: A translator needs basic knowledge of character encoding 
> anyway.

This applies to trained, professional translators.  Volunteer 
translators are often amateurs and have very little training.  One has 
to be pragmatic -- a logical, easy-to-use system is better than 
something which is correct only from a purist's point of view.

> Finding and using UTF-8 capable software shouldn’t be so hard. 

Do you know of PO editors that make it easy for translators to type the 
raquo and the hellip?  Neither PoEdit nor Virtaal does.

Samuel

-- 
Samuel Murray
samuel at translate.org.za



