[wp-polyglots] Unicode characters instead of entities in POT

Thomas Scholz info at toscho.de
Tue Apr 28 13:42:04 GMT 2009


Nikolay Bachiyski:

> On Tue, Apr 28, 2009 at 16:05, Xavier Borderie <xavier at borderie.net>  
> wrote:
>> Just saw this ticket being closed by Nikolay:
>> http://core.trac.wordpress.org/ticket/7099
>>
>> Since the POT (and PO/MO) uses UTF-8, why can't we just use actual
>> Unicode characters rather than their HTML entities equivalent?
>
> There are many editors, which don't support either showing or entering
> these characters. Browsers are a lot smarter. They revert to a basic
> font if the current one can't show the character and don't rely only
> on UTF-8 representation.

How entities are handled depends on the MIME type: in XHTML
(application/xhtml+xml) only five entities MUST be resolved: &lt;, &gt;,
&quot;, &amp; and &apos; (&apos; should be avoided due to bad support in
some HTML user agents). All other named entities are declared in the DTD,
which a non-validating XML parser is not required to read, so they might
be shown literally. Some older user agents (Opera 7, early Gecko
derivatives) do exactly that.

So: if you don’t use real UTF-8 characters, use numeric character
references, e.g. &#8230; rather than &hellip;.
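
To illustrate the substitution, here is a minimal Python sketch that rewrites
named entities as decimal numeric character references. It relies on the
standard library table html.entities.name2codepoint (HTML 4.01 names only);
the function name to_numeric_references and the XML_PREDEFINED set are my own
and purely illustrative. It is not how the WordPress POT is generated.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML; they are safe in any XML parser,
# so this sketch leaves them untouched.
XML_PREDEFINED = {"lt", "gt", "quot", "amp", "apos"}

# Matches a named entity such as &hellip; (numeric references do not match).
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_references(text):
    """Replace named HTML entities with decimal numeric character references."""

    def replace(match):
        name = match.group(1)
        # Keep the predefined five and anything the table does not know.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _ENTITY.sub(replace, text)


if __name__ == "__main__":
    print(to_numeric_references("Read more&hellip; &raquo; a &lt; b"))
```

Unknown names are passed through unchanged rather than guessed at, so a
string can be run through the function more than once without harm.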

> A translator needs a working knowledge of HTML anyway. Replacing or
> adding verbose descriptions of entities isn't worth it.

Antithesis: A translator needs basic knowledge of character encoding
anyway. Finding and using UTF-8 capable software shouldn’t be so hard.
It is already mandatory for any language whose alphabet ISO-8859-1
cannot represent (Russian, Tamil etc.).
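
A quick way to see the point about ISO-8859-1, sketched in Python: the
helper can_encode is a hypothetical name of mine, and the sample words are
just greetings chosen for illustration.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # Russian, Tamil and French sample words (illustrative only).
    for word in ("Привет", "வணக்கம்", "café"):
        print(word, can_encode(word, "iso-8859-1"), can_encode(word, "utf-8"))
```

The Russian and Tamil samples fail for ISO-8859-1 and succeed for UTF-8, so
translators for those languages need a UTF-8 capable editor in any case.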

Thomas
-- 
Redaktion, Druck- und Webdesign
http://toscho.de
0160/1764727
Twitter: @toscho

