[wp-trac] [WordPress Trac] #63974: .mo file loaded as UTF-8 by default - non-standard and ignoring Content-Type headers

WordPress Trac noreply at wordpress.org
Mon Sep 15 14:38:44 UTC 2025


#63974: .mo file loaded as UTF-8 by default - non-standard and ignoring
Content-Type headers
--------------------------+------------------------------
 Reporter:  kkmuffme      |       Owner:  (none)
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  I18N          |     Version:
 Severity:  normal        |  Resolution:
 Keywords:                |     Focuses:
--------------------------+------------------------------

Comment (by kkmuffme):

 Replying to [comment:5 siliconforks]:
 > Replying to [comment:2 kkmuffme]:
 > > If no Content-Encoding header is specified, it should be treated as
 > > ANSI. Since ANSI does not support multibyte characters, this means those
 > > should be removed.
 >
 > This seems like a bad idea to me, since silently deleting
 > (non-)characters often leads to security vulnerabilities:
 >
 > https://www.unicode.org/reports/tr36/tr36-15.html#Deletion_of_Noncharacters

 Good point! What do you suggest?

 The current behavior also allows for problems, e.g. those described in
 https://www.unicode.org/reports/tr36/tr36-15.html#Security_Levels_and_Alerts

 For .mo files, for obvious performance reasons, all sanitizing etc.
 happens at creation time, so that reading/access is as fast as possible
 (hash tables, etc.). This is why, when reading, one should adhere more
 strictly to the standard.

 See also, e.g.,
 https://www.gnu.org/software/gettext/manual/html_node/msgfmt-Invocation.html
 > By default, messages are converted to UTF-8 encoding before being stored
 > in a MO file; this helps avoiding conversions at run time, since nowadays
 > most locales use the UTF-8 encoding.
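
 To illustrate what "adhering to the standard when reading" could look
 like, here is a rough sketch in Python (not WordPress code, and not a
 proposed patch). It reads the header entry of a .mo file (the translation
 of the empty msgid) and extracts the charset from its Content-Type line.
 The helper names build_mo and mo_charset are hypothetical, and the tiny
 in-memory .mo file is built only so the example is self-contained.

```python
import re
import struct

MO_MAGIC = 0x950412DE          # magic number as read in the file's own byte order
MO_MAGIC_SWAPPED = 0xDE120495  # same magic when read with the opposite byte order


def build_mo(catalog):
    """Build a minimal little-endian .mo file (no hash table).

    `catalog` maps msgid bytes to msgstr bytes. Illustration only.
    """
    items = sorted(catalog.items())
    n = len(items)
    orig_table_off = 28                      # 7 header fields * 4 bytes
    trans_table_off = orig_table_off + 8 * n
    data_off = trans_table_off + 8 * n
    entries, blob = [], b""
    # First all msgids, then all msgstrs; each string is NUL-terminated.
    for text in [k for k, _ in items] + [v for _, v in items]:
        entries.append((len(text), data_off + len(blob)))
        blob += text + b"\0"
    header = struct.pack("<7I", MO_MAGIC, 0, n,
                         orig_table_off, trans_table_off, 0, data_off)
    tables = b"".join(struct.pack("<2I", length, off) for length, off in entries)
    return header + tables + blob


def mo_charset(data, default=None):
    """Return the charset declared in the .mo header entry, or `default`."""
    magic = struct.unpack_from("<I", data, 0)[0]
    if magic == MO_MAGIC:
        order = "<"
    elif magic == MO_MAGIC_SWAPPED:
        order = ">"
    else:
        raise ValueError("not a .mo file")
    _rev, n, orig_off, trans_off = struct.unpack_from(order + "4I", data, 4)
    for i in range(n):
        orig_len, _ = struct.unpack_from(order + "2I", data, orig_off + 8 * i)
        if orig_len != 0:
            continue  # only the empty msgid carries the header entry
        t_len, t_off = struct.unpack_from(order + "2I", data, trans_off + 8 * i)
        header = data[t_off:t_off + t_len]
        match = re.search(rb"^Content-Type:.*?charset=([^\s;]+)",
                          header, re.IGNORECASE | re.MULTILINE)
        return match.group(1).decode("ascii") if match else default
    return default


mo = build_mo({
    b"": b"Content-Type: text/plain; charset=ISO-8859-1\n",
    b"hello": "h\u00e9llo".encode("latin-1"),
})
charset = mo_charset(mo)
```

 A reader built this way decodes the strings with the declared charset
 rather than assuming UTF-8. What to do when no charset is declared (the
 `default` argument here) is exactly the open question in this ticket.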

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/63974#comment:7>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform