[wp-trac] [WordPress Trac] #63974: .mo file loaded as UTF-8 by default - non-standard and ignoring Content-Type headers
WordPress Trac
noreply at wordpress.org
Mon Sep 15 14:38:44 UTC 2025
#63974: .mo file loaded as UTF-8 by default - non-standard and ignoring Content-Type headers
--------------------------+------------------------------
Reporter: kkmuffme | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: I18N | Version:
Severity: normal | Resolution:
Keywords: | Focuses:
--------------------------+------------------------------
Comment (by kkmuffme):
Replying to [comment:5 siliconforks]:
> Replying to [comment:2 kkmuffme]:
> > If no Content-Encoding header is specified, it should be treated as
> > ANSI. Since ANSI does not support multibyte characters, this means those
> > should be removed.
>
> This seems like a bad idea to me, since silently deleting
(non-)characters often leads to security vulnerabilities:
>
>
https://www.unicode.org/reports/tr36/tr36-15.html#Deletion_of_Noncharacters
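A minimal Python sketch of the deletion concern quoted above. It assumes the "treat as ANSI and drop multibyte characters" proposal can be modelled by decoding with Python's `ascii` codec. The input bytes are a made-up example (a UTF-8 encoded ZERO WIDTH SPACE, E2 80 8B, inside an ASCII token), not taken from a real translation file.

```python
# Hypothetical input: "<scr" + UTF-8 ZERO WIDTH SPACE + "ipt>".
raw = b"<scr\xe2\x80\x8bipt>"


def decode_all_ways(data: bytes, charset: str = "ascii") -> dict[str, str]:
    """Decode `data` under three error policies; record failures instead of raising."""
    results = {}
    for policy in ("ignore", "replace", "strict"):
        try:
            # "ignore" deletes undecodable bytes, "replace" substitutes U+FFFD,
            # "strict" raises UnicodeDecodeError.
            results[policy] = data.decode(charset, errors=policy)
        except UnicodeDecodeError as exc:
            results[policy] = f"error: {exc.reason}"
    return results


for policy, text in decode_all_ways(raw).items():
    print(f"{policy:<8} {text!r}")
```

With `errors="ignore"` the three non-ASCII bytes disappear and the remaining pieces join into `<script>`, a token that a check run on the raw bytes would not have matched. Substituting U+FFFD (`"replace"`) or rejecting the input (`"strict"`) keeps the pieces apart.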
Good point! What do you suggest?
The current behavior (treating the file as UTF-8 regardless of its declared
charset) also allows for, e.g., the problems described in
https://www.unicode.org/reports/tr36/tr36-15.html#Security_Levels_and_Alerts
For .mo files, for obvious performance reasons, all sanitizing and similar
processing happens at creation time, so that reading and lookup are as fast
as possible (hash tables, etc.). This is why a reader should adhere more
strictly to the standard.
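To make the read-side strictness concrete, here is a minimal Python sketch of a .mo reader that takes the charset from the header entry's `Content-Type` line and decodes strictly. It is not WordPress's loader. `read_mo`, `build_mo` and the `default_charset` fallback are hypothetical names, and the fallback value (`ascii` here) is an assumption: it is exactly the open question in this ticket. The hash table and plural-form handling are ignored.

```python
import re
import struct

MO_MAGIC = 0x950412DE  # GNU .mo magic number


def read_mo(data: bytes, default_charset: str = "ascii") -> tuple[dict[str, str], str]:
    """Decode every string with the charset declared in the header entry.
    `default_charset` (an assumption) is used only if none is declared."""
    # The magic number also reveals the byte order the file was written in.
    if struct.unpack("<I", data[:4])[0] == MO_MAGIC:
        endian = "<"
    elif struct.unpack(">I", data[:4])[0] == MO_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a GNU .mo file")
    _revision, count, orig_tab, trans_tab = struct.unpack(endian + "4I", data[4:20])

    def string_at(table: int, i: int) -> bytes:
        # Each table entry is a (length, offset) pair of 32-bit integers.
        length, offset = struct.unpack_from(endian + "2I", data, table + 8 * i)
        return data[offset:offset + length]

    pairs = [(string_at(orig_tab, i), string_at(trans_tab, i)) for i in range(count)]

    # The header is the translation of the empty msgid.
    header = dict(pairs).get(b"", b"")
    match = re.search(rb"^content-type:[^\n]*?charset=([^\s;]+)", header,
                      re.IGNORECASE | re.MULTILINE)
    charset = match.group(1).decode("ascii") if match else default_charset

    # errors="strict": malformed bytes raise UnicodeDecodeError instead of
    # being silently deleted. Plural msgstrs keep their NUL separators.
    catalog = {k.decode(charset, errors="strict"): v.decode(charset, errors="strict")
               for k, v in pairs}
    return catalog, charset


def build_mo(messages: dict[bytes, bytes]) -> bytes:
    """Hypothetical test helper: pack already-encoded msgid -> msgstr bytes
    into a little-endian .mo file with an empty hash table."""
    keys = sorted(messages)
    n = len(keys)
    orig_tab, trans_tab = 28, 28 + 8 * n
    strings_start = 28 + 16 * n
    blob, entries = b"", []
    for s in keys + [messages[k] for k in keys]:
        entries.append((len(s), strings_start + len(blob)))
        blob += s + b"\0"
    head = struct.pack("<7I", MO_MAGIC, 0, n, orig_tab, trans_tab, 0, strings_start)
    return head + b"".join(struct.pack("<2I", *e) for e in entries) + blob


mo = build_mo({b"": b"Content-Type: text/plain; charset=ISO-8859-1\n",
               b"Hello": b"Gr\xfc\xdf Gott"})
catalog, charset = read_mo(mo)
print(charset, catalog["Hello"])  # ISO-8859-1 Grüß Gott
```

Passing `default_charset="utf-8"` reproduces the "UTF-8 by default" behavior described in the ticket title for files whose header declares no charset.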
This is also what, e.g., the msgfmt documentation says
(https://www.gnu.org/software/gettext/manual/html_node/msgfmt-Invocation.html):
> By default, messages are converted to UTF-8 encoding before being stored
> in a MO file; this helps avoiding conversions at run time, since nowadays
> most locales use the UTF-8 encoding.
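The creation-time conversion described in that quote can be sketched as follows: re-encode every string to UTF-8 once, and rewrite the header's charset parameter so readers never need to convert. This is an illustration under assumptions (the catalog is a dict of raw bytes keyed by msgid, and `to_utf8_catalog` is a hypothetical name), not msgfmt's actual implementation.

```python
import re


def to_utf8_catalog(messages: dict[bytes, bytes], source_charset: str) -> dict[bytes, bytes]:
    """Re-encode every msgid/msgstr from source_charset to UTF-8 and update
    the charset parameter in the header entry (the msgstr of msgid b"")."""
    converted = {}
    for msgid, msgstr in messages.items():
        # Strict decode: bytes invalid in source_charset fail here, at build
        # time, instead of being left for every reader to clean up.
        new_id = msgid.decode(source_charset).encode("utf-8")
        converted[new_id] = msgstr.decode(source_charset).encode("utf-8")
    header = converted.get(b"")
    if header is not None:
        # Simplification: only the first charset= parameter is rewritten.
        converted[b""] = re.sub(rb"(charset=)[^\s;]+", rb"\g<1>UTF-8", header,
                                count=1, flags=re.IGNORECASE)
    return converted


latin1 = {b"": b"Content-Type: text/plain; charset=ISO-8859-1\n",
          b"Hello": b"Gr\xfc\xdf Gott"}
utf8 = to_utf8_catalog(latin1, "ISO-8859-1")
print(utf8[b"Hello"])  # b'Gr\xc3\xbc\xc3\x9f Gott'
```

Doing the conversion once when the file is built keeps the read path a plain lookup, which is the trade-off the comment argues for.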
--
Ticket URL: <https://core.trac.wordpress.org/ticket/63974#comment:7>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform