[wp-trac] [WordPress Trac] #63974: .mo file loaded as UTF-8 by default - non-standard and ignoring Content-Type headers
WordPress Trac
noreply at wordpress.org
Mon Sep 15 13:49:31 UTC 2025
#63974: .mo file loaded as UTF-8 by default - non-standard and ignoring Content-Type headers
--------------------------+------------------------------
Reporter: kkmuffme | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: I18N | Version:
Severity: normal | Resolution:
Keywords: | Focuses:
--------------------------+------------------------------
Comment (by kkmuffme):
>Mind sharing a link to the spec where the Content-Type header is
discussed?
e.g. https://www.gnu.org/software/gettext/manual/html_node/MO-Files.html
>The character encoding of the strings can be any standard ASCII-
compatible encoding, such as UTF-8, ISO-8859-1, EUC-JP, etc., **as long as
the encoding’s name is stated in the header entry** (see Filling in the
Header Entry)
This is not specific to .mo files, though: in general, text/plain
content defaults to ANSI encoding unless otherwise specified.
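
To make the quoted requirement concrete, here is a minimal Python sketch
(not WordPress code) of reading the charset that a .mo file declares in its
header entry, i.e. the translation of the empty msgid. The function names
`mo_header_charset` and `build_mo` are hypothetical and exist only for this
illustration; the byte layout follows the GNU MO format description linked
above.

```python
import re
import struct
from typing import Optional

MO_MAGIC = 0x950412DE  # magic number from the GNU MO format description


def mo_header_charset(data: bytes) -> Optional[str]:
    """Return the charset declared in the header entry, or None if absent."""
    # The magic number tells us which byte order the file was written in.
    if struct.unpack("<I", data[:4])[0] == MO_MAGIC:
        order = "<"
    elif struct.unpack(">I", data[:4])[0] == MO_MAGIC:
        order = ">"
    else:
        raise ValueError("not a .mo file")

    # Bytes 8..20: string count, originals table offset, translations table offset.
    count, orig_off, trans_off = struct.unpack(order + "3I", data[8:20])

    for i in range(count):
        o_len, _o_pos = struct.unpack_from(order + "2I", data, orig_off + 8 * i)
        if o_len != 0:
            continue  # only the empty msgid carries the header entry
        t_len, t_pos = struct.unpack_from(order + "2I", data, trans_off + 8 * i)
        header = data[t_pos:t_pos + t_len].decode("ascii", errors="replace")
        match = re.search(r"^Content-Type:.*?charset=([^\s;]+)", header, re.I | re.M)
        return match.group(1) if match else None
    return None  # no header entry at all


def build_mo(header: str) -> bytes:
    """Build a one-entry, little-endian .mo file in memory (demo helper only)."""
    trans = header.encode("ascii")
    # 28-byte file header, one 8-byte row per table, then the string data at 44.
    head = struct.pack("<7I", MO_MAGIC, 0, 1, 28, 36, 0, 44)
    orig_row = struct.pack("<2I", 0, 44)            # empty msgid at offset 44
    trans_row = struct.pack("<2I", len(trans), 45)  # header text right after its NUL
    return head + orig_row + trans_row + b"\0" + trans + b"\0"


declared = mo_header_charset(build_mo("Content-Type: text/plain; charset=ISO-8859-1\n"))
missing = mo_header_charset(build_mo("Project-Id-Version: demo\n"))
```

In this sketch `declared` holds the charset name from the header entry and
`missing` is `None`, which is the case the ticket is about: a loader has to
decide what to do when no charset is stated.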
>Are you talking about finishing user generated files? Because many are
read from the file system which contain no headers.
What do you mean?
>That class doesn’t read in UTF-8 so much as it reads in bytes without
performing any kind of character decoding. What are you suggesting the
input bytes should be if not UTF-8? How should one represent an inherently
multibyte character such as one of the thousands of CJK characters?
If no Content-Type header (with a charset) is specified, the file should
be treated as ANSI. Since ANSI does not support multibyte characters, this
means those should be removed. This is how msgunfmt handles it.
Since there can be all kinds of encodings, I'd suggest essentially
limiting support to UTF-8 (which is what 99.99% is anyway) and treating
everything else as ANSI (possibly with a doing_it_wrong if a Content-Type
header specifies a charset that WP doesn't support).
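
A rough Python sketch of that policy, for illustration only: it assumes
"ANSI" means 7-bit ASCII with all other bytes dropped, which is one possible
reading of the above and does not attempt to reproduce msgunfmt's actual
behaviour. The function name `decode_translation` is hypothetical, and the
Python warning merely stands in for the doing_it_wrong notice.

```python
import codecs
import warnings
from typing import Optional


def decode_translation(raw: bytes, charset: Optional[str]) -> str:
    """Decode one translated string under the UTF-8-or-ASCII policy."""
    if charset is not None:
        try:
            # Normalise aliases such as "utf8" or "UTF-8" to one codec name.
            is_utf8 = codecs.lookup(charset).name == "utf-8"
        except LookupError:
            is_utf8 = False  # unknown charset name
        if is_utf8:
            return raw.decode("utf-8", errors="replace")
        # A charset was declared, but it is not one this loader supports.
        warnings.warn(f"unsupported .mo charset {charset!r}; treating as ASCII")
    # No charset, or an unsupported one: keep 7-bit bytes, drop everything else.
    return raw.decode("ascii", errors="ignore")


utf8_bytes = "héllo".encode("utf-8")
with_header = decode_translation(utf8_bytes, "UTF-8")
without_header = decode_translation(utf8_bytes, None)
```

Here `with_header` keeps the accented character, while `without_header`
loses it because both bytes of the UTF-8 sequence fall outside ASCII.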
>Also, I’m not sure if you were meaning to write non-UTF-8 in your
examples
No, I meant those literally as UTF-8. Just copy/paste them for repro.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/63974#comment:2>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform