[wp-trac] [WordPress Trac] #63974: .mo file loaded as UTF-8 by default - non-standard and ignoring Content-Type headers

WordPress Trac noreply at wordpress.org
Mon Sep 15 13:49:31 UTC 2025


#63974: .mo file loaded as UTF-8 by default - non-standard and ignoring Content-Type headers
--------------------------+------------------------------
 Reporter:  kkmuffme      |       Owner:  (none)
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  I18N          |     Version:
 Severity:  normal        |  Resolution:
 Keywords:                |     Focuses:
--------------------------+------------------------------

Comment (by kkmuffme):

 >Mind sharing a link to the spec where the content header is
 discussed?

 e.g. https://www.gnu.org/software/gettext/manual/html_node/MO-Files.html

 >The character encoding of the strings can be any standard ASCII-
 compatible encoding, such as UTF-8, ISO-8859-1, EUC-JP, etc., **as long as
 the encoding’s name is stated in the header entry** (see Filling in the
 Header Entry)

 This is not specific to .mo files, though: in general, text/plain content
 is by default treated as ANSI-encoded unless a charset is specified.
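To show where that declaration lives: in the GNU MO format linked above, the header entry is the translation of the empty msgid, and its charset appears in a `Content-Type: text/plain; charset=...` line. The Python sketch below (Python is used only for illustration; WordPress's own loader is PHP) reads a .mo byte string and returns the declared charset. The function name `mo_header_charset` is hypothetical, and the parser reads only the fields needed for this lookup (it ignores the hash table and plural forms).

```python
import re
import struct
from typing import Optional

# Magic number of GNU MO files; reading it in each byte order reveals the file's endianness.
MO_MAGIC = 0x950412DE


def mo_header_charset(data: bytes) -> Optional[str]:
    """Return the charset declared in a .mo file's header entry, or None if absent."""
    if struct.unpack("<I", data[:4])[0] == MO_MAGIC:
        endian = "<"
    elif struct.unpack(">I", data[:4])[0] == MO_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a .mo file")
    # Fixed header: magic, revision, number of strings, offset of originals table,
    # offset of translations table (hash table fields are not needed here).
    _, _, count, orig_off, trans_off = struct.unpack(endian + "5I", data[:20])
    for i in range(count):
        # Each table entry is a (length, offset) pair of 32-bit integers.
        o_len, _ = struct.unpack(endian + "2I", data[orig_off + 8 * i : orig_off + 8 * i + 8])
        if o_len != 0:
            continue  # the header entry is the translation of the empty msgid
        t_len, t_pos = struct.unpack(endian + "2I", data[trans_off + 8 * i : trans_off + 8 * i + 8])
        # The header is ASCII-compatible per the spec; replace any stray bytes.
        header = data[t_pos : t_pos + t_len].decode("ascii", errors="replace")
        match = re.search(r"^Content-Type:.*?charset=([^\s;]+)", header, re.MULTILINE | re.IGNORECASE)
        return match.group(1) if match else None
    return None
```

A file whose header entry lacks a charset, or which has no header entry at all, yields `None`, which is exactly the case the rest of this comment is about.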

 >Are you talking about finishing user generated files? Because many are
 read from the file system which contain no headers.

 What do you mean?

 >That class doesn’t read in UTF-8 so much as it reads in bytes without
 performing any kind of character decoding. What are you suggesting the
 input bytes should be if not UTF-8? How should one represent an inherently
 multibyte character such as one of the thousands of CJK characters?

 If the Content-Type header specifies no charset, the strings should be
 treated as ANSI. Since ANSI does not support multibyte characters, any such
 characters should be removed. This is how msgunfmt handles it.
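A minimal Python sketch of that fallback, assuming "ANSI" can be approximated as 7-bit ASCII, so every byte belonging to a multibyte sequence is dropped. `decode_mo_string` is a hypothetical helper and is not claimed to reproduce msgunfmt byte for byte; when a charset is declared it simply decodes with it, so an unknown charset name raises `LookupError`.

```python
from typing import Optional


def decode_mo_string(raw: bytes, charset: Optional[str]) -> str:
    """Decode one .mo string; with no declared charset, fall back to "ANSI".

    Assumption: "ANSI" is approximated as 7-bit ASCII, so any non-ASCII byte
    (i.e. every byte of a multibyte character) is discarded.
    """
    if charset is None:
        # No charset declared: keep only ASCII bytes, dropping multibyte characters.
        return raw.decode("ascii", errors="ignore")
    # A charset was declared: honour it.
    return raw.decode(charset)
```

For example, the UTF-8 bytes of "café" decode to "caf" when no charset is declared, but to "café" when the header says UTF-8.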

 Since there can be all kinds of encodings, I'd suggest essentially
 limiting support to UTF-8 (which 99.99% of files use anyway) and treating
 everything else as ANSI (possibly with a _doing_it_wrong() notice if the
 Content-Type header specifies a charset WP doesn't support).
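The proposal reduces to a small decision table. The hypothetical `resolve_mo_charset` below, again in Python for illustration only, returns which encoding to use and whether a developer notice is warranted. In WordPress that notice would presumably be `_doing_it_wrong()`, but here it is just a boolean flag. Aliases are normalised with `codecs.lookup()`, so both `utf8` and `UTF-8` are accepted.

```python
import codecs
from typing import Optional, Tuple


def resolve_mo_charset(declared: Optional[str]) -> Tuple[str, bool]:
    """Return (encoding to use, whether to emit a developer notice).

    Hypothetical helper sketching the proposal: UTF-8 is the only charset
    honoured and everything else falls back to "ANSI". The flag marks the case
    where a charset was declared but is not supported.
    """
    if declared is None:
        # No charset in the header entry: fall back silently.
        return "ansi", False
    try:
        # codecs.lookup() normalises aliases such as "utf8" or "UTF-8".
        if codecs.lookup(declared).name == "utf-8":
            return "utf-8", False
    except LookupError:
        pass  # unknown names (e.g. the "CHARSET" placeholder) count as unsupported
    # Declared but not UTF-8: fall back and flag it.
    return "ansi", True
```

Keeping the decision separate from the decoding makes it easy to emit the notice once per file rather than once per string.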

 >Also, I’m not sure if you were meaning to write non-UTF-8 in your
 examples

 No, I meant those literally as UTF-8. Just copy/paste them for repro.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/63974#comment:2>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
