[wp-trac] [WordPress Trac] #63974: .mo file loaded as UTF-8 by default - non-standard and ignoring Content-Type headers

Mon Sep 15 15:09:18 UTC 2025

#63974: .mo file loaded as UTF-8 by default - non-standard and ignoring Content-Type headers
--------------------------+------------------------------
 Reporter:  kkmuffme      |       Owner:  (none)
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  I18N          |     Version:
 Severity:  normal        |  Resolution:
 Keywords:                |     Focuses:
--------------------------+------------------------------

Comment (by dmsnell):

 If we want to avoid the security issues that invalid byte sequences can
 cause, we can use the upcoming `mb_scrub_utf8()`, which is being prepared
 in #63863.
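 As a rough illustration of what "scrubbing" means, here is a Python
 sketch. It assumes `mb_scrub_utf8()` replaces each ill-formed UTF-8
 sequence with a substitute character instead of passing it through. The
 name `scrub_utf8` is hypothetical, and the actual PHP semantics (including
 which substitute character it uses) are whatever #63863 settles on.

```python
def scrub_utf8(data: bytes) -> str:
    """Hypothetical stand-in for mb_scrub_utf8(): decode bytes as UTF-8,
    replacing every ill-formed sequence with U+FFFD instead of failing."""
    return data.decode("utf-8", errors="replace")


# 0xE9 is a UTF-8 lead byte that must be followed by continuation bytes;
# here it is followed by '!' (plain ASCII), so the sequence is ill-formed
# and gets replaced. Well-formed input passes through unchanged.
print(scrub_utf8(b"caf\xe9!"))
print(scrub_utf8("café".encode("utf-8")))
```

 The point is that the output is always well-formed text, so nothing
 downstream ever sees a raw invalid byte.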

 > Good point! What do you suggest?

 If it validates as UTF-8, it’s probably UTF-8, even if it declares another
 encoding (at least, that has been the case for the top 300,000 domains I
 scanned on the Internet).
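 The heuristic works because text in a legacy single-byte encoding rarely
 happens to form valid UTF-8: accented letters live in 0x80–0xFF, and UTF-8
 only allows those bytes in strict lead/continuation patterns. A minimal
 Python sketch (the function name `is_valid_utf8` is just for illustration):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes are well-formed UTF-8 (a strict decode succeeds)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# The same word in two encodings.
utf8_bytes = "café".encode("utf-8")      # b'caf\xc3\xa9'
cp1252_bytes = "café".encode("cp1252")   # b'caf\xe9'

print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(cp1252_bytes))  # False: 0xE9 is a lead byte with no continuation
```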

 If it contains no header, we can call `mb_scrub_utf8()`. If it contains a
 header and we can understand the encoding //and// validate it, then we can
 convert it to UTF-8.

 Otherwise, a `_doing_it_wrong()` notice sounds great, and we can
 `mb_scrub_utf8()` the data to ensure it doesn’t introduce any invalid or
 malicious content.
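 The decision flow above could look roughly like this Python sketch. The
 function name `load_mo_text`, the `declared_charset` parameter (the charset
 parsed from the .mo header's Content-Type, or `None` when there is no
 header), and the `warn` callback standing in for `_doing_it_wrong()` are
 all assumptions for illustration; scrubbing is approximated by decoding
 with U+FFFD replacement.

```python
import codecs
from typing import Callable, Optional


def load_mo_text(
    data: bytes,
    declared_charset: Optional[str],
    warn: Callable[[str], None],
) -> str:
    """Hypothetical loader illustrating the proposed decision flow."""
    # 1. Bytes that validate as UTF-8 are treated as UTF-8, even if the
    #    header claims some other encoding.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # 2. No header: scrub (ill-formed sequences become U+FFFD).
    if declared_charset is None:
        return data.decode("utf-8", errors="replace")

    # 3. Declared charset is understood and the bytes validate in it:
    #    convert (decode to Unicode; re-encode as UTF-8 for storage).
    try:
        codecs.lookup(declared_charset)        # LookupError if unknown
        return data.decode(declared_charset)   # strict: raises on invalid bytes
    except (LookupError, UnicodeDecodeError):
        pass

    # 4. Otherwise: report the problem, then scrub.
    warn(f"Cannot decode .mo data as declared charset {declared_charset!r}")
    return data.decode("utf-8", errors="replace")


notices = []
# Valid UTF-8 wins over a (wrong) Latin-1 declaration.
print(load_mo_text("café".encode("utf-8"), "ISO-8859-1", notices.append))
# Genuine Windows-1252 bytes with a matching declaration are converted.
print(load_mo_text("café".encode("cp1252"), "windows-1252", notices.append))
# Unknown charset: a notice is recorded and the data is scrubbed.
print(load_mo_text(b"caf\xe9", "x-made-up", notices.append))
print(notices)
```

 Checking UTF-8 validity first means a mislabelled but well-formed file is
 never converted twice, which matches the observation that valid UTF-8 is
 usually UTF-8 regardless of what the header says.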

 ----

 It may be helpful to avoid the term “ANSI” in the context of text encoding.
 Most common encodings are US-ASCII compatible (bytes 0x00–0x7F mean the
 same thing in all of them), but the upper range (bytes 0x80–0xFF) is
 interpreted incompatibly across the family of encodings commonly referred
 to as “ANSI”.
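 To make this concrete: Windows-1252 (Western European), Windows-1251
 (Cyrillic) and Windows-1253 (Greek) are all loosely called “ANSI” code
 pages. They agree on 0x00–0x7F but assign different characters to much of
 0x80–0xFF. A short Python sketch decoding the same bytes under each (the
 byte value and the choice of code pages are just an example):

```python
CODE_PAGES = ("cp1252", "cp1251", "cp1253")  # Western, Cyrillic, Greek


def readings(data: bytes) -> dict:
    """Decode the same bytes under each code page and collect the results."""
    return {cp: data.decode(cp) for cp in CODE_PAGES}


print(readings(b"\xe4"))   # one upper-range byte: three different letters
print(readings(b"Hello"))  # US-ASCII range: identical in every code page
```

 So a header that merely says “ANSI” does not tell a loader how to read any
 byte above 0x7F.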

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/63974#comment:8>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform