[wp-trac] [WordPress Trac] #63863: Standardize UTF-8 handling and fallbacks in 6.9
WordPress Trac
noreply at wordpress.org
Sat Oct 18 04:34:18 UTC 2025
#63863: Standardize UTF-8 handling and fallbacks in 6.9
--------------------------------------+---------------------
Reporter: dmsnell | Owner: (none)
Type: enhancement | Status: new
Priority: normal | Milestone: 6.9
Component: Charset | Version: trunk
Severity: normal | Resolution:
Keywords: has-patch has-unit-tests | Focuses:
--------------------------------------+---------------------
Comment (by dmsnell):
In [changeset:"60969" 60969]:
{{{
#!CommitTicketReference repository="" revision="60969"
Charset: Rely on new UTF-8 pipeline for mb_substr() fallback.

The existing polyfill for `mb_substr()` contains a number of issues
leaving plenty of opportunity for improvement. Specifically, the following
are all deficiencies: it relies on Unicode PCRE support, assumes input
strings are valid UTF-8, splits input strings into an array of characters
(1,000 at a time, iterating until complete), and re-joins them at the end.

This patch provides an updated polyfill which will reliably parse UTF-8
strings even in the presence of invalid bytes. It computes boundaries for
the substring extraction with zero allocations and then returns a single
`substr()` call at the end.

This change improves the reliability of UTF-8 string handling and removes
behavioral variability based on the runtime system.

Developed in https://github.com/WordPress/wordpress-develop/pull/9829
Discussed in https://core.trac.wordpress.org/ticket/63863

See #63863.
}}}
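
For readers who want a feel for the approach the commit message describes (scan for byte boundaries, then make one `substr()` call), here is a minimal sketch. It is not the code from the changeset. The function names `sketch_utf8_char_length()` and `sketch_utf8_substr()` are hypothetical, negative offsets and lengths are out of scope, and counting each invalid byte as one character is a simplifying assumption that may differ from what the real polyfill does.

```php
<?php
/**
 * Hypothetical helper: byte length of the character starting at offset $i.
 * Returns 1 for any invalid byte so that the scan always advances
 * (assumption: one invalid byte counts as one character).
 */
function sketch_utf8_char_length( string $s, int $i ): int {
	$end = strlen( $s );
	$b0  = ord( $s[ $i ] );

	if ( $b0 < 0x80 ) {
		return 1; // ASCII.
	}

	// Pick the expected sequence length and the allowed range of the second
	// byte. The ranges exclude overlong forms, surrogates, and > U+10FFFF.
	if ( $b0 >= 0xC2 && $b0 <= 0xDF ) {
		$len = 2;
		$lo  = 0x80;
		$hi  = 0xBF;
	} elseif ( $b0 >= 0xE0 && $b0 <= 0xEF ) {
		$len = 3;
		$lo  = ( 0xE0 === $b0 ) ? 0xA0 : 0x80;
		$hi  = ( 0xED === $b0 ) ? 0x9F : 0xBF;
	} elseif ( $b0 >= 0xF0 && $b0 <= 0xF4 ) {
		$len = 4;
		$lo  = ( 0xF0 === $b0 ) ? 0x90 : 0x80;
		$hi  = ( 0xF4 === $b0 ) ? 0x8F : 0xBF;
	} else {
		return 1; // Stray continuation byte or invalid lead byte.
	}

	if ( $i + $len > $end ) {
		return 1; // Truncated sequence at the end of the string.
	}

	$b1 = ord( $s[ $i + 1 ] );
	if ( $b1 < $lo || $b1 > $hi ) {
		return 1;
	}

	// Any remaining bytes must be plain continuation bytes.
	for ( $k = 2; $k < $len; $k++ ) {
		$b = ord( $s[ $i + $k ] );
		if ( $b < 0x80 || $b > 0xBF ) {
			return 1;
		}
	}

	return $len;
}

/**
 * Hypothetical sketch: extract $length characters starting at character
 * $start (both non-negative) by locating byte boundaries first.
 */
function sketch_utf8_substr( string $s, int $start, ?int $length = null ): string {
	$end    = strlen( $s );
	$at     = 0;
	$length = $length ?? PHP_INT_MAX;

	// Skip $start characters to find the starting byte offset.
	for ( $chars = 0; $at < $end && $chars < $start; $chars++ ) {
		$at += sketch_utf8_char_length( $s, $at );
	}
	$from = $at;

	// Advance over up to $length characters to find the ending byte offset.
	for ( $chars = 0; $at < $end && $chars < $length; $chars++ ) {
		$at += sketch_utf8_char_length( $s, $at );
	}

	// One substr() call; the input is never split into a character array.
	return substr( $s, $from, $at - $from );
}
```

Under these assumptions, `sketch_utf8_substr( "h\xC3\xA9llo", 1, 3 )` returns the bytes for `éll`, and no PCRE or mbstring support is needed because only `strlen()`, `ord()`, and `substr()` are used.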
--
Ticket URL: <https://core.trac.wordpress.org/ticket/63863#comment:38>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform