[wp-trac] [WordPress Trac] #63863: Standardize UTF-8 handling and fallbacks in 6.9
WordPress Trac
noreply at wordpress.org
Thu Oct 16 20:58:46 UTC 2025
#63863: Standardize UTF-8 handling and fallbacks in 6.9
--------------------------------------+---------------------
Reporter: dmsnell | Owner: (none)
Type: enhancement | Status: new
Priority: normal | Milestone: 6.9
Component: Charset | Version: trunk
Severity: normal | Resolution:
Keywords: has-patch has-unit-tests | Focuses:
--------------------------------------+---------------------
Comment (by dmsnell):
In [changeset:"60949" 60949]:
{{{
#!CommitTicketReference repository="" revision="60949"
Charset: Rely on new UTF-8 pipeline for mb_strlen() fallback.
The existing polyfill for `mb_strlen()` contains a number of issues
leaving plenty of opportunity for improvement. Specifically, the following
are all deficiencies: it relies on Unicode PCRE support, assumes input
strings are valid UTF-8, splits input strings into arrays of characters
to count them (1,000 at a time, iterating until complete), and entirely
gives up when Unicode support is missing.
This patch provides an updated polyfill which will reliably count code
points in a UTF-8 string, even in the presence of sequences of invalid
bytes. It scans through the input with zero allocations. Additionally, the
underlying fallback extends the behavior of `mb_strlen()` to provide
character counts for substrings within a larger input without extracting
the substring (it can count characters within a byte offset and length of
a larger string).
This change improves the reliability of UTF-8 string length calculations
and removes behavioral variability based on the runtime system.
Developed in https://github.com/WordPress/wordpress-develop/pull/9828
Discussed in https://core.trac.wordpress.org/ticket/63863
See #63863.
}}}
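
The counting approach described in the commit message can be illustrated with a minimal sketch. This is not the code from the changeset: it is written in Python rather than PHP, the function name `count_code_points` and its `offset`/`length` parameters are hypothetical, and it assumes that each maximal invalid byte sequence counts as one character, as if it had been replaced by U+FFFD. The scan walks the bytes in place and never extracts the substring or builds a list of characters, although Python itself still allocates small temporaries.

```python
from typing import Optional


def count_code_points(data: bytes, offset: int = 0, length: Optional[int] = None) -> int:
    """Count code points in data[offset:offset+length] without slicing.

    Assumption: every maximal invalid byte sequence counts as one
    character. Assumes offset >= 0.
    """
    end = len(data) if length is None else min(len(data), offset + length)
    i = offset
    count = 0
    while i < end:
        lead = data[i]
        i += 1
        count += 1  # one valid character, or one invalid sequence
        if lead < 0x80:
            continue  # ASCII
        # Number of continuation bytes required, and the allowed range
        # of the first one (this rejects overlongs and surrogates).
        if 0xC2 <= lead <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif lead == 0xE0:
            need, lo, hi = 2, 0xA0, 0xBF
        elif lead == 0xED:
            need, lo, hi = 2, 0x80, 0x9F
        elif 0xE1 <= lead <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif lead == 0xF0:
            need, lo, hi = 3, 0x90, 0xBF
        elif lead == 0xF4:
            need, lo, hi = 3, 0x80, 0x8F
        elif 0xF1 <= lead <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        else:
            continue  # stray continuation byte or invalid lead byte
        # Consume continuation bytes while they are valid. A byte that
        # breaks the sequence is left for the next loop iteration.
        while need > 0 and i < end and lo <= data[i] <= hi:
            i += 1
            need -= 1
            lo, hi = 0x80, 0xBF
    return count
```

For example, `count_code_points("héllo".encode("utf-8"))` returns 5, and `count_code_points(b"\xe0\x80")` returns 2 because neither byte can begin or continue a valid sequence there. Passing a byte `offset` and `length` counts characters within that window of the larger input.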
--
Ticket URL: <https://core.trac.wordpress.org/ticket/63863#comment:34>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform