[wp-trac] [WordPress Trac] #63863: Standardize UTF-8 handling and fallbacks in 6.9

WordPress Trac noreply at wordpress.org
Sat Oct 18 04:34:18 UTC 2025


#63863: Standardize UTF-8 handling and fallbacks in 6.9
--------------------------------------+---------------------
 Reporter:  dmsnell                   |       Owner:  (none)
     Type:  enhancement               |      Status:  new
 Priority:  normal                    |   Milestone:  6.9
Component:  Charset                   |     Version:  trunk
 Severity:  normal                    |  Resolution:
 Keywords:  has-patch has-unit-tests  |     Focuses:
--------------------------------------+---------------------

Comment (by dmsnell):

 In [changeset:"60969" 60969]:
 {{{
 #!CommitTicketReference repository="" revision="60969"
 Charset: Rely on new UTF-8 pipeline for mb_substr() fallback.

 The existing polyfill for `mb_substr()` has several deficiencies: it relies
 on Unicode PCRE support, assumes input strings are valid UTF-8, splits
 input strings into an array of characters (1,000 at a time, iterating until
 complete), and re-joins them at the end.

 This patch provides an updated polyfill which will reliably parse UTF-8
 strings even in the presence of invalid bytes. It computes boundaries for
 the substring extraction with zero allocations and then returns the result
 of a single `substr()` call.

 This change improves the reliability of UTF-8 string handling and removes
 behavioral variability based on the runtime system.

 Developed in https://github.com/WordPress/wordpress-develop/pull/9829
 Discussed in https://core.trac.wordpress.org/ticket/63863

 See #63863.
 }}}
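
 The approach described in the commit message (scan the bytes, compute the
 start and end boundaries, then make one `substr()` call) can be sketched
 roughly as follows. This is not the code from changeset 60969. The names
 `sketch_char_length()` and `sketch_mb_substr()` are hypothetical, and the
 sketch makes simplifying assumptions: only non-negative `$start` and
 `$length` are handled, each byte that does not begin a well-formed sequence
 counts as one character, and overlong forms and surrogates are not rejected.

```php
<?php
/**
 * Hypothetical helper: byte length of the character starting at offset $i.
 * A byte that does not start a well-formed sequence counts as one character.
 * Simplification: overlong forms and surrogates are not rejected.
 */
function sketch_char_length( string $s, int $i ): int {
	$byte = ord( $s[ $i ] );

	if ( $byte < 0x80 ) {
		return 1; // ASCII.
	} elseif ( $byte >= 0xC2 && $byte <= 0xDF ) {
		$need = 1; // Lead byte of a two-byte sequence.
	} elseif ( $byte >= 0xE0 && $byte <= 0xEF ) {
		$need = 2; // Lead byte of a three-byte sequence.
	} elseif ( $byte >= 0xF0 && $byte <= 0xF4 ) {
		$need = 3; // Lead byte of a four-byte sequence.
	} else {
		return 1; // Stray continuation byte or invalid lead byte.
	}

	for ( $k = 1; $k <= $need; $k++ ) {
		if ( $i + $k >= strlen( $s ) || ( ord( $s[ $i + $k ] ) & 0xC0 ) !== 0x80 ) {
			return 1; // Truncated or malformed sequence.
		}
	}

	return $need + 1;
}

/**
 * Hypothetical sketch of a boundary-scanning substring function.
 * Assumes $start >= 0 and $length is null or >= 0.
 */
function sketch_mb_substr( string $s, int $start, ?int $length = null ): string {
	$end   = strlen( $s );
	$at    = 0; // Current byte offset.
	$chars = 0; // Characters skipped so far.

	// Advance to the byte offset of the first requested character.
	while ( $at < $end && $chars < $start ) {
		$at += sketch_char_length( $s, $at );
		$chars++;
	}
	$from = $at;

	if ( null === $length ) {
		return substr( $s, $from );
	}

	// Advance past the requested number of characters.
	$taken = 0;
	while ( $at < $end && $taken < $length ) {
		$at += sketch_char_length( $s, $at );
		$taken++;
	}

	// No intermediate array of characters is built; one slice at the end.
	return substr( $s, $from, $at - $from );
}
```

 Under these assumptions, `sketch_mb_substr( "h\xC3\xA9llo", 1, 3 )` yields
 the bytes for "éll", and an invalid byte such as `\xFF` occupies one
 character position instead of causing the call to fail.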

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/63863#comment:38>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

