[wp-trac] [WordPress Trac] #63863: Standardize UTF-8 handling and fallbacks in 6.9

WordPress Trac noreply at wordpress.org
Sat Nov 1 22:39:19 UTC 2025


#63863: Standardize UTF-8 handling and fallbacks in 6.9
--------------------------------------+--------------------------
 Reporter:  dmsnell                   |       Owner:  dmsnell
     Type:  enhancement               |      Status:  closed
 Priority:  normal                    |   Milestone:  6.9
Component:  Charset                   |     Version:  trunk
 Severity:  normal                    |  Resolution:  fixed
 Keywords:  has-patch has-unit-tests  |     Focuses:  performance
--------------------------------------+--------------------------
Changes (by dmsnell):

 * status:  reopened => closed
 * resolution:   => fixed


Comment:

 After working with @zieladam on this issue, I don’t believe there’s any
 cause for concern here: he was using this pipeline in a degenerate case
 it’s not designed for, namely scanning through a string one code point at
 a time.

 That usage bypassed a substantial speedup, namely the quick scan past runs
 of ASCII bytes, so the slowdown was due mostly to PHP’s function-calling
 overhead. After reworking the code in the PHP toolkit so that it retains
 the ASCII fast path, the performance issues seem to have disappeared.
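
 To illustrate the ASCII fast path described above, here is a minimal
 sketch. It is not the code merged in `_wp_scan_utf8()`, nor the PHP
 toolkit code. The function name `sketch_count_code_points()` is
 hypothetical, and the input is assumed to be valid UTF-8. It only shows
 how a single native `strspn()` call can skip a whole run of ASCII bytes,
 leaving the PHP-level loop to handle multi-byte sequences.

```php
<?php
/**
 * Hypothetical sketch: counts code points in a string ASSUMED to be valid
 * UTF-8. Illustrative only; not the Core or php-toolkit implementation.
 */
function sketch_count_code_points( string $text ): int {
    // Mask of all 128 US-ASCII bytes, built once and reused across calls.
    static $ascii = null;
    if ( null === $ascii ) {
        $ascii = '';
        for ( $i = 0; $i < 0x80; $i++ ) {
            $ascii .= chr( $i );
        }
    }

    $at    = 0;
    $end   = strlen( $text );
    $count = 0;

    while ( $at < $end ) {
        // Fast path: one native call measures the entire run of ASCII bytes.
        $ascii_length = strspn( $text, $ascii, $at );
        $count       += $ascii_length;
        $at          += $ascii_length;

        if ( $at >= $end ) {
            break;
        }

        // Slow path: the lead byte gives the sequence length (valid input assumed).
        $lead = ord( $text[ $at ] );
        if ( $lead >= 0xF0 ) {
            $at += 4;
        } elseif ( $lead >= 0xE0 ) {
            $at += 3;
        } else {
            $at += 2;
        }
        $count++;
    }

    return $count;
}
```

 Calling a helper like this once per code point would discard most of the
 benefit: each call pays PHP’s function-call overhead while `strspn()` is
 only asked to advance past one character at a time.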

 The `utf8_codepoint_at()` function was an earlier incarnation of the UTF-8
 decoder. It turned out to be slower than what was merged in
 `_wp_scan_utf8()` in almost every case other than scanning one code point
 at a time.

 Both have complicated performance characteristics, and both are
 significantly slower on multi-byte characters than on ASCII, but the merged
 code is significantly faster than the earlier function on spans of US-ASCII
 text.

 @zieladam can correct me if I have any details wrong here, and
 @westonruter, feel free to re-open again. I closed this because I think it
 was a bit of a false alarm, based on (a) the testing I performed against
 realistic inputs and usage for Core and (b) the extreme edge-case usage in
 the first version of its introduction to the PHP toolkit (Update after our
 discussion in
 [https://github.com/WordPress/php-toolkit/pull/201 wordpress/php-toolkit#201]).

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/63863#comment:53>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

