[wp-trac] [WordPress Trac] #63863: Standardize UTF-8 handling and fallbacks in 6.9
WordPress Trac
noreply at wordpress.org
Sat Nov 1 22:39:19 UTC 2025
#63863: Standardize UTF-8 handling and fallbacks in 6.9
--------------------------------------+--------------------------
 Reporter:  dmsnell                   |       Owner:  dmsnell
     Type:  enhancement               |      Status:  closed
 Priority:  normal                    |   Milestone:  6.9
Component:  Charset                   |     Version:  trunk
 Severity:  normal                    |  Resolution:  fixed
 Keywords:  has-patch has-unit-tests  |     Focuses:  performance
--------------------------------------+--------------------------
Changes (by dmsnell):
* status: reopened => closed
* resolution: => fixed
Comment:
After working with @zieladam on this issue, I don’t believe there’s any
cause for concern here: he was using this pipeline in a degenerate case
it’s not designed for, namely scanning through a string one code point at
a time.
That usage bypassed a substantial speedup, the quick scan past runs of
ASCII bytes, so the slowdown was due mostly to PHP’s function-calling
overhead. After reworking the code in the PHP toolkit so that it retains
the ASCII fast-path, the performance issues seem to have disappeared.
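To illustrate the difference, here is a minimal sketch. It is not the code in Core or in the PHP toolkit: the `sketch_*` function names are hypothetical, the input is assumed to be valid UTF-8, and the task (counting code points) is only a stand-in. The first loop pays one function call per code point; the second lets `strspn()` skip each run of ASCII bytes in a single call and only falls back to the per-code-point helper for multi-byte characters.

```php
<?php
// Hypothetical helper: byte length of the code point starting at $at.
// Assumes $text is valid UTF-8 and $at points at a lead byte.
function sketch_code_point_length( string $text, int $at ): int {
	$byte = ord( $text[ $at ] );
	if ( $byte < 0x80 ) {
		return 1; // ASCII.
	}
	if ( $byte < 0xE0 ) {
		return 2; // Lead byte of a two-byte sequence.
	}
	if ( $byte < 0xF0 ) {
		return 3; // Lead byte of a three-byte sequence.
	}
	return 4;     // Lead byte of a four-byte sequence.
}

// Degenerate usage: one function call for every code point, ASCII included.
function sketch_count_one_at_a_time( string $text ): int {
	$count = 0;
	$at    = 0;
	$end   = strlen( $text );
	while ( $at < $end ) {
		$at += sketch_code_point_length( $text, $at );
		$count++;
	}
	return $count;
}

// ASCII fast-path: strspn() consumes a whole run of ASCII bytes at once.
function sketch_count_with_ascii_fast_path( string $text ): int {
	// strspn() takes a literal list of bytes, so list all 128 ASCII bytes.
	$ascii = implode( '', array_map( 'chr', range( 0, 127 ) ) );
	$count = 0;
	$at    = 0;
	$end   = strlen( $text );
	while ( $at < $end ) {
		$ascii_length = strspn( $text, $ascii, $at );
		$count       += $ascii_length; // Each ASCII byte is one code point.
		$at          += $ascii_length;
		if ( $at >= $end ) {
			break;
		}
		// Only non-ASCII code points reach the slower per-code-point path.
		$at += sketch_code_point_length( $text, $at );
		$count++;
	}
	return $count;
}
```

Both functions return the same count for valid input. For mostly-ASCII text the second should make far fewer PHP-level function calls, which is the effect described above; actual timings will depend on the input and the PHP version.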
The `utf8_codepoint_at()` function was an earlier incarnation of the UTF-8
decoder. It turned out to be slower than what was merged in
`_wp_scan_utf8()` in almost every case other than scanning one code point
at a time.
The performance characteristics of both are complicated, and both are
significantly slower for multi-byte characters than for ASCII, but the
merged code is significantly faster when dealing with US-ASCII spans of
text.
@zieladam can correct me if I have any details wrong here, and
@westonruter, feel free to re-open again. I closed this because I think it
was a bit of a false alarm, based on (a) the testing I performed against
realistic inputs and usage for Core and (b) the extreme edge-case usage in
the first version of its introduction to the PHP toolkit. (Update after our
discussion in
[https://github.com/WordPress/php-toolkit/pull/201 wordpress/php-toolkit#201].)
--
Ticket URL: <https://core.trac.wordpress.org/ticket/63863#comment:53>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform