[wp-trac] [WordPress Trac] #63837: Update wp_check_invalid_utf8()
WordPress Trac
noreply at wordpress.org
Tue Aug 19 12:23:03 UTC 2025
#63837: Update wp_check_invalid_utf8()
--------------------------------------+---------------------
Reporter: dmsnell | Owner: (none)
Type: enhancement | Status: new
Priority: normal | Milestone: 6.9
Component: Formatting | Version: trunk
Severity: normal | Resolution:
Keywords: has-patch has-unit-tests | Focuses:
--------------------------------------+---------------------
Comment (by siliconforks):
I don't know if this is something worth worrying about, but the new tests
are causing about a hundred test failures on Ubuntu 22.04 LTS.
It appears that in the version of PHP being used (8.1.2, which is very
old), `mb_scrub()` does not perform maximal subpart replacement. (It
still performs replacement, just not "maximal" replacement.) This appears
to be fixed in later 8.1.x versions.
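For reference, maximal subpart replacement can be sketched in pure PHP so that expected test values do not depend on the installed mbstring. This is a minimal sketch, not WordPress or mbstring code; the function name `scrub_utf8_maximal_subparts()` and its `$replacement` parameter are hypothetical. The byte ranges follow the well-formed UTF-8 byte sequence table in chapter 3 of the Unicode core spec. Each well-formed sequence is copied unchanged, and each maximal subpart of an ill-formed sequence becomes one replacement.

{{{#!php
<?php
/**
 * Hypothetical reference scrubber (not a WordPress or mbstring API).
 * Replaces each maximal subpart of ill-formed UTF-8 with $replacement and
 * copies well-formed sequences through unchanged.
 */
function scrub_utf8_maximal_subparts( string $text, string $replacement = "\u{FFFD}" ): string {
	$out    = '';
	$length = strlen( $text );
	$i      = 0;

	while ( $i < $length ) {
		$b = ord( $text[ $i ] );

		// ASCII bytes are always well-formed on their own.
		if ( $b < 0x80 ) {
			$out .= $text[ $i ];
			++$i;
			continue;
		}

		// Sequence length and allowed range of the second byte, by lead byte.
		[ $n, $lo, $hi ] = match ( true ) {
			$b >= 0xC2 && $b <= 0xDF => [ 2, 0x80, 0xBF ],
			0xE0 === $b              => [ 3, 0xA0, 0xBF ], // Rejects overlongs.
			0xED === $b              => [ 3, 0x80, 0x9F ], // Rejects surrogates.
			$b >= 0xE1 && $b <= 0xEF => [ 3, 0x80, 0xBF ],
			0xF0 === $b              => [ 4, 0x90, 0xBF ], // Rejects overlongs.
			$b >= 0xF1 && $b <= 0xF3 => [ 4, 0x80, 0xBF ],
			0xF4 === $b              => [ 4, 0x80, 0x8F ], // Caps at U+10FFFF.
			default                  => [ 0, 0, 0 ],
		};

		// Bytes 80..C1 and F5..FF can never start a sequence, so each one
		// is its own maximal subpart.
		if ( 0 === $n ) {
			$out .= $replacement;
			++$i;
			continue;
		}

		// Count how many leading bytes of the sequence are valid.
		$k = 1;
		while ( $k < $n && $i + $k < $length ) {
			$c   = ord( $text[ $i + $k ] );
			$min = ( 1 === $k ) ? $lo : 0x80;
			$max = ( 1 === $k ) ? $hi : 0xBF;
			if ( $c < $min || $c > $max ) {
				break;
			}
			++$k;
		}

		// A complete sequence is copied; otherwise the $k bytes consumed so
		// far form one maximal subpart and get a single replacement.
		$out .= ( $k === $n ) ? substr( $text, $i, $n ) : $replacement;
		$i   += $k;
	}

	return $out;
}
}}}

With `'?'` as the replacement, the Unicode sample sequence used in the program below comes out as `"????????A"`, one `?` per invalid byte.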
A simple program to demonstrate the issue:
{{{#!php
<?php
echo PHP_VERSION . "\n";
// https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G67519
$original = "\xC0\xAF\xE0\x80\xBF\xF0\x81\x82\x41";
$scrubbed = mb_scrub( $original, 'UTF-8' );
var_dump( $scrubbed );
}}}
Running this on PHP 8.1.2 and PHP 8.1.33, this is what I get:
{{{
$ ~/php-8.1.2/bin/php mb_scrub.php
8.1.2
string(7) "??????A"
$ ~/php-8.1.33/bin/php mb_scrub.php
8.1.33
string(9) "????????A"
}}}
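A test suite could detect the older behavior at runtime rather than by version number. Below is a minimal sketch that reuses the same sample sequence and the 8.1.33 output shown above. The helper name `mb_scrub_uses_maximal_subparts()` is hypothetical and not part of WordPress or mbstring. It assumes `mb_scrub()` honors `mb_substitute_character()`, so it pins the substitute character to `?` and restores the previous setting afterward.

{{{#!php
<?php
/**
 * Hypothetical helper (not part of WordPress or mbstring): reports whether
 * this PHP build's mb_scrub() replaces each maximal subpart separately.
 */
function mb_scrub_uses_maximal_subparts(): bool {
	if ( ! function_exists( 'mb_scrub' ) ) {
		return false;
	}

	// Pin the substitute character to '?' so the result does not depend on
	// a site-wide mb_substitute_character() setting, then restore it.
	$previous = mb_substitute_character();
	mb_substitute_character( 0x3F );
	$scrubbed = mb_scrub( "\xC0\xAF\xE0\x80\xBF\xF0\x81\x82\x41", 'UTF-8' );
	mb_substitute_character( $previous );

	// Maximal subpart replacement gives eight '?' here, as on PHP 8.1.33;
	// PHP 8.1.2 gives six.
	return '????????A' === $scrubbed;
}
}}}

When this returns `false`, the affected test cases could be skipped or compared against expected values computed independently of mbstring.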
Again, I'm not sure whether this is actually something to be concerned
about; I bring it up only because PHP 8.1.2 is the version shipped with
Ubuntu 22.04 LTS, so it is still in use.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/63837#comment:3>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform