[wp-trac] [WordPress Trac] #63837: Update wp_check_invalid_utf8()

WordPress Trac noreply at wordpress.org
Tue Aug 19 12:23:03 UTC 2025


#63837: Update wp_check_invalid_utf8()
--------------------------------------+---------------------
 Reporter:  dmsnell                   |       Owner:  (none)
     Type:  enhancement               |      Status:  new
 Priority:  normal                    |   Milestone:  6.9
Component:  Formatting                |     Version:  trunk
 Severity:  normal                    |  Resolution:
 Keywords:  has-patch has-unit-tests  |     Focuses:
--------------------------------------+---------------------

Comment (by siliconforks):

 I don't know if this is something worth worrying about, but the new tests
 are causing about a hundred test failures on Ubuntu 22.04 LTS.

 It appears that in the version of PHP being used (8.1.2, which is very
 old), `mb_scrub()` does not perform maximal subpart replacement.  (It
 still performs replacement, just not "maximal" replacement.)  This appears
 to be fixed in later 8.1.x versions.

 A simple program to demonstrate the issue:

 {{{#!php
 <?php

 echo PHP_VERSION . "\n";

 // https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G67519
 $original = "\xC0\xAF\xE0\x80\xBF\xF0\x81\x82\x41";

 $scrubbed = mb_scrub( $original, 'UTF-8' );
 var_dump( $scrubbed );
 }}}

 Running this on PHP 8.1.2 and on PHP 8.1.33 gives:

 {{{
 $ ~/php-8.1.2/bin/php mb_scrub.php
 8.1.2
 string(7) "??????A"
 $ ~/php-8.1.33/bin/php mb_scrub.php
 8.1.33
 string(9) "????????A"
 }}}
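
 The difference between the two outputs above (six replacements versus
 eight) could be turned into a runtime check. The following is only a
 rough sketch: both function names are hypothetical and not part of PHP or
 WordPress, and it assumes the mbstring substitute character is a single
 character (such as the "?" seen above), so that the scrubbed sample is
 nine characters long only when every invalid byte is replaced separately.

```php
<?php

// Hypothetical helper (not part of PHP or WordPress): returns true when the
// scrubbed form of the nine-byte sample has one character per invalid byte
// plus the trailing "A", as in the PHP 8.1.33 output.
function is_maximal_subpart_result( string $scrubbed ): bool {
	return 9 === mb_strlen( $scrubbed, 'UTF-8' );
}

// Hypothetical helper: runs mb_scrub() on the same sample as the demo
// program and classifies the result.
// Assumption: the substitute character is a single character, not a mode
// such as "none", "long", or "entity".
function mb_scrub_uses_maximal_subparts(): bool {
	$sample = "\xC0\xAF\xE0\x80\xBF\xF0\x81\x82\x41";

	return is_maximal_subpart_result( mb_scrub( $sample, 'UTF-8' ) );
}

var_dump( mb_scrub_uses_maximal_subparts() );
```

 A test suite could use a check like this to skip or adjust the affected
 assertions on PHP builds that behave like 8.1.2, though whether that is
 worth doing is a separate question.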

 Again, I'm not sure if this is actually something to be concerned about.
 The reason I bring it up is that PHP 8.1.2 is the version packaged with
 Ubuntu 22.04 LTS, so it is still in use.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/63837#comment:3>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

