[wp-trac] [WordPress Trac] #65342: Charset: Polyfill mb_ord() and mb_chr() for UTF-8

WordPress Trac noreply at wordpress.org
Thu May 28 06:21:28 UTC 2026


#65342: Charset: Polyfill mb_ord() and mb_chr() for UTF-8
--------------------------------------+----------------------
 Reporter:  dmsnell                   |       Owner:  dmsnell
     Type:  enhancement               |      Status:  closed
 Priority:  normal                    |   Milestone:  7.1
Component:  Charset                   |     Version:
 Severity:  normal                    |  Resolution:  fixed
 Keywords:  has-patch has-unit-tests  |     Focuses:
--------------------------------------+----------------------

Comment (by dmsnell):

 In [changeset:"62425" 62425]:
 {{{
 #!CommitTicketReference repository="" revision="62425"
 Charset: Update antispambot to handle multibyte characters.

 In preparation for handling Unicode email addresses (non-US-ASCII
 characters in the mailbox name), the `antispambot()` function needs to
 be multibyte-aware so that it creates proper HTML numeric character
 references and percent-encoded strings.

 Previously it scanned the input email address byte by byte. With
 multibyte characters this produces invalid output, because the
 individual bytes of a multibyte sequence are encoded as if they were
 whole characters on their own.

 This patch relies on the newly-polyfilled `mb_ord()` function and the
 `_wp_scan_utf8()` function to walk through the input email address one
 code point at a time, assuming UTF-8 encoding. This ensures that each
 character is transformed as a whole.

 Developed in: https://github.com/WordPress/wordpress-develop/pull/11567
 Discussed in: https://core.trac.wordpress.org/ticket/31992

 Props agulbra, akirk, benniledl, dmsnell, siliconforks.
 See #65342.
 }}}
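
To illustrate the difference the commit message describes, here is a minimal sketch in Python rather than the PHP used in core. It is not the `antispambot()` implementation: the function names are hypothetical, and the random mixing of encodings that `antispambot()` performs is left out so the output is deterministic. It only contrasts encoding each UTF-8 byte as its own character with encoding each code point.

```python
import html
from urllib.parse import unquote


def encode_by_byte(text: str) -> str:
    """Hypothetical byte-wise encoder: every UTF-8 byte becomes its own
    numeric character reference, which breaks multibyte characters."""
    return "".join(f"&#{byte};" for byte in text.encode("utf-8"))


def encode_by_code_point(text: str) -> str:
    """Hypothetical code-point-wise encoder: one numeric character
    reference per Unicode code point."""
    return "".join(f"&#{ord(char)};" for char in text)


def percent_encode_code_point(char: str) -> str:
    """Percent-encode all UTF-8 bytes of one code point as a single unit."""
    return "".join(f"%{byte:02x}" for byte in char.encode("utf-8"))


if __name__ == "__main__":
    mailbox = "ren\u00e9"  # "rené", with a two-byte character in UTF-8
    # Byte-wise references decode to two wrong characters instead of "é".
    print(html.unescape(encode_by_byte(mailbox)))
    # Code-point-wise references decode back to the original text.
    print(html.unescape(encode_by_code_point(mailbox)))
    # Percent-encoding keeps the bytes of one code point together.
    print(unquote(percent_encode_code_point("\u00e9")))
```

In the byte-wise version, a browser decodes the two references produced for "é" as two separate characters, which is the kind of invalid output the change is meant to avoid.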

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/65342#comment:3>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

