[wp-trac] [WordPress Trac] #38044: Make seems_utf8() RFC 3629 compliant.

WordPress Trac noreply at wordpress.org
Thu Jul 24 05:32:22 UTC 2025


#38044: Make seems_utf8() RFC 3629 compliant.
--------------------------+-----------------------------
 Reporter:  gitlost       |       Owner:  (none)
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Future Release
Component:  Formatting    |     Version:  1.2.1
 Severity:  normal        |  Resolution:
 Keywords:  has-patch     |     Focuses:
--------------------------+-----------------------------

Comment (by dmsnell):

 @gitlost I have proposed a different version of this in the linked PR. I’m
 struggling to understand the purpose, role, and //name// of
 `seems_utf8()`. In response, I think it would be clearer if we head
 towards deprecating it entirely and add a new and clearer function, one
 which `wp_check_invalid_utf8()` can even come to rely on.

 Therefore I propose we add `wp_is_valid_utf8()`. It does nothing other
 than return a boolean validation result, which means it’s free to focus
 solely on validation and avoid the questions about what to do with invalid
 sequences that `wp_check_invalid_utf8()` has to answer.
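
 To make the idea of a boolean-only validator concrete, here is a minimal
 sketch of an RFC 3629 check in pure PHP. It is not the code from the linked
 PR and not a proposed implementation of `wp_is_valid_utf8()`; the name
 `sketch_is_valid_utf8()` is hypothetical and chosen only for illustration.

```php
<?php
/**
 * Hypothetical sketch: returns true only if $bytes is well-formed UTF-8
 * as defined by RFC 3629. It never repairs or strips anything.
 */
function sketch_is_valid_utf8( string $bytes ): bool {
	$length = strlen( $bytes );
	$i      = 0;

	while ( $i < $length ) {
		$b = ord( $bytes[ $i ] );

		// ASCII bytes are always valid on their own.
		if ( $b <= 0x7F ) {
			$i++;
			continue;
		}

		// Pick the number of continuation bytes and the allowed range
		// for the first continuation byte, based on the lead byte.
		if ( $b >= 0xC2 && $b <= 0xDF ) {
			$need = 1; $lo = 0x80; $hi = 0xBF;
		} elseif ( 0xE0 === $b ) {
			$need = 2; $lo = 0xA0; $hi = 0xBF; // Rejects overlong forms.
		} elseif ( 0xED === $b ) {
			$need = 2; $lo = 0x80; $hi = 0x9F; // Rejects surrogates.
		} elseif ( $b >= 0xE1 && $b <= 0xEF ) {
			$need = 2; $lo = 0x80; $hi = 0xBF;
		} elseif ( 0xF0 === $b ) {
			$need = 3; $lo = 0x90; $hi = 0xBF; // Rejects overlong forms.
		} elseif ( $b >= 0xF1 && $b <= 0xF3 ) {
			$need = 3; $lo = 0x80; $hi = 0xBF;
		} elseif ( 0xF4 === $b ) {
			$need = 3; $lo = 0x80; $hi = 0x8F; // Nothing above U+10FFFF.
		} else {
			// Stray continuation bytes, 0xC0/0xC1, and 0xF5-0xFF, which
			// includes every five- and six-byte lead byte.
			return false;
		}

		// The sequence must not run past the end of the string.
		if ( $i + $need >= $length ) {
			return false;
		}

		for ( $k = 1; $k <= $need; $k++ ) {
			$c = ord( $bytes[ $i + $k ] );
			if ( $c < $lo || $c > $hi ) {
				return false;
			}
			// Only the first continuation byte has a restricted range.
			$lo = 0x80;
			$hi = 0xBF;
		}

		$i += $need + 1;
	}

	return true;
}
```

 Because the function only answers yes or no, a caller decides separately
 what to do with invalid input. Where the mbstring extension is available,
 `mb_check_encoding( $bytes, 'UTF-8' )` should answer the same question;
 the loop above is only meant to show what the check involves.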

 Code that wants to use `seems_utf8()` can still rely on its confusing
 logic (although five-byte sequences were once discussed and envisioned, I
 believe no UTF-8 //ever// allowed them), but we can eliminate calls to
 that function in Core and deprecate it.

 What do you think? In my PR I’ve used another algorithm which, in my
 testing, was the fastest of the pure PHP-userspace implementations I’ve
 been able to find. You can see some numbers I collected in
 [https://github.com/WordPress/wordpress-develop/pull/9307 my other PR] if
 you want to compare some different approaches.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/38044#comment:5>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

