[wp-trac] [WordPress Trac] #38044: Make seems_utf8() RFC 3629 compliant.

WordPress Trac noreply at wordpress.org
Tue Jul 29 11:16:03 UTC 2025


#38044: Make seems_utf8() RFC 3629 compliant.
--------------------------+-----------------------------
 Reporter:  gitlost       |       Owner:  (none)
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Future Release
Component:  Formatting    |     Version:  1.2.1
 Severity:  normal        |  Resolution:
 Keywords:  has-patch     |     Focuses:
--------------------------+-----------------------------

Comment (by jonsurrell):

 Providing a specification-compliant function with clear expectations is
 valuable. `wp_is_valid_utf8()` seems like a nice addition to WordPress.

 I appreciate the research and background @dmsnell provided about
 `seems_utf8()`. What are the expectations of
 [https://developer.wordpress.org/reference/functions/seems_utf8/
 seems_utf8] in WordPress today?

 > Checks to see if a string is utf8 encoded.
 > NOTE: This function checks for 5-Byte sequences, UTF8 has Bytes
 > Sequences with a maximum length of 4.

 Accepting invalid UTF-8 sequences longer than 4 bytes
 [https://core.trac.wordpress.org/browser/tags/6.8.2/src/wp-includes/formatting.php#L874 is documented behavior as of the latest release, 6.8.2],
 and it seems to have
 [https://core.trac.wordpress.org/browser/tags/2.8/wp-includes/formatting.php#L184 first been documented in 2.8.0].
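
 To make the difference concrete, here is a minimal sketch of what a strict
 check rejects. `example_is_strict_utf8()` is a hypothetical helper used only
 for illustration; it is not the proposed `wp_is_valid_utf8()`, and it assumes
 that relying on PCRE's `u` modifier for validation is acceptable.

```php
<?php
// Hypothetical helper for illustration only; not the proposed wp_is_valid_utf8().
function example_is_strict_utf8( string $bytes ): bool {
	// With the `u` modifier, PCRE validates the subject as UTF-8 first.
	// preg_match() returns false (not 0) when the bytes are invalid.
	return 1 === preg_match( '//u', $bytes );
}

$five_byte = "\xF8\x88\x80\x80\x80"; // 5-byte form, not allowed by RFC 3629.
$four_byte = "\xF0\x9F\x98\x80";     // U+1F600 encoded as a valid 4-byte sequence.

var_dump( example_is_strict_utf8( $five_byte ) ); // bool(false)
var_dump( example_is_strict_utf8( $four_byte ) ); // bool(true)
```

 Where the mbstring extension is available, `mb_check_encoding( $bytes,
 'UTF-8' )` could serve the same purpose in this sketch.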

 I agree that deprecating `seems_utf8()` and introducing a new function
 that actually checks for valid UTF-8 is the best approach here.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/38044#comment:9>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

