[wp-trac] [WordPress Trac] #38044: Make seems_utf8() RFC 3629 compliant.
WordPress Trac
noreply at wordpress.org
Tue Jul 29 11:16:03 UTC 2025
#38044: Make seems_utf8() RFC 3629 compliant.
--------------------------+-----------------------------
Reporter: gitlost | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Future Release
Component: Formatting | Version: 1.2.1
Severity: normal | Resolution:
Keywords: has-patch | Focuses:
--------------------------+-----------------------------
Comment (by jonsurrell):
Providing a specification-compliant function with clear expectations is
valuable. `wp_is_valid_utf8()` seems like a nice addition to WordPress.
I appreciate the research and background @dmsnell provided about
`seems_utf8()`. What are the expectations of
[https://developer.wordpress.org/reference/functions/seems_utf8/
seems_utf8] in WordPress today?
> Checks to see if a string is utf8 encoded.
> NOTE: This function checks for 5-Byte sequences, UTF8 has Bytes
> Sequences with a maximum length of 4.
Accepting invalid UTF-8 sequences longer than 4 bytes
[https://core.trac.wordpress.org/browser/tags/6.8.2/src/wp-includes/formatting.php#L874 is documented behavior as of the latest release, 6.8.2,]
and seems to have
[https://core.trac.wordpress.org/browser/tags/2.8/wp-includes/formatting.php#L184 first been documented in 2.8.0].
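
To make the difference concrete, here is a minimal sketch of what a strict RFC 3629 check rejects, written in Python purely for illustration. It is not WordPress code, and `is_valid_utf8` is a hypothetical helper rather than the proposed `wp_is_valid_utf8()`. It relies on Python's built-in strict UTF-8 decoder, which refuses 5-byte forms, overlong encodings, surrogate code points, and anything above U+10FFFF.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True only if `data` is well-formed UTF-8 per RFC 3629.

    Hypothetical helper for illustration; it delegates to Python's
    strict UTF-8 decoder instead of hand-rolling a byte scanner.
    """
    try:
        data.decode("utf-8")  # the default error mode is "strict"
    except UnicodeDecodeError:
        return False
    return True


# Ordinary multi-byte text is accepted.
print(is_valid_utf8("héllo".encode("utf-8")))

# A 5-byte form (lead byte 0xF8) is not UTF-8 under RFC 3629.
print(is_valid_utf8(b"\xf8\x88\x80\x80\x80"))

# Overlong encoding of U+0000 and a UTF-16 surrogate are also rejected.
print(is_valid_utf8(b"\xc0\x80"))
print(is_valid_utf8(b"\xed\xa0\x80"))
```

A function with "valid" in its name would be expected to return false for the last three inputs, which is a stricter contract than the one `seems_utf8()` currently documents.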
I agree that deprecating `seems_utf8` and introducing the new function
that does check for valid UTF-8 is the best approach here.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/38044#comment:9>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform