[wp-trac] [WordPress Trac] #38044: Make seems_utf8() RFC 3629 compliant.
WordPress Trac
noreply at wordpress.org
Thu Jul 24 05:32:22 UTC 2025
#38044: Make seems_utf8() RFC 3629 compliant.
--------------------------+-----------------------------
Reporter: gitlost | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Future Release
Component: Formatting | Version: 1.2.1
Severity: normal | Resolution:
Keywords: has-patch | Focuses:
--------------------------+-----------------------------
Comment (by dmsnell):
@gitlost I have proposed a different version of this in the linked PR. I’m
struggling with `seems_utf8()`: I find its purpose, its role, and its
//name// hard to pin down. So I think it would be clearer to head toward
deprecating it entirely and to add a new, clearly named function, one that
`wp_check_invalid_utf8()` could eventually rely on.
Therefore I propose we add `wp_is_valid_utf8()`. It does nothing but
return a boolean indicating whether the input is valid UTF-8, so it can
focus solely on validation and avoid the question of what to do with
invalid sequences, which `wp_check_invalid_utf8()` has to answer.
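To illustrate what a strict boolean validator has to check, here is a
minimal sketch in Python rather than PHP. `is_valid_utf8` is a
hypothetical name; this is not the code from the linked PR, and the byte
ranges are taken from the UTF-8 grammar in RFC 3629, section 4.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 per RFC 3629 (hypothetical helper)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # ASCII byte: a complete one-byte sequence.
            i += 1
            continue
        # The lead byte fixes the sequence length and the legal range
        # of the first continuation byte.
        if 0xC2 <= b <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # rejects overlong 3-byte forms
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # rejects surrogates U+D800..U+DFFF
        elif b == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # rejects overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # rejects code points above U+10FFFF
        else:
            # 0x80..0xC1 (stray continuation or overlong 2-byte lead) and
            # 0xF5..0xFF (including the old 5- and 6-byte leads) never begin
            # a valid sequence.
            return False
        if i + length > n:
            return False  # truncated sequence at end of input
        if not lo <= data[i + 1] <= hi:
            return False
        # Any remaining continuation bytes must be 10xxxxxx.
        for k in range(i + 2, i + length):
            if not 0x80 <= data[k] <= 0xBF:
                return False
        i += length
    return True
```

For example, `is_valid_utf8("héllo €".encode("utf-8"))` is `True`, while
the encoded surrogate `b"\xed\xa0\x80"` and the five-byte form
`b"\xf8\x88\x80\x80\x80"` both return `False`. Narrowing the second byte's
range per lead byte is what lets the sketch reject overlongs, surrogates,
and out-of-range code points without ever computing the code point.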
Code that wants `seems_utf8()` can keep calling that confusing function
(five-byte sequences were permitted by the original RFC 2279 definition of
UTF-8, but RFC 3629 forbids them), while we eliminate its calls in Core
and deprecate it.
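For context on the five-byte question: RFC 2279 assigned lead-byte bit
patterns to sequences of up to six bytes, and RFC 3629 later limited UTF-8
to four bytes and code points up to U+10FFFF. The hypothetical Python
helpers below (not WordPress code) classify a lead byte under each
definition and list the bytes where the two disagree.

```python
# Lead-byte bit patterns from RFC 2279 as (mask, value, sequence_length).
RFC2279_LEADS = [
    (0x80, 0x00, 1),  # 0xxxxxxx
    (0xE0, 0xC0, 2),  # 110xxxxx
    (0xF0, 0xE0, 3),  # 1110xxxx
    (0xF8, 0xF0, 4),  # 11110xxx
    (0xFC, 0xF8, 5),  # 111110xx  (dropped by RFC 3629)
    (0xFE, 0xFC, 6),  # 1111110x  (dropped by RFC 3629)
]


def legacy_sequence_length(lead: int) -> int:
    """Length implied by the lead byte's bit pattern under RFC 2279; 0 if none."""
    for mask, value, length in RFC2279_LEADS:
        if (lead & mask) == value:
            return length
    return 0


def rfc3629_sequence_length(lead: int) -> int:
    """Length for a byte that can begin a valid RFC 3629 sequence; 0 otherwise."""
    if lead <= 0x7F:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


# Every byte value whose classification differs between the two definitions.
DIFFERENCES = [
    (lead, legacy_sequence_length(lead), rfc3629_sequence_length(lead))
    for lead in range(0x100)
    if legacy_sequence_length(lead) != rfc3629_sequence_length(lead)
]

for lead, old, new in DIFFERENCES:
    print(f"0x{lead:02X}: RFC 2279 length {old}, RFC 3629 length {new}")
```

The disagreements are the overlong two-byte leads 0xC0 and 0xC1, the
four-byte leads 0xF5 to 0xF7 (which would encode values above U+10FFFF),
and the five- and six-byte leads 0xF8 to 0xFD.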
What do you think? In my PR I’ve used a different algorithm, which in my
testing is the fastest pure PHP userspace approach I’ve been able to find.
You can see some numbers I collected in
[https://github.com/WordPress/wordpress-develop/pull/9307 my other PR] if
you want to compare the approaches.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/38044#comment:5>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform