[wp-trac] [WordPress Trac] #27733: wpautop(): \s in regex destroys some UTF-8 characters
WordPress Trac
noreply at wordpress.org
Tue Sep 23 21:56:42 UTC 2025
#27733: wpautop(): \s in regex destroys some UTF-8 characters
--------------------------------------------------+---------------------
Reporter: tenpura | Owner: (none)
Type: defect (bug) | Status: closed
Priority: normal | Milestone:
Component: Formatting | Version: 0.71
Severity: major | Resolution: fixed
Keywords: needs-patch needs-unit-tests wpautop | Focuses:
--------------------------------------------------+---------------------
Changes (by dmsnell):
* resolution: wontfix => fixed
Comment:
Thanks for the investigation, @miqrogroove.
In my testing I was unable to get
`1 === preg_match( '/\s/', "one\xA0two" )`
regardless of my `LC_CTYPE`, `LC_ALL`, and other charset-related ENV
values or `php.ini` settings. I think we are in agreement that the PCRE
functions simply don’t treat the `\xA0` byte specially.
(I //was// able to get it to match when adding the `u` flag, as long as I
updated the bytes to `\xC2\xA0`, the proper UTF-8 encoding of
`NO-BREAK SPACE`.)
It would be nice to know for sure that they are operating simply on bytes
(regardless of encoding) //or// on UTF-8 if provided the UTF-8 flag.
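The byte-versus-UTF-8 distinction above can be checked with a short script. This is only a minimal sketch, not code from the ticket: the variable names are made up for illustration, and the result of the first call is printed rather than assumed, since that is the case under discussion and may depend on the PHP build and locale.

```php
<?php
// Illustrative subjects (names are hypothetical, not from WordPress core).
$byte_nbsp = "one\xA0two";     // a lone 0xA0 byte, not valid UTF-8
$utf8_nbsp = "one\xC2\xA0two"; // U+00A0 NO-BREAK SPACE encoded as UTF-8

// Without the `u` flag the pattern is applied to raw bytes.
// The result is printed, not assumed; it was 0 in the testing described above.
$byte_mode = preg_match( '/\s/', $byte_nbsp );

// With the `u` flag the subject is treated as UTF-8.
$utf8_mode = preg_match( '/\s/u', $utf8_nbsp );

// With the `u` flag and invalid UTF-8, preg_match() fails instead of matching.
$utf8_mode_invalid = preg_match( '/\s/u', $byte_nbsp );
$invalid_error     = preg_last_error();

var_dump( $byte_mode );
var_dump( $utf8_mode );
var_dump( $utf8_mode_invalid );
var_dump( PREG_BAD_UTF8_ERROR === $invalid_error );
```

The third call matters for the question of what the functions operate on: with the `u` flag a stray `\xA0` byte is rejected as malformed UTF-8 rather than being matched or skipped.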
----
This makes me think that WordPress in all of its supported environments
will not and cannot create this scenario. Does that sound right? If so, I
believe that `wontfix` is still fine, or maybe `invalid` would be more
appropriate.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/27733#comment:17>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform