[wp-trac] [WordPress Trac] #27733: wpautop(): \s in regex destroys some UTF-8 characters
WordPress Trac
noreply at wordpress.org
Mon Sep 22 22:03:11 UTC 2025
#27733: wpautop(): \s in regex destroys some UTF-8 characters
--------------------------------------------------+----------------------
Reporter: tenpura | Owner: (none)
Type: defect (bug) | Status: closed
Priority: normal | Milestone:
Component: Formatting | Version: 0.71
Severity: major | Resolution: wontfix
Keywords: needs-patch needs-unit-tests wpautop | Focuses:
--------------------------------------------------+----------------------
Changes (by dmsnell):
* status: new => closed
* resolution: => wontfix
Comment:
The problem here is probably not really that we have `\s`, but rather that
we’re mixing encodings, right?
On a system whose internal encoding is something like `latin1`, we may get
U+00A0 encoded as the single byte 0xA0, which is what the PCRE pattern will
treat as a no-break space. Adding an arbitrarily limited set of space
characters //appears// to resolve this problem because that particular
offending byte is no longer caught, but there are a thousand other places
where different bytes will trip us up.
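To make the byte-level failure concrete, here is a small Python sketch (not PHP or PCRE) that assumes a latin1-style character table in which the single byte 0xA0 counts as whitespace. The pattern and the helper `collapse_spaces` are hypothetical stand-ins, not what `wpautop()` actually runs; the point is only that a byte-oriented "space" class cannot tell a latin1 no-break space apart from the last byte of a multibyte UTF-8 character.

```python
import re

# Hypothetical stand-in for a latin1-style character table:
# ASCII whitespace plus the single byte 0xA0 (latin1 no-break space).
LATIN1_STYLE_SPACE = re.compile(rb"[ \t\n\r\f\v\xa0]+")


def collapse_spaces(raw: bytes) -> bytes:
    """Replace each run of 'space' bytes with one ASCII space (byte-oriented)."""
    return LATIN1_STYLE_SPACE.sub(b" ", raw)


# In latin1, U+00A0 is the single byte 0xA0, so this behaves as intended.
print(collapse_spaces("a\u00a0b".encode("latin-1")))  # b'a b'

# In UTF-8, U+30E0 is E3 83 A0; its final byte is swallowed as a "space",
# leaving a truncated, invalid UTF-8 sequence behind.
print(collapse_spaces("\u30e0".encode("utf-8")))  # b'\xe3\x83 '
```

The same byte 0xA0 is a complete character in one encoding and a fragment in the other, which is why patching the pattern for this one byte only hides the mismatch.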
On systems with UTF-8 as their internal encoding, however, the no-break
space will be encoded as 0xC2 0xA0, and the PCRE pattern will look for that
two-byte sequence. It won’t mangle the `ム`, whose UTF-8 encoding
(0xE3 0x83 0xA0) merely ends in the byte 0xA0.
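A hypothetical Python sketch (not PHP) of the UTF-8 side: a byte pattern that matches the no-break space only as its full two-byte sequence 0xC2 0xA0 leaves `ム` (U+30E0) intact. For comparison, Python's `str` patterns work on code points, where a Unicode-aware `\s` matches U+00A0 but can never match part of `ム`; this illustrates Python's behavior and is not a claim about how PHP's PCRE is configured.

```python
import re

# Byte-level pattern: ASCII whitespace, or the complete UTF-8 NBSP (C2 A0).
UTF8_SPACE = re.compile(rb"(?:[ \t\n\r\f\v]|\xc2\xa0)+")


def collapse_spaces_utf8(raw: bytes) -> bytes:
    """Replace runs of whitespace with one ASCII space, treating input as UTF-8 bytes."""
    return UTF8_SPACE.sub(b" ", raw)


print(collapse_spaces_utf8("a\u00a0b".encode("utf-8")))  # b'a b'

# E3 83 A0 contains no C2 A0 pair, so U+30E0 passes through untouched.
mu = "\u30e0".encode("utf-8")
print(collapse_spaces_utf8(mu) == mu)  # True

# Code-point view: Python's Unicode-aware \s matches U+00A0 but not U+30E0.
print(re.sub(r"\s+", " ", "a\u00a0\u30e0") == "a \u30e0")  # True
```

Matching whole sequences (or whole code points) is what makes the UTF-8 case safe: a continuation byte can never be mistaken for a character on its own.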
Given that UTF-8 is the default internal encoding in PHP and has been for
years, I’m inclined to close this, as it shouldn’t practically be an issue
anymore. If we wanted to resolve it fully, we’d have to check every place
we call string-related functions for which encoding is going in and which
is set as the default. This is an infeasible task.
For that reason, I think it fits nicely as a duplicate of #62172: if we
acknowledge that UTF-8 is the only actually supported encoding, this bug
cannot appear. It’s really the obligation of whoever is integrating the
database, server, PHP code, and plugins to ensure proper harmony between
the various text encodings.
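One way an integrator might enforce that harmony is to reject non-UTF-8 input at the boundary, before any text processing runs. The Python sketch below is illustrative only: `require_utf8` is a hypothetical helper, not a WordPress or PHP API, and it refuses bytes that are not valid UTF-8 rather than guessing their encoding.

```python
def require_utf8(raw: bytes) -> str:
    """Decode raw bytes as strict UTF-8, raising ValueError on anything else.

    Hypothetical boundary check: data in legacy encodings (e.g. latin1)
    must be converted explicitly by whoever knows its real encoding.
    """
    try:
        return raw.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        raise ValueError(f"input is not valid UTF-8: {exc}") from exc


latin1_bytes = "a\u00a0b".encode("latin-1")  # b'a\xa0b', not valid UTF-8
try:
    require_utf8(latin1_bytes)
    print("accepted")
except ValueError:
    print("rejected")  # this branch runs: a lone 0xA0 is not valid UTF-8
```

Failing loudly here is a design choice: silently re-decoding as latin1 would reintroduce exactly the guesswork that causes the mangling.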
Going to mark as `wontfix` for now. Feel free to re-open if you disagree.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/27733#comment:13>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform