[wp-trac] [WordPress Trac] #55117: Possible 5.9 Bug: Unknown character (U+FFFC or %ef%bf%bc) on content title

WordPress Trac noreply at wordpress.org
Fri Jul 1 14:02:12 UTC 2022


#55117: Possible 5.9 Bug: Unknown character (U+FFFC or %ef%bf%bc) on content title
-------------------------------------------------+-------------------------
 Reporter:  cantuaria                            |       Owner:  audrasjb
     Type:  defect (bug)                         |      Status:  assigned
 Priority:  normal                               |   Milestone:  6.1
Component:  Permalinks                           |     Version:  5.9
 Severity:  normal                               |  Resolution:
 Keywords:  needs-patch has-testing-info         |     Focuses:
  has-screenshots                                |
-------------------------------------------------+-------------------------

Comment (by dmsnell):

 Thanks for the detailed reproduction steps, @ironprogrammer.
 Unfortunately, I think we need to track a different sequence of steps,
 because intentionally entering the object-replacement character is not the
 same as that character unexpectedly appearing in a post title. The latter
 is, I believe, the real problem tracked in this issue (but maybe I'm
 wrong).

 For everyone involved, I think a few different issues are being conflated
 here:
  - Non-ASCII characters in a slug/URL are percent-encoded. This is
 standard practice and "necessary" if we want to represent the text people
 enter. If my post is named "Bücher", the appropriate URL is "B%C3%BCcher".
 There's another practice we don't use but could: Punycode, where the same
 "Bücher" slug would become "xn--bcher-kva" but would appear as "Bücher" in
 the browser's URL bar. I think it deserves its own Trac ticket, and I
 would eventually love to see us use it.
  - `[OBJ]` characters (U+FFFC, OBJECT REPLACEMENT CHARACTER) stored in the
 database are rendered on page view. This is probably suspect enough that
 we should strip them out, at least from the post title. It's debatable
 whether this is a WordPress problem, because technically we could argue
 that if the character is in the data it should be displayed (it does,
 after all, have `print=yes` in its Unicode properties).
  - The `[OBJ]` character is appearing unintentionally in post titles, and
 the slugs generated from those titles stand out because of the
 percent-encoding.
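
 The three points can be illustrated with a short sketch. WordPress itself
 is PHP; Python is used here only because its standard library ships the
 relevant encoders. The sketch shows the percent-encoded and Punycode forms
 of "Bücher" (Punycode, via IDNA, is defined for domain-name labels),
 checks that %ef%bf%bc is simply U+FFFC encoded as UTF-8, and shows how a
 stray U+FFFC in a title leaks into a slug. `strip_object_replacement` and
 `naive_slug` are hypothetical helpers; `naive_slug` is a deliberately
 simplified stand-in and does not reproduce WordPress's `sanitize_title()`.

```python
import unicodedata
from urllib.parse import quote, unquote

# U+FFFC OBJECT REPLACEMENT CHARACTER; many fonts draw it as a boxed "OBJ".
OBJ = "\ufffc"


def strip_object_replacement(title: str) -> str:
    """Hypothetical helper: remove every U+FFFC from a post title."""
    return title.replace(OBJ, "").strip()


def naive_slug(title: str) -> str:
    """Hypothetical, simplified slug builder (not WordPress's sanitize_title()).

    Lowercases the title, joins whitespace-separated words with hyphens and
    percent-encodes the UTF-8 bytes of anything outside the unreserved ASCII set.
    """
    joined = "-".join(title.lower().split())
    return quote(joined, safe="").lower()  # lowercase the %XX hex digits too


if __name__ == "__main__":
    # Point 1: percent-encoding vs. Punycode for "Bücher".
    print(quote("Bücher"))          # B%C3%BCcher
    print("bücher".encode("idna"))  # b'xn--bcher-kva' (IDNA applies to domain labels)

    # The sequence in the ticket title is U+FFFC encoded as UTF-8.
    print(unicodedata.name(unquote("%ef%bf%bc")))  # OBJECT REPLACEMENT CHARACTER
    print(unicodedata.category(OBJ))               # So (Symbol, other)

    # Point 3: a stray U+FFFC in a title leaks into the slug.
    title = "Hello World" + OBJ
    print(naive_slug(title))                            # hello-world%ef%bf%bc
    print(naive_slug(strip_object_replacement(title)))  # hello-world
```

 Stripping the character before the slug is built (point 2) avoids the
 percent-encoded tail entirely, while leaving legitimate non-ASCII titles
 such as "Bücher" percent-encoded as usual.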

 I'd like to address the third point in
 [https://github.com/WordPress/gutenberg/issues/38637 #38637], if we can,
 since it's a Gutenberg bug. The first two are decisions for Core and are
 probably more appropriate for Trac. On the third point, I'm going to
 update that Gutenberg issue with some findings from working with
 @ironprogrammer yesterday.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/55117#comment:26>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform