[wp-trac] [WordPress Trac] #56531: Aiming to “kill” entities, `sanitize_title_with_dashes()` happens to eat content

WordPress Trac noreply at wordpress.org
Thu Sep 8 00:01:54 UTC 2022


#56531: Aiming to “kill” entities, `sanitize_title_with_dashes()` happens to eat
content
-------------------------+-------------------------------------------------
 Reporter:  anrghg       |      Owner:  (none)
     Type:  defect (bug) |     Status:  new
 Priority:  normal       |  Milestone:  Awaiting Review
Component:  Formatting   |    Version:
 Severity:  major        |   Keywords:  needs-dev-note needs-patch
  Focuses:               |              changes-requested
-------------------------+-------------------------------------------------
 {{{
 $title = preg_replace( '/&.+?;/', '', $title );
 }}}

 This regex deletes everything from an ampersand through the next
 semicolon. I ran into this while testing ASCII symbols and punctuation,
 but it may also affect any title that uses a semicolon in place of an
 inner period and happens to contain an ampersand earlier. Semicolons are
 uncommon, even more so in titles, but WordPress should neither make
 assumptions nor impose limitations.
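
 The effect can be reproduced outside WordPress with plain PHP. This is
 only a minimal sketch: the sample title is made up for illustration, and
 the regex is the only part taken from the function.
 {{{#!php
 <?php
 // Hypothetical title: an ampersand followed later by a semicolon.
 $title = 'Salt & pepper; a short history';

 // The line quoted above from sanitize_title_with_dashes().
 $stripped = preg_replace( '/&.+?;/', '', $title );

 echo $stripped; // Salt  a short history
 }}}

 The words "& pepper;" are not an entity, yet they are removed as if they
 were one.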

 In the process, `sanitize_title_with_dashes()` again (cf. #56530) does
 part of the job of `remove_accents()`, but badly: it deletes `&eacute;`
 instead of replacing it with `e`, and so on.
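
 The difference between deleting and decoding an entity can be seen in
 isolation. In this sketch the sample title is hypothetical, and
 `remove_accents()` is only mentioned in a comment because it is a
 WordPress function that is not available in plain PHP.
 {{{#!php
 <?php
 // Hypothetical title containing a named entity.
 $title = 'Caf&eacute; menu';

 // Current behaviour: the whole entity is removed, letter included.
 $deleted = preg_replace( '/&.+?;/', '', $title ); // 'Caf menu'

 // Decoding keeps the letter as U+00E9, so an accent-folding step such
 // as remove_accents() could still map it to 'e' afterwards.
 $decoded = html_entity_decode( $title, ENT_QUOTES | ENT_HTML5, 'UTF-8' );
 }}}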

 I’d suggest decoding all HTML entities, then reencoding `<`, `>`, `&`:
 {{{#!php
 <?php
 $title = html_entity_decode( $title );
 // Re-encode `&` first so the `&` in the new `&lt;` and `&gt;` is left alone.
 $title = preg_replace( array( '/&/', '/</', '/>/' ), array( '&amp;',
 '&lt;', '&gt;' ), $title );
 }}}
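
 The same idea could be wrapped in a helper. This is a sketch under
 assumptions, not a proposed patch: `decode_then_reencode()` is a
 hypothetical name, the explicit flags and charset are my choice, and
 `strtr()` is swapped in for `preg_replace()` because it replaces in a
 single pass, so the order of the three characters does not matter.
 {{{#!php
 <?php
 // Hypothetical helper, not part of WordPress.
 function decode_then_reencode( string $title ): string {
     // Turn every entity into the character it stands for.
     $title = html_entity_decode( $title, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

     // strtr() with an array scans the string once, so the '&' inside a
     // freshly written '&lt;' is not encoded a second time.
     return strtr( $title, array( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' ) );
 }

 echo decode_then_reencode( 'Salt & pepper; a &lt;short&gt; history' );
 // Salt &amp; pepper; a &lt;short&gt; history
 }}}

 With this approach the text between the ampersand and the semicolon
 survives, and only the three markup-significant characters are left
 encoded.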

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/56531>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

