[wp-trac] [WordPress Trac] #64177: Command Palette: Encoded ampersands in URLs

Tue Nov 4 12:00:20 UTC 2025

#64177: Command Palette: Encoded ampersands in URLs
--------------------------+------------------------
 Reporter:  swissspidy    |       Owner:  wildworks
     Type:  defect (bug)  |      Status:  closed
 Priority:  normal        |   Milestone:  6.9
Component:  General       |     Version:  trunk
 Severity:  normal        |  Resolution:  fixed
 Keywords:  has-patch     |     Focuses:
--------------------------+------------------------

Comment (by dmsnell):

 It would be worth revisiting this as I think there are a few more gaps we
 could close.

 It looks like this is potentially coming in from the `esc_url()` call in
 `menu_page_url()` when there’s a parent slug pointing to a defined parent
 page. In the other condition WordPress calls `add_query_arg()` which in
 turn calls `urlencode_deep()` which //should// be replacing the `&` with
 `%26`.

 If that’s the case it would be preferable, I think, to ensure that we turn
 the plaintext `$menu_slug` into its URL-escaped variety //before// calling
 `esc_url()` on it.
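
 A minimal sketch of that ordering in plain PHP, outside of WordPress: the
 slug `reports&stats` and the `admin.php?page=` prefix are made up for
 illustration, and `htmlspecialchars()` only stands in for the HTML-escaping
 part of `esc_url()`, which is not reproduced here.

 {{{#!php
 <?php
 // Hypothetical menu slug containing an ampersand.
 $menu_slug = 'reports&stats';

 // Percent-encode the plaintext slug first, so the ampersand becomes data
 // instead of a query-string separator.
 $url = 'admin.php?page=' . rawurlencode( $menu_slug );
 echo $url, "\n";
 // admin.php?page=reports%26stats

 // HTML-escaping afterwards finds no ampersand left to turn into an entity.
 $escaped = htmlspecialchars( $url, ENT_QUOTES, 'UTF-8' );
 echo $escaped, "\n";
 // admin.php?page=reports%26stats

 // Skipping the first step leaves the ampersand to be HTML-encoded instead.
 $unencoded = htmlspecialchars( 'admin.php?page=' . $menu_slug, ENT_QUOTES, 'UTF-8' );
 echo $unencoded, "\n";
 // admin.php?page=reports&amp;stats
 }}}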

 ----

 There’s another point in the patch that’s easy to overlook: we should be
 careful about calling `html_entity_decode()` and also about passing
 `get_bloginfo( 'charset' )`. First of all, we have
 `WP_HTML_Decoder::decode_attribute()` which will more accurately decode
 values that are //from// or encoded //for// an HTML attribute. Second,
 `html_entity_decode()` in combination with the `blog_charset` is generally
 unsafe.

 For stronger interoperability and to avoid security issues,
 [https://url.spec.whatwg.org/#percent-encoded-bytes we should only use UTF-8
 values] when percent-encoding. This is complicated because even //if// we
 expect to print out something like an `ISO-8859-1` page, we’re not
 printing the bytes here but rather the URL-encoding of a query arg, which
 will always be ASCII.

 What can happen is that //if we decode into a non-UTF-8 charset// then we
 can generate invalid UTF-8 in the percent-encoding and that leads to ill-
 defined behaviors and can lead to exploits.

 {{{#!php
 <?php
 $iso_8859_1 = html_entity_decode( '©&#xa9;', ENT_QUOTES, 'ISO-8859-1' );
 // contains the bytes 0xA9 0xA9 (assuming the literal © is itself stored as
 // the single byte 0xA9), which are invalid UTF-8
 echo urlencode( $iso_8859_1 );
 // %A9%A9

 $utf_8 = html_entity_decode( '©&#xa9;', ENT_QUOTES, 'UTF-8' );
 // contains the bytes 0xC2 0xA9 0xC2 0xA9
 echo urlencode( $utf_8 );
 // %C2%A9%C2%A9

 echo urlencode( WP_HTML_Decoder::decode_attribute( '©&#xa9;' ) );
 // %C2%A9%C2%A9
 }}}
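
 The two percent-encoded results above can be checked for UTF-8 validity
 with a short sketch. The helper name `decodes_to_valid_utf8()` is
 hypothetical and not part of WordPress; it relies only on PCRE refusing
 invalid UTF-8 subjects when the `u` flag is set.

 {{{#!php
 <?php
 // Hypothetical helper: true when the percent-decoded bytes are valid UTF-8.
 function decodes_to_valid_utf8( string $percent_encoded ): bool {
 	// With the `u` flag, preg_match() returns false for an invalid UTF-8
 	// subject; the empty pattern matches any valid subject and returns 1.
 	return 1 === preg_match( '//u', rawurldecode( $percent_encoded ) );
 }

 var_dump( decodes_to_valid_utf8( '%A9%A9' ) );
 // bool(false)
 var_dump( decodes_to_valid_utf8( '%C2%A9%C2%A9' ) );
 // bool(true)
 }}}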

 More than likely, if we use `blog_charset`, we will present broken links
 on any site whose `blog_charset` is anything other than UTF-8.
 `WP_HTML_Decoder::decode_attribute()` is also spec-compliant and so
 will produce an equivalent match to how the browser decodes the value,
 whereas `html_entity_decode()` is not able to do that reliably.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/64177#comment:10>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform