[wp-trac] [WordPress Trac] #64177: Command Palette: Encoded ampersands in URLs
WordPress Trac
noreply at wordpress.org
Tue Nov 4 12:00:20 UTC 2025
#64177: Command Palette: Encoded ampersands in URLs
--------------------------+------------------------
Reporter: swissspidy | Owner: wildworks
Type: defect (bug) | Status: closed
Priority: normal | Milestone: 6.9
Component: General | Version: trunk
Severity: normal | Resolution: fixed
Keywords: has-patch | Focuses:
--------------------------+------------------------
Comment (by dmsnell):
It would be worth revisiting this as I think there are a few more gaps we
could close.
It looks like this is potentially coming in from the `esc_url()` call in
`menu_page_url()` when there’s a parent slug pointing to a defined parent
page. In the other condition WordPress calls `add_query_arg()` which in
turn calls `urlencode_deep()` which //should// be replacing the `&` with
`%26`.
If that’s the case, it would be preferable, I think, to ensure that we
turn the plaintext `$menu_slug` into its URL-escaped variety //before//
calling `esc_url()` on it.
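
As a rough illustration of that ordering, here is a minimal sketch in plain
PHP. The slug value is made up, and `sketch_escape_for_html()` is a
hypothetical stand-in built on `htmlspecialchars()`; it is not `esc_url()`,
which does considerably more and encodes the ampersand differently.

{{{#!php
<?php
// Hypothetical menu slug containing an ampersand (not taken from the ticket).
$menu_slug = 'reports&charts';

// Stand-in for HTML escaping only. This is NOT WordPress's esc_url().
function sketch_escape_for_html( string $url ): string {
	return htmlspecialchars( $url, ENT_QUOTES, 'UTF-8' );
}

// Escaping the raw slug leaves an HTML-encoded ampersand inside the URL.
$escaped_only = sketch_escape_for_html( 'admin.php?page=' . $menu_slug );

// Percent-encoding the slug first turns its ampersand into %26,
// so the HTML escaping step has nothing left to rewrite.
$encoded_first = sketch_escape_for_html( 'admin.php?page=' . rawurlencode( $menu_slug ) );

echo $escaped_only, "\n";
// admin.php?page=reports&amp;charts
echo $encoded_first, "\n";
// admin.php?page=reports%26charts
}}}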
----
There’s another point in the patch that’s easy to overlook: we should be
careful about calling `html_entity_decode()` and also about passing
`get_bloginfo( 'charset' )`. First of all, we have
`WP_HTML_Decoder::decode_attribute()` which will more accurately decode
values that are //from// or encoded //for// an HTML attribute. Second,
`html_entity_decode()` in combination with the `blog_charset` is generally
unsafe.
For stronger interoperability and to avoid security issues,
[https://url.spec.whatwg.org/#percent-encoded-bytes we should only use UTF-8
values] when percent-encoding. This is complicated because even //if// we
expect to print out something like an `ISO-8859-1` page, we’re not
printing the bytes here but rather the URL-encoding of a query arg, which
will always be ASCII.
What can happen is that //if we decode into a non-UTF-8 charset// then we
can generate invalid UTF-8 in the percent-encoding, which leads to
ill-defined behavior and can lead to exploits.
{{{#!php
<?php
$iso_8859_1 = html_entity_decode( '&copy;&copy;', ENT_QUOTES, 'ISO-8859-1' );
// contains the bytes 0xA9 0xA9, which are invalid UTF-8
echo urlencode( $iso_8859_1 );
// %A9%A9
$utf_8 = html_entity_decode( '&copy;&copy;', ENT_QUOTES, 'UTF-8' );
// contains the bytes 0xC2 0xA9 0xC2 0xA9
echo urlencode( $utf_8 );
// %C2%A9%C2%A9
echo urlencode( WP_HTML_Decoder::decode_attribute( '&copy;&copy;' ) );
// %C2%A9%C2%A9
}}}
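One rough way to tell the two kinds of output apart is to percent-decode a
value and check whether the resulting bytes form valid UTF-8. This is only a
sketch in plain PHP; `sketch_is_utf8_percent_encoding()` is a hypothetical
helper, not a WordPress function.

{{{#!php
<?php
// Hypothetical helper: true when the percent-decoded bytes are valid UTF-8.
function sketch_is_utf8_percent_encoding( string $encoded ): bool {
	$bytes = rawurldecode( $encoded );
	// With the `u` modifier, PCRE refuses subjects that are not valid UTF-8
	// and preg_match() returns false instead of a match count.
	return 1 === preg_match( '//u', $bytes );
}

var_dump( sketch_is_utf8_percent_encoding( '%A9%A9' ) );
// bool(false)
var_dump( sketch_is_utf8_percent_encoding( '%C2%A9%C2%A9' ) );
// bool(true)
}}}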
More than likely, if we use `blog_charset` we will also present broken
links on any site whose `blog_charset` is something other than UTF-8.
`WP_HTML_Decoder::decode_attribute()` is also spec-compliant, so it decodes
the value the same way the browser does, whereas `html_entity_decode()` is
not able to do that reliably.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/64177#comment:10>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform