[wp-trac] [WordPress Trac] #22951: Performance enhancements for esc_url()
WordPress Trac
noreply at wordpress.org
Thu Nov 10 21:27:53 UTC 2022
#22951: Performance enhancements for esc_url()
-------------------------------------------------+-------------------------
Reporter: markjaquith | Owner: schlessera
Type: defect (bug) | Status: assigned
Priority: normal | Milestone: 6.2
Component: Formatting | Version: 2.8
Severity: normal | Resolution:
Keywords: has-patch 2nd-opinion dev-feedback | Focuses:
needs-testing | performance
-------------------------------------------------+-------------------------
Comment (by azaozz):
Looking at the patches/PR, may be missing something but why the big regex?
The idea here is to short-circuit `wp_kses_bad_protocol()` when that
function won't have anything to do. The purpose of this function is to
strip "bad" protocols from the beginning of a string (presumably an URL).
Seems it is mostly used to prevent things like `javascript://alert(123)`
when the `javascript://` protocol is not allowed. (Frankly I think
rejecting a URL if it starts with a bad protocol would be better, but
that's outside the scope here.)
Also `wp_kses_bad_protocol_once()` seems to be converting some HTML
entities in the whole string. This is to better match the `:` char. See:
{{{#!php
$string = preg_replace( '/(�*58(?![;0-9])|�*3a(?![;a-f0-9]))/i',
'$1;', $string );
$string2 = preg_split( '/:|�*58;|�*3a;|:/i', $string, 2 );
}}}
The proposed regex seem to be validating the whole URL (which is not
intended here), and seems it may be missing the HTML entities. In addition
thinking there was a problem with null characters that can "throw off"
some regex (or some browsers?), so `$string = wp_kses_no_null( $string );`
should be at the top in all cases.
If we go with the original idea, the patch should probably be doing
something like:
1. Remove null chars.
2. Check if `https://` is allowed in `$allowed_protocols`. If yes, check
if the string starts with `https://` case-insensitive. If yes, short-
circuit.
3. Check if `http://` is allowed in `$allowed_protocols`. If yes, check if
the string starts with `http://` case-insensitive. If yes, short-circuit.
This will catch the most common use cases and will probably speed up
`wp_kses_bad_protocol()` quite a bit. Don't think it needs to use regex at
all. Seems `stripos( $string, 'https://` ) === 0` would be enough, and a
lot faster.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/22951#comment:28>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
More information about the wp-trac
mailing list