[wp-trac] [WordPress Trac] #22951: Performance enhancements for esc_url()

WordPress Trac noreply at wordpress.org
Thu Nov 10 21:27:53 UTC 2022


#22951: Performance enhancements for esc_url()
-------------------------------------------------+-------------------------
 Reporter:  markjaquith                          |       Owner:  schlessera
     Type:  defect (bug)                         |      Status:  assigned
 Priority:  normal                               |   Milestone:  6.2
Component:  Formatting                           |     Version:  2.8
 Severity:  normal                               |  Resolution:
 Keywords:  has-patch 2nd-opinion dev-feedback   |     Focuses:
  needs-testing                                  |  performance
-------------------------------------------------+-------------------------

Comment (by azaozz):

 Looking at the patches/PR, may be missing something but why the big regex?

 The idea here is to short-circuit `wp_kses_bad_protocol()` when that
 function won't have anything to do. The purpose of this function is to
 strip "bad" protocols from the beginning of a string (presumably an URL).
 Seems it is mostly used to prevent things like `javascript://alert(123)`
 when the `javascript://` protocol is not allowed. (Frankly I think
 rejecting a URL if it starts with a bad protocol would be better, but
 that's outside the scope here.)

 Also `wp_kses_bad_protocol_once()` seems to be converting some HTML
 entities in the whole string. This is to better match the `:` char. See:
 {{{#!php
 $string  = preg_replace( '/(&#0*58(?![;0-9])|&#x0*3a(?![;a-f0-9]))/i',
 '$1;', $string );
 $string2 = preg_split( '/:|&#0*58;|&#x0*3a;|:/i', $string, 2 );
 }}}

 The proposed regex seem to be validating the whole URL (which is not
 intended here), and seems it may be missing the HTML entities. In addition
 thinking there was a problem with null characters that can "throw off"
 some regex (or some browsers?), so `$string = wp_kses_no_null( $string );`
 should be at the top in all cases.

 If we go with the original idea, the patch should probably be doing
 something like:
 1. Remove null chars.
 2. Check if `https://` is allowed in `$allowed_protocols`. If yes, check
 if the string starts with `https://` case-insensitive. If yes, short-
 circuit.
 3. Check if `http://` is allowed in `$allowed_protocols`. If yes, check if
 the string starts with `http://` case-insensitive. If yes, short-circuit.

 This will catch the most common use cases and will probably speed up
 `wp_kses_bad_protocol()` quite a bit. Don't think it needs to use regex at
 all. Seems `stripos( $string, 'https://` ) === 0` would be enough, and a
 lot faster.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/22951#comment:28>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list