[wp-trac] [WordPress Trac] #64180: Simplify and improve performance in `WP_Hook::apply_filters()`

Sun Nov 2 00:24:50 UTC 2025

#64180: Simplify and improve performance in `WP_Hook::apply_filters()`
-------------------------+------------------------------
 Reporter:  grapestain   |       Owner:  (none)
     Type:  enhancement  |      Status:  new
 Priority:  normal       |   Milestone:  Awaiting Review
Component:  Plugins      |     Version:  6.8.3
 Severity:  minor        |  Resolution:
 Keywords:               |     Focuses:  performance
-------------------------+------------------------------

Comment (by grapestain):

 Oh, wow, I would have never thought of that. Thanks for the insight.

 In that case we can still achieve similar improvements just by removing
 the `if ( 0 === $the_['accepted_args'] ) {` branch. As pointed out, this
 branch is only taken in about 1% of all calls, yet its condition is
 evaluated first in every case. The overhead added to the other 99% of
 calls does not seem to be justified by the small gain in that 1%.

 So if we look at two more possible solutions:

 Reorder branches:
 {{{#!php
 <?php
 // Avoid the array_slice() if possible.
 if ( $the_['accepted_args'] >= $num_args ) {
         $value = call_user_func_array( $the_['function'], $args );
 } elseif ( 0 === $the_['accepted_args'] ) {
         $value = call_user_func( $the_['function'] );
 } else {
         $value = call_user_func_array(
                 $the_['function'],
                 array_slice( $args, 0, $the_['accepted_args'] )
         );
 }
 }}}

 Here we use the knowledge that 0 accepted arguments only occurs in around
 1% of cases, whereas accepting all arguments accounts for roughly 80% of
 all filter calls on the front end and around 75% in the admin, so
 checking for the most likely branch first saves additional checks.
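
 As a rough back-of-the-envelope check of that reasoning, the sketch below
 counts how many conditions are evaluated per callback on average. It uses
 the front-end shares quoted above (about 1% and about 80%) and assumes
 every call that misses the first test needs exactly one more test. The
 helper name is hypothetical and only serves the illustration.

 {{{#!php
 <?php
 // Hypothetical helper: average number of conditions evaluated per callback
 // when the branch taken by $first_share of the calls is tested first and
 // every other call needs a second test.
 function expected_checks( float $first_share ): float {
         return $first_share * 1 + ( 1 - $first_share ) * 2;
 }

 // Current order: the 0-argument test comes first (about 1% of calls).
 $current = expected_checks( 0.01 );

 // Reordered: the "accepts all arguments" test comes first (about 80%).
 $reordered = expected_checks( 0.80 );

 printf( "current: %.2f, reordered: %.2f\n", $current, $reordered );
 }}}

 Under these assumptions the average drops from about 1.99 to about 1.20
 condition checks per callback. This says nothing about what a single
 check actually costs in the engine.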

 Simplified branches:
 {{{#!php
 <?php
 if ( $the_['accepted_args'] >= $num_args ) {
         $value = call_user_func_array( $the_['function'], $args );
 } else {
         $value = call_user_func_array( $the_['function'], array_slice(
 $args, 0, $the_['accepted_args'] ) );
 }
 }}}

 Here we go one step further and assume that the additional check for 0
 arguments, which would still fail in about 95% of the remaining cases,
 costs more than what it saves: calling `array_slice()` to produce an
 empty array and using `call_user_func_array()` instead of
 `call_user_func()` in ~5% of the remaining calls, or ~1% of the total
 calls.
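
 To illustrate why dropping the 0-argument branch should not change
 behaviour, here is a minimal sketch. The callback and the argument list
 are made up for the example; only the three PHP built-ins are real.

 {{{#!php
 <?php
 // Hypothetical zero-argument callback standing in for a hooked function.
 // It reports how many arguments it actually received.
 $callback = function () {
         return func_num_args();
 };

 // Made-up argument list as it might be passed to a filter.
 $args = array( 'value', 'extra' );

 // What the dedicated 0-argument branch does today.
 $direct = call_user_func( $callback );

 // What the simplified code does instead: slice down to zero arguments.
 $sliced = call_user_func_array( $callback, array_slice( $args, 0, 0 ) );

 var_dump( array_slice( $args, 0, 0 ) === array() ); // bool(true)
 var_dump( $direct === $sliced );                    // bool(true)
 }}}

 Both calls hand the callback zero arguments, so the only difference left
 is the cost of the extra `array_slice()` call versus the extra condition.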

 So I ran my tests again.
 (Note that the 100% baseline values are different, because my computer is
 in a different state than when I opened the ticket, so comparing against
 those earlier numbers would not be fair.)

 ||Apply Filter call with argument count||Current implementation||Reordered
 branches||Simplified branches||
 ||`apply_filter(...,0);`||1.984460s (100%)||2.017885s (101.68%)||1.993416s
 (100.45%)||
 ||`apply_filter(...,1);`||2.084825s (100%)||2.038089s ( 97.76%)||2.026259s
 ( 97.19%)||
 ||`apply_filter(...,2);`||2.034263s (100%)||2.041381s (100.35%)||1.996706s
 ( 98.15%)||
 ||`apply_filter(...,3);`||2.070492s (100%)||2.057335s ( 99.36%)||2.005684s
 ( 96.87%)||
 ||`apply_filter(...,4);`||2.031988s (100%)||2.054009s (101.08%)||2.020054s
 ( 99.41%)||
 ||`apply_filter(...,5);`||2.057378s (100%)||2.010370s ( 97.72%)||2.000880s
 ( 97.25%)||

 The first one seems roughly the same as the current implementation (the
 differences are within noise), but removing one of the three branches
 entirely seems better. Again, the 0 argument case is indeed slightly
 slower, but with that being only 1% of real-world cases it is still
 better to differentiate only 2 cases.

 I tried one more thing, for an unrelated reason: I have had a negative
 experience a good few times with a step debugger, because these
 `call_user_func*` lines are where I set breakpoints in order to identify
 which function/method is hooked into a given hook name. With the current
 code I have to set 3 breakpoints, do a lot of manual stepping, or write
 complex breakpoint conditions, which is kind of annoying.

 So, out of curiosity, I tried two more solutions:

 Ternary conditional operator:

 {{{#!php
 <?php
 $value = call_user_func_array( $the_['function'], $the_['accepted_args']
 >= $num_args ? $args : array_slice( $args, 0, $the_['accepted_args'] ) );
 }}}

 This is equivalent in behaviour to the `Simplified branches` solution.

 Always slice:

 {{{#!php
 <?php
 $value = call_user_func_array( $the_['function'], array_slice( $args, 0,
 $the_['accepted_args'] ) );
 }}}

 I tried this because it definitely looks like any kind of branching
 slows the loop down, and I also think that slicing an array to a length
 equal to or above its element count should be more or less a no-op
 internally, so it might be faster than an inline ternary.
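
 The behavioural side of that assumption is easy to check. The sketch
 below uses a made-up argument list. It only shows what `array_slice()`
 returns and says nothing about what it costs internally.

 {{{#!php
 <?php
 // Made-up argument list as it might be passed to a filter.
 $args = array( 'value', 10, 'context' );

 // Length equal to the element count: nothing is cut off.
 $equal = array_slice( $args, 0, 3 );

 // Length above the element count: still nothing is cut off.
 $above = array_slice( $args, 0, 5 );

 // Length below the element count: only the leading elements remain.
 $below = array_slice( $args, 0, 1 );

 var_dump( $equal === $args );            // bool(true)
 var_dump( $above === $args );            // bool(true)
 var_dump( $below === array( 'value' ) ); // bool(true)
 }}}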

 ||Apply Filter call with argument count||Current implementation||Ternary
 conditional operator||Always slice||
 ||`apply_filter(...,0);`||1.984460s (100%)||2.026094s (102.10%)||1.958154s
 ( 98.67%)||
 ||`apply_filter(...,1);`||2.084825s (100%)||2.040432s ( 97.87%)||1.954197s
 ( 93.73%)||
 ||`apply_filter(...,2);`||2.034263s (100%)||1.998606s ( 98.25%)||1.964470s
 ( 96.57%)||
 ||`apply_filter(...,3);`||2.070492s (100%)||2.001361s ( 96.66%)||1.974343s
 ( 95.36%)||
 ||`apply_filter(...,4);`||2.031988s (100%)||2.005720s ( 98.71%)||1.970653s
 ( 96.98%)||
 ||`apply_filter(...,5);`||2.057378s (100%)||2.097180s (101.93%)||1.981899s
 ( 96.33%)||

 At the end of the day it looks like the PHP engine can optimise much
 better if the same line is executed in all cases. It also has the added
 benefit of being much simpler code.

 I acknowledge that the data presented here suggests a conclusion that is
 rather hard to explain:
 Apparently using the more complex line
 `$value = call_user_func_array( $the_['function'], array_slice( $args, 0,
 $the_['accepted_args'] ) );`
 is faster than using the simpler
 `$value = call_user_func_array( $the_['function'], $args );`
 which I find surprising. I can imagine that not having to put large
 objects onto the call stack, or something like that, is faster even if it
 requires slicing first. But then I still cannot explain why the last row
 is faster as well, where no slicing happens because we pass on 5 out of 5
 arguments. Still, it looks like the simpler solution only achieved 98.03%,
 while the more complex one achieved 96.33%. It may be a fluke, although I
 ran each measurement multiple times, always ignoring the first run (to
 let OPcache initialize), and I tried to choose a representative outcome
 of many runs, not outliers. I admit a more scientific methodology could
 have been used, but I only started to look into this out of curiosity and
 didn't plan to run properly controlled experiments.
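
 For anyone who wants to repeat the measurements in a somewhat more
 controlled way, a harness along the following lines might be a starting
 point. It is a hypothetical sketch, not the script used for the numbers
 above: the function name, the iteration counts and the use of the median
 are all assumptions. It needs PHP 7.3 or newer for `hrtime()`.

 {{{#!php
 <?php
 // Hypothetical harness: times $variant over several runs, discards the
 // first (warm-up) run and returns the median of the rest in seconds.
 function bench( callable $variant, int $iterations, int $runs ): float {
         $timings = array();
         for ( $run = 0; $run <= $runs; $run++ ) {
                 $start = hrtime( true );
                 for ( $i = 0; $i < $iterations; $i++ ) {
                         $variant();
                 }
                 $elapsed = ( hrtime( true ) - $start ) / 1e9;
                 if ( $run > 0 ) {
                         $timings[] = $elapsed; // Run 0 is warm-up only.
                 }
         }
         sort( $timings );
         return $timings[ intdiv( count( $timings ), 2 ) ];
 }

 // Made-up callback and arguments standing in for a hooked filter.
 $callback = function ( $value ) {
         return $value;
 };
 $args     = array( 'value', 1, 2 );

 $always_slice = function () use ( $callback, $args ) {
         return call_user_func_array( $callback, array_slice( $args, 0, 3 ) );
 };
 $pass_all     = function () use ( $callback, $args ) {
         return call_user_func_array( $callback, $args );
 };

 printf( "always slice: %.6fs\n", bench( $always_slice, 100000, 5 ) );
 printf( "pass all:     %.6fs\n", bench( $pass_all, 100000, 5 ) );
 }}}

 Wrapping each variant in a closure adds its own call overhead, so such a
 harness is only useful for comparing variants against each other, not
 for absolute timings of `WP_Hook::apply_filters()`.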

 So long story short:
 - as @westonruter pointed out, `array_slice()` must remain supported to
 avoid issues in some cases
 - the additional effort to use the faster `call_user_func()` instead of
 `call_user_func_array()` seems less relevant considering it only affects
 ~1% of the use cases, but requires an additional conditional check and, I
 assume, creates a loop that is harder for the engine to optimise
 - apparently `array_slice()` only carries a performance penalty if the
 sliced array is smaller than the input array, so the additional branching
 again only adds overhead
 - my numbers may include a fair amount of noise, but I did not cherry-pick
 results and I think the performance gains are real, although the amounts
 may differ under different circumstances.

 Thinking about optimisation, these results may vary depending on CPU
 architecture as well (e.g. depending on how speculative execution is
 implemented), but I think the simpler the loop is, the better it can be
 optimised.

 Simpler code, simpler debugging, better performance.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/64180#comment:3>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform