[wp-trac] [WordPress Trac] #64180: Simplify and improve performance in `WP_Hook::apply_filters()`
WordPress Trac
noreply at wordpress.org
Sun Nov 2 00:24:50 UTC 2025
#64180: Simplify and improve performance in `WP_Hook::apply_filters()`
-------------------------+------------------------------
Reporter: grapestain | Owner: (none)
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Plugins | Version: 6.8.3
Severity: minor | Resolution:
Keywords: | Focuses: performance
-------------------------+------------------------------
Comment (by grapestain):
Oh, wow, I would have never thought of that. Thanks for the insight.
In that case we can still achieve similar improvements just by removing
the `if ( 0 === $the_['accepted_args'] ) {` branch. As pointed out, this
branch is only taken in about 1% of the total calls, but its condition is
evaluated first in all cases. Apparently the overhead it adds in the other
99% of cases is not justified by the small gain achieved in 1%.
So if we look at two more possible solutions:
Reorder branches:
{{{#!php
<?php
// Avoid the array_slice() if possible.
if ( $the_['accepted_args'] >= $num_args ) {
	$value = call_user_func_array( $the_['function'], $args );
} elseif ( 0 === $the_['accepted_args'] ) {
	$value = call_user_func( $the_['function'] );
} else {
	$value = call_user_func_array( $the_['function'], array_slice( $args, 0, $the_['accepted_args'] ) );
}
}}}
Here we use the knowledge that 0 accepted arguments occurs in only around
1% of cases, whereas accepting all arguments accounts for roughly 80% of
all filter calls on the front end and around 75% in the admin, so checking
for the most likely branch first saves additional checks.
Simplified branches:
{{{#!php
<?php
if ( $the_['accepted_args'] >= $num_args ) {
	$value = call_user_func_array( $the_['function'], $args );
} else {
	$value = call_user_func_array( $the_['function'], array_slice( $args, 0, $the_['accepted_args'] ) );
}
}}}
And here we go one step further. The assumption is that the additional
check for 0 arguments, which would still fail in about 95% of the
remaining cases, actually costs more than calling `array_slice()` to
slice the array down to an empty array and using `call_user_func_array()`
instead of `call_user_func()` in ~5% of the remaining calls, or ~1% of
the total calls.
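As a rough back-of-the-envelope check of the branch-ordering argument, the
sketch below counts how many conditions are evaluated per callback on
average, assuming the approximate front-end shares quoted above (80%
accept all arguments, 1% accept none). The variable names are made up for
illustration, and the count says nothing about the real cost of each
check or call.
{{{#!php
<?php
// Assumed shares of callbacks, taken from the rough front-end figures above.
$share_all_args  = 0.80; // accepted_args >= $num_args
$share_zero_args = 0.01; // accepted_args === 0
$share_sliced    = 1.0 - $share_all_args - $share_zero_args; // everything else

// Current order: the zero check comes first; every other callback also
// needs the >= check.
$current = $share_zero_args * 1 + ( $share_all_args + $share_sliced ) * 2;

// Reordered: the >= check comes first; every other callback also needs
// the zero check.
$reordered = $share_all_args * 1 + ( $share_zero_args + $share_sliced ) * 2;

// Simplified: exactly one >= check per callback.
$simplified = 1.0;

printf(
	"current: %.2f, reordered: %.2f, simplified: %.2f\n",
	$current,
	$reordered,
	$simplified
);
// Prints: current: 1.99, reordered: 1.20, simplified: 1.00
}}}
Under these assumptions, reordering removes most of the second checks and
the simplified version removes all of them, at the price of an
`array_slice()` call for the ~1% of callbacks that accept no arguments.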
So I did my tests again:
(Note that the 100% values are different, because my computer is in a
different state than when I opened the ticket, so comparing to those would
not be fair.)
||Apply Filter call with argument count||Current implementation||Reordered branches||Simplified branches||
||`apply_filter(...,0);`||1.984460s (100%)||2.017885s (101.68%)||1.993416s (100.45%)||
||`apply_filter(...,1);`||2.084825s (100%)||2.038089s (97.76%)||2.026259s (97.19%)||
||`apply_filter(...,2);`||2.034263s (100%)||2.041381s (100.35%)||1.996706s (98.15%)||
||`apply_filter(...,3);`||2.070492s (100%)||2.057335s (99.36%)||2.005684s (96.87%)||
||`apply_filter(...,4);`||2.031988s (100%)||2.054009s (101.08%)||2.020054s (99.41%)||
||`apply_filter(...,5);`||2.057378s (100%)||2.010370s (97.72%)||2.000880s (97.25%)||
The first one seems roughly on par with the current implementation (the
differences look like noise), but removing one of the three branches
entirely seems better. Again, in the 0 argument case it is indeed slightly
slower, but with that being only 1% of the real world cases it is still
better to only differentiate 2 cases.
I tried one more thing, for an unrelated reason. When using a step
debugger, I often set breakpoints on these `call_user_func*` lines to
identify which function or method is hooked into a given hook name. With
three separate call sites I have to set 3 breakpoints, do a lot of manual
stepping, or write complex breakpoint conditions, which is kind of
annoying.
So, out of curiosity, I tried two more solutions:
Ternary conditional operator:
{{{#!php
<?php
$value = call_user_func_array( $the_['function'], $the_['accepted_args']
>= $num_args ? $args : array_slice( $args, 0, $the_['accepted_args'] ) );
}}}
This is equivalent to the `Simplified branches` solution in behaviour.
Always slice:
{{{#!php
<?php
$value = call_user_func_array( $the_['function'], array_slice( $args, 0,
$the_['accepted_args'] ) );
}}}
I tried this because it looks like any kind of branching slows the loop
down, and because I think slicing an array to a length equal to or above
its element count should be more or less a no-op internally, so it might
be faster than an inline ternary.
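To illustrate why dropping the branches should not change behaviour, here
is a minimal sketch comparing the three-branch logic (as described in this
ticket) with the always-slice variant. The helper functions and the
`$probe` callback are hypothetical stand-ins written for this example,
not WordPress core code, and the sketch only covers plain positional
arguments.
{{{#!php
<?php
// Hypothetical stand-in for the three-branch logic discussed in this ticket.
function call_current( callable $fn, array $args, int $accepted ) {
	$num_args = count( $args );
	if ( 0 === $accepted ) {
		return call_user_func( $fn );
	} elseif ( $accepted >= $num_args ) {
		return call_user_func_array( $fn, $args );
	}
	return call_user_func_array( $fn, array_slice( $args, 0, $accepted ) );
}

// Hypothetical stand-in for the "Always slice" variant.
function call_always_slice( callable $fn, array $args, int $accepted ) {
	return call_user_func_array( $fn, array_slice( $args, 0, $accepted ) );
}

// Variadic callback that simply reports which arguments it received.
$probe = function ( ...$received ) {
	return $received;
};

$args = array( 'value', 'extra-1', 'extra-2' );

foreach ( array( 0, 1, 2, 3, 5 ) as $accepted ) {
	$same = call_current( $probe, $args, $accepted )
		=== call_always_slice( $probe, $args, $accepted );
	printf( "accepted_args=%d: %s\n", $accepted, $same ? 'same' : 'different' );
}
}}}
Slicing with a length of 0 yields an empty array, and a length at or above
the element count returns all elements, so both helpers pass the same
arguments to the callback in every case tried here.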
||Apply Filter call with argument count||Current implementation||Ternary conditional operator||Always slice||
||`apply_filter(...,0);`||1.984460s (100%)||2.026094s (102.10%)||1.958154s (98.67%)||
||`apply_filter(...,1);`||2.084825s (100%)||2.040432s (97.87%)||1.954197s (93.73%)||
||`apply_filter(...,2);`||2.034263s (100%)||1.998606s (98.25%)||1.964470s (96.57%)||
||`apply_filter(...,3);`||2.070492s (100%)||2.001361s (96.66%)||1.974343s (95.36%)||
||`apply_filter(...,4);`||2.031988s (100%)||2.005720s (98.71%)||1.970653s (96.98%)||
||`apply_filter(...,5);`||2.057378s (100%)||2.097180s (101.93%)||1.981899s (96.33%)||
At the end of the day, it looks like the PHP engine can optimise much
better when the same line is executed in all cases. It also has the added
benefit of being much simpler code.
I acknowledge that the data presented here suggests a conclusion that is
rather hard to explain:
Apparently using the more complex line
`$value = call_user_func_array( $the_['function'], array_slice( $args, 0, $the_['accepted_args'] ) );`
is faster than using the simpler
`$value = call_user_func_array( $the_['function'], $args );`
which I find surprising. I can imagine that not needing to put large
objects onto the call stack, or something like that, makes it faster even
though the array needs slicing first. But then I still cannot explain why
the last row is faster as well, where no slicing happens because we pass
on 5 out of 5 arguments. Still, it looks like the simpler solution only
achieved 98.03%, while the more complex one achieved 96.33%. It may be a
fluke, although I ran each measurement multiple times, always ignoring the
first run (to let Opcache initialize), and I tried to choose a
representative outcome of many runs rather than outliers. I admit a more
scientific methodology could have been used, but I only started to look
into this out of curiosity and didn't plan to do properly controlled
experiments.
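For anyone wanting to repeat the measurements a bit more systematically,
a minimal harness along the lines described above (discard the first run,
repeat, take a representative value) might look like the sketch below.
`median_runtime()` and its parameters are hypothetical names for this
example and this is not the script used for the tables in this comment.
It assumes PHP 7.3 or later for `hrtime()`.
{{{#!php
<?php
/**
 * Hypothetical helper: times $iterations calls of $fn, repeats that $runs
 * times after one discarded warm-up run, and returns the median in seconds.
 * For an even $runs it returns the upper of the two middle values.
 */
function median_runtime( callable $fn, int $iterations, int $runs = 5 ): float {
	$timings = array();
	for ( $run = 0; $run <= $runs; $run++ ) {
		$start = hrtime( true ); // Monotonic clock, in nanoseconds.
		for ( $i = 0; $i < $iterations; $i++ ) {
			$fn();
		}
		$elapsed = ( hrtime( true ) - $start ) / 1e9;
		if ( $run > 0 ) { // Run 0 is the warm-up and is discarded.
			$timings[] = $elapsed;
		}
	}
	sort( $timings );
	return $timings[ intdiv( count( $timings ), 2 ) ];
}

// Usage: time the always-slice call with 1 accepted argument out of 5.
$args     = array( 'value', 'a', 'b', 'c', 'd' );
$callback = function ( $value ) {
	return $value;
};
$seconds  = median_runtime(
	function () use ( $callback, $args ) {
		call_user_func_array( $callback, array_slice( $args, 0, 1 ) );
	},
	100000
);
printf( "median: %.6fs\n", $seconds );
}}}
Reporting a median of several runs makes single outliers less influential,
but it does not remove systematic effects such as CPU frequency scaling or
other load on the machine.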
So long story short:
- as @westonruter pointed out, slicing with `array_slice()` must remain
supported to avoid issues in some cases
- the additional effort to use the faster `call_user_func()` instead of
`call_user_func_array()` seems less relevant considering it only affects
~1% of the use-cases, but requires an additional conditional check, and I
assume it creates a loop that is harder for the engine to optimise
- apparently `array_slice()` is only a performance penalty if the sliced
array is smaller than the input array, so the additional branching again
only adds overhead
- my numbers may include a fair amount of noise, but I did not cherry-pick
results and I think the performance gains are real, although the amounts
may differ in different circumstances.
Thinking about optimisation, these results may vary depending on CPU
architecture as well (e.g. depending on how speculative execution is
implemented), but I think the simpler the loop is, the better it can be
optimised.
Simpler code, simpler debugging, better performance.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/64180#comment:3>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform