[wp-trac] [WordPress Trac] #64038: Cache miss for `WP_Term_Query`
WordPress Trac
noreply at wordpress.org
Mon Mar 30 06:48:14 UTC 2026
#64038: Cache miss for `WP_Term_Query`
--------------------------------------+-----------------------------
Reporter: Chouby | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Future Release
Component: Taxonomy | Version:
Severity: normal | Resolution:
Keywords: has-patch has-unit-tests | Focuses: performance
--------------------------------------+-----------------------------
Comment (by sanket.parmar):
== Root cause
`generate_cache_key()` currently hashes the full set of sanitised query
args together with the SQL:
{{{
$key = md5( serialize( $cache_args ) . $sql );
}}}
`wp_dropdown_categories()` (called from the quick-filter on the Posts list
table) passes `hierarchical => 1` (int `1`), while `wp_terms_checklist()`
passes `get => 'all'`, which `parse_query()` immediately normalises to
`hierarchical => false` (bool `false`). For these two calls both values
drive identical SQL, yet `serialize()` produces two different strings. The
result is two different cache keys and therefore two identical DB queries.
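The mismatch can be reproduced outside WordPress. The following is a minimal sketch, not core code: the SQL string is a placeholder, and only the one differing arg is included on the assumption that every other arg is equal.

{{{
<?php
// Placeholder SQL; the real statement is built by WP_Term_Query.
$sql = 'SELECT t.term_id FROM wp_terms AS t';

// Only the differing arg is shown; all other args are assumed identical.
$dropdown_args  = array( 'hierarchical' => 1 );     // int, wp_dropdown_categories() style
$checklist_args = array( 'hierarchical' => false ); // bool, after get => 'all' is normalised

$dropdown_serialized  = serialize( $dropdown_args );  // a:1:{s:12:"hierarchical";i:1;}
$checklist_serialized = serialize( $checklist_args ); // a:1:{s:12:"hierarchical";b:0;}

// Same structure as the current key: serialised args followed by the SQL.
$dropdown_key  = md5( $dropdown_serialized . $sql );
$checklist_key = md5( $checklist_serialized . $sql );

var_dump( $dropdown_key === $checklist_key ); // bool(false)
}}}

Because the serialised strings differ, the hashes differ even though the SQL part of the input is byte-for-byte the same.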
== Approaches considered
=== Option A — SQL-only key
Hash only the SQL:
{{{
$key = md5( $sql );
}}}
This would resolve the duplicate-query issue cleanly. The problem is that
`WP_Term_Query` applies several PHP-level post-processing steps //after//
the query that change the data stored in and read back from the cache:
* `_get_term_children()` for `child_of`: runs entirely in PHP; when
  `number = 0` there is no LIMIT in SQL to differentiate the results.
* `_pad_term_counts()` for `pad_counts`: writes `{term_id, count}`
  objects as cache values rather than plain term ID arrays.
* The `$hierarchical && $args['hide_empty']` pruning loop: rewrites the
  PHP result set without touching SQL.
* When `hierarchical = true`, SQL skips the LIMIT clause entirely, so PHP
  slices the result with `array_slice()` using the `number`/`offset` args.
If two callers share the same SQL-only key but one has `pad_counts =
true`, the first writer stores padded `{term_id, count}` objects and the
second caller misreads them as plain IDs (or vice versa). Pure SQL-only
keying is therefore not safe here without a deeper refactor.
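The collision can be illustrated with a toy cache. This is a sketch under stated assumptions: a plain PHP array stands in for the object cache, the SQL string is a placeholder, and the term IDs and counts are made-up sample values.

{{{
<?php
$cache = array();                               // toy stand-in for the object cache
$sql   = 'SELECT t.term_id FROM wp_terms AS t'; // placeholder SQL
$key   = md5( $sql );                           // Option A: SQL-only key

// Caller 1 (pad_counts = true) misses and stores {term_id, count} objects.
$padded = array(
	(object) array( 'term_id' => 3, 'count' => 7 ),
	(object) array( 'term_id' => 5, 'count' => 2 ),
);
if ( ! isset( $cache[ $key ] ) ) {
	$cache[ $key ] = $padded;
}

// Caller 2 (pad_counts = false) computes the same key, gets a hit,
// and expects a plain array of integer term IDs.
$hit            = $cache[ $key ];
$looks_like_ids = ( array_filter( $hit, 'is_int' ) === $hit );

var_dump( $looks_like_ids ); // bool(false)
}}}

The second caller receives objects where it expects integers, which is the misread described above.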
=== Option B — Normalise boolean-like args before serialising (conservative)
Keep the existing `serialize($cache_args) . $sql` structure but normalise
all truthy/falsy args to their canonical PHP types (`(bool)`, `(int)`) and
apply `wp_recursive_ksort()` before hashing, similar to
`wp_dropdown_query_hash()` in `general-template.php`.
'''Downside:''' it addresses only the type-coercion symptom. Future
callers that express the same intent through semantically equivalent but
structurally different args would still miss the cache. It is also a wider
change — every single query arg gets serialised even when most of them
have no bearing on post-processing.
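A rough sketch of what Option B could look like in isolation. The helper names `sketch_recursive_ksort()` and `sketch_normalised_key()` are hypothetical, the former being a standalone stand-in for `wp_recursive_ksort()` so the snippet runs without WordPress; the lists of bool-like and int-like args are illustrative, not exhaustive.

{{{
<?php
// Hypothetical stand-in for wp_recursive_ksort(): sorts keys at every depth.
function sketch_recursive_ksort( array &$array ): void {
	foreach ( $array as &$value ) {
		if ( is_array( $value ) ) {
			sketch_recursive_ksort( $value );
		}
	}
	unset( $value );
	ksort( $array );
}

// Hypothetical helper: cast known args to canonical types, sort, then hash.
function sketch_normalised_key( array $args, string $sql ): string {
	$bool_args = array( 'hierarchical', 'hide_empty', 'pad_counts' ); // illustrative list
	$int_args  = array( 'child_of', 'number', 'offset' );             // illustrative list

	foreach ( $bool_args as $name ) {
		if ( array_key_exists( $name, $args ) ) {
			$args[ $name ] = (bool) $args[ $name ];
		}
	}
	foreach ( $int_args as $name ) {
		if ( array_key_exists( $name, $args ) ) {
			$args[ $name ] = (int) $args[ $name ];
		}
	}

	sketch_recursive_ksort( $args );
	return md5( serialize( $args ) . $sql );
}

$sql   = 'SELECT t.term_id FROM wp_terms AS t'; // placeholder SQL
$key_a = sketch_normalised_key( array( 'hide_empty' => 0, 'number' => '10' ), $sql );
$key_b = sketch_normalised_key( array( 'number' => 10, 'hide_empty' => false ), $sql );

var_dump( $key_a === $key_b ); // bool(true)
}}}

Type and key-order differences collapse to one key here, but every arg still takes part in the hash, which is the breadth concern noted above.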
=== Option C — SQL + only the args that affect PHP post-processing (chosen)
This is the targeted fix: base the key on the SQL (which already encodes
everything that drives the DB query) plus a small, explicit set of args
that control PHP-level result shaping:
||= Arg =||= Why it must be in the key =||
|| `child_of` || `_get_term_children()` filters in PHP; when `number=0` this is not reflected in SQL ||
|| `pad_counts` || `_pad_term_counts()` runs in PHP and changes stored cache shape ||
|| `prune_empty_terms` || Combined `(bool)($hierarchical && $hide_empty)`; only the conjunction matters, not the individual values ||
|| `number` / `offset` (when hierarchical) || No LIMIT in SQL for hierarchical queries; PHP slices with `array_slice()` ||
|| `fields` || Normalised to `'all'` for non-count/non-object_id queries (existing logic preserved) ||
{{{
$php_cache_args = array(
	'child_of'          => (int) $args['child_of'],
	'pad_counts'        => (bool) $args['pad_counts'],
	'prune_empty_terms' => (bool) ( $args['hierarchical'] && $args['hide_empty'] ),
);

if ( $args['hierarchical'] && $args['number'] ) {
	$php_cache_args['number'] = (int) $args['number'];
	$php_cache_args['offset'] = (int) $args['offset'];
}

if ( 'count' !== $args['fields'] && 'all_with_object_id' !== $args['fields'] ) {
	$php_cache_args['fields'] = 'all';
} else {
	$php_cache_args['fields'] = $args['fields'];
}

$key = md5( $sql . serialize( $php_cache_args ) );
}}}
For this ticket: both calls resolve `prune_empty_terms = false` (since
`hide_empty = 0`), `child_of = 0`, `pad_counts = false`, and produce
identical SQL → same cache key → single DB query.
This keeps the change narrow: the only args retained in the key are those
identified above as altering the cached result set.
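To check the reasoning for this ticket in isolation, the Option C logic can be wrapped in a standalone function. This is a sketch only: `sketch_term_cache_key()` is a hypothetical name, the SQL string is a placeholder, and `$base` holds sample values chosen for the example rather than the class defaults.

{{{
<?php
// Hypothetical wrapper around the Option C key logic.
function sketch_term_cache_key( array $args, string $sql ): string {
	$php_cache_args = array(
		'child_of'          => (int) $args['child_of'],
		'pad_counts'        => (bool) $args['pad_counts'],
		'prune_empty_terms' => (bool) ( $args['hierarchical'] && $args['hide_empty'] ),
	);

	// number/offset only matter when PHP does the slicing.
	if ( $args['hierarchical'] && $args['number'] ) {
		$php_cache_args['number'] = (int) $args['number'];
		$php_cache_args['offset'] = (int) $args['offset'];
	}

	if ( 'count' !== $args['fields'] && 'all_with_object_id' !== $args['fields'] ) {
		$php_cache_args['fields'] = 'all';
	} else {
		$php_cache_args['fields'] = $args['fields'];
	}

	return md5( $sql . serialize( $php_cache_args ) );
}

$sql  = 'SELECT t.term_id FROM wp_terms AS t'; // placeholder SQL
$base = array(                                 // sample values for this sketch
	'child_of'   => 0,
	'pad_counts' => false,
	'number'     => 0,
	'offset'     => 0,
	'fields'     => 'all',
);

$dropdown  = array_merge( $base, array( 'hierarchical' => 1, 'hide_empty' => 0 ) );
$checklist = array_merge( $base, array( 'hierarchical' => false, 'hide_empty' => 0 ) );
$padded    = array_merge( $checklist, array( 'pad_counts' => true ) );

$same_key      = sketch_term_cache_key( $dropdown, $sql ) === sketch_term_cache_key( $checklist, $sql );
$padded_differs = sketch_term_cache_key( $padded, $sql ) !== sketch_term_cache_key( $checklist, $sql );

var_dump( $same_key );       // bool(true)
var_dump( $padded_differs ); // bool(true)
}}}

The two ticket-style calls collapse to one key, while a call that differs only in `pad_counts` still gets its own entry.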
== Tests included
Four new test methods in `Tests_Term_Query` (`@ticket 64038`,
`@group cache`):
1. '''`test_equivalent_queries_share_cache_entry`''': asserts that the
   `wp_dropdown_categories`-style and `wp_terms_checklist`-style calls
   produce no second DB query (the regression test for this ticket).
2. '''`test_queries_with_different_prune_empty_terms_get_separate_cache_entries`''':
   asserts that, with `hierarchical=true`, `hide_empty=true` and
   `hide_empty=false` get distinct keys.
3. '''`test_queries_with_different_child_of_get_separate_cache_entries`''':
   asserts that different `child_of` values get distinct keys.
4. '''`test_queries_with_different_pad_counts_get_separate_cache_entries`''':
   asserts that different `pad_counts` values get distinct keys.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/64038#comment:6>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform