[wp-trac] [WordPress Trac] #64038: Cache miss for `WP_Term_Query`

WordPress Trac noreply at wordpress.org
Mon Mar 30 06:48:14 UTC 2026


#64038: Cache miss for `WP_Term_Query`
--------------------------------------+-----------------------------
 Reporter:  Chouby                    |       Owner:  (none)
     Type:  defect (bug)              |      Status:  new
 Priority:  normal                    |   Milestone:  Future Release
Component:  Taxonomy                  |     Version:
 Severity:  normal                    |  Resolution:
 Keywords:  has-patch has-unit-tests  |     Focuses:  performance
--------------------------------------+-----------------------------

Comment (by sanket.parmar):

 == Root cause

 `generate_cache_key()` currently hashes the full set of sanitised query
 args together with the SQL:

 {{{
 $key = md5( serialize( $cache_args ) . $sql );
 }}}

 `wp_dropdown_categories()` (called from the quick-filter on the Posts list
 table) passes `hierarchical => 1` (int `1`), while `wp_terms_checklist()`
 passes `get => 'all'`, which `parse_query()` immediately normalises to
 `hierarchical => false` (bool `false`). In these two calls both values
 produce identical SQL, yet `serialize()` produces two different strings.
 The result is two different cache keys and therefore two identical DB
 queries.
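
 The mismatch can be reproduced outside WordPress. The following is a minimal sketch, assuming a placeholder SQL string and reducing each caller's args to the single `hierarchical` entry; the variable names are illustrative and not taken from core.

```php
<?php
// Placeholder SQL: stands in for the identical statement both callers generate.
$sql = 'SELECT t.term_id FROM wp_terms AS t /* illustrative */';

// Illustrative args, reduced to the one entry that differs between the callers.
$dropdown_args  = array( 'hierarchical' => 1 );     // int 1
$checklist_args = array( 'hierarchical' => false ); // bool false

// serialize() encodes the PHP type, so the two strings differ.
$dropdown_serialized  = serialize( $dropdown_args );  // a:1:{s:12:"hierarchical";i:1;}
$checklist_serialized = serialize( $checklist_args ); // a:1:{s:12:"hierarchical";b:0;}

// Same key recipe as the current generate_cache_key() line quoted above.
$dropdown_key  = md5( $dropdown_serialized . $sql );
$checklist_key = md5( $checklist_serialized . $sql );

var_dump( $dropdown_key === $checklist_key ); // bool(false)
```

 Because the SQL is the same in both cases, the only source of the differing key is the serialised arg array.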

 == Approaches considered

 === Option A — SQL-only key

 Hashes only the SQL:

 {{{
 $key = md5( $sql );
 }}}

 This would resolve the duplicate-query issue cleanly. The problem is that
 `WP_Term_Query` applies several PHP-level post-processing steps //after//
 the query that change the data stored in and read back from the cache:

  * `_get_term_children()` for `child_of` — runs entirely in PHP; when
 `number = 0` there is no LIMIT in SQL to differentiate the results.
  * `_pad_term_counts()` for `pad_counts` — writes `{term_id, count}`
 objects as cache values rather than plain term ID arrays.
  * The `$hierarchical && $args['hide_empty']` pruning loop — rewrites the
 PHP result set without touching SQL.
  * When `hierarchical = true`, SQL skips the LIMIT clause entirely, so PHP
 slices the result with `array_slice()` using the `number`/`offset` args.

 If two callers share the same SQL-only key but one has `pad_counts =
 true`, the first writer stores padded `{term_id, count}` objects and the
 second caller misreads them as plain IDs (or vice versa). Pure SQL-only
 keying is therefore not safe here without a deeper refactor.
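
 The collision described above can be sketched with a plain PHP array standing in for the object cache. Everything here is hypothetical (`demo_get_terms()`, the hard-coded term IDs, the placeholder `count` values); it only illustrates how a SQL-only key lets one caller read a value shaped for another.

```php
<?php
/**
 * Hypothetical stand-in for a cached term lookup keyed by SQL only (Option A).
 * $cache is a plain array used in place of the real object cache.
 */
function demo_get_terms( array &$cache, string $sql, bool $pad_counts ): array {
    $key = md5( $sql ); // Option A: the key ignores pad_counts.

    if ( isset( $cache[ $key ] ) ) {
        // Returned as stored, whatever shape the first writer used.
        return $cache[ $key ];
    }

    // Pretend the DB returned these term IDs.
    $ids = array( 3, 7 );

    // With pad_counts, store {term_id, count} objects instead of plain IDs.
    $value = $pad_counts
        ? array_map(
            static function ( $id ) {
                return (object) array( 'term_id' => $id, 'count' => 0 );
            },
            $ids
        )
        : $ids;

    $cache[ $key ] = $value;
    return $value;
}

$cache  = array();
$sql    = 'SELECT t.term_id FROM wp_terms AS t /* illustrative */';
$first  = demo_get_terms( $cache, $sql, true );  // Writes padded objects.
$second = demo_get_terms( $cache, $sql, false ); // Expects plain IDs, gets objects.

var_dump( is_int( $second[0] ) ); // bool(false)
```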

 === Option B — Normalise boolean-like args before serialising (conservative)

 Keep the existing `serialize($cache_args) . $sql` structure but normalise
 all truthy/falsy args to their canonical PHP types (`(bool)`, `(int)`) and
 apply `wp_recursive_ksort()` before hashing, similar to
 `wp_dropdown_query_hash()` in `general-template.php`.

 '''Downside:''' it addresses only the type-coercion symptom. Future
 callers that express the same intent through semantically equivalent but
 structurally different args would still miss the cache. It is also a wider
 change — every single query arg gets serialised even when most of them
 have no bearing on post-processing.
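
 A rough sketch of what Option B could look like, assuming a hypothetical `demo_recursive_ksort()` helper in place of `wp_recursive_ksort()` and an assumed list of args to cast (the ticket does not enumerate them). It also shows the limit of the approach: casting collapses `1`, `'1'` and `true`, but values that differ in truthiness stay distinct even when they would lead to the same SQL.

```php
<?php
// Hypothetical helper: sorts an array by key, recursing into nested arrays.
function demo_recursive_ksort( array &$array ): void {
    foreach ( $array as &$value ) {
        if ( is_array( $value ) ) {
            demo_recursive_ksort( $value );
        }
    }
    unset( $value );
    ksort( $array );
}

// Hypothetical Option B key: cast known args, sort, then hash args plus SQL.
function demo_normalized_key( array $args, string $sql ): string {
    // Assumed lists of boolean-like and integer-like args, for illustration only.
    foreach ( array( 'hierarchical', 'hide_empty', 'pad_counts' ) as $name ) {
        if ( array_key_exists( $name, $args ) ) {
            $args[ $name ] = (bool) $args[ $name ];
        }
    }
    foreach ( array( 'child_of', 'number', 'offset' ) as $name ) {
        if ( array_key_exists( $name, $args ) ) {
            $args[ $name ] = (int) $args[ $name ];
        }
    }

    demo_recursive_ksort( $args );
    return md5( serialize( $args ) . $sql );
}

$sql = 'SELECT t.term_id FROM wp_terms AS t /* illustrative */';

$a = demo_normalized_key( array( 'hierarchical' => 1, 'hide_empty' => '0' ), $sql );
$b = demo_normalized_key( array( 'hide_empty' => false, 'hierarchical' => true ), $sql );
$c = demo_normalized_key( array( 'hide_empty' => false, 'hierarchical' => false ), $sql );

var_dump( $a === $b ); // bool(true): same intent, different types and key order.
var_dump( $a === $c ); // bool(false): truthy vs. falsy hierarchical stay distinct.
```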

 === Option C — SQL + only the args that affect PHP post-processing (chosen)

 This is the targeted fix: base the key on the SQL (which already encodes
 everything that drives the DB query) plus a small, explicit set of args
 that control PHP-level result shaping:

 ||= Arg =||= Why it must be in the key =||
 || `child_of` || `_get_term_children()` filters in PHP; when `number=0` this is not reflected in SQL ||
 || `pad_counts` || `_pad_term_counts()` runs in PHP and changes stored cache shape ||
 || `prune_empty_terms` || Combined `(bool)($hierarchical && $hide_empty)`; only the conjunction matters, not the individual values ||
 || `number` / `offset` (when hierarchical) || No LIMIT in SQL for hierarchical queries; PHP slices with `array_slice()` ||
 || `fields` || Normalised to `'all'` unless it is `'count'` or `'all_with_object_id'` (existing logic preserved) ||

 {{{
 // Args that shape the result in PHP after the SQL has run.
 $php_cache_args = array(
     'child_of'          => (int) $args['child_of'],
     'pad_counts'        => (bool) $args['pad_counts'],
     // Only the conjunction drives the pruning loop.
     'prune_empty_terms' => (bool) ( $args['hierarchical'] && $args['hide_empty'] ),
 );

 // Hierarchical queries have no LIMIT in SQL; PHP slices with array_slice().
 if ( $args['hierarchical'] && $args['number'] ) {
     $php_cache_args['number'] = (int) $args['number'];
     $php_cache_args['offset'] = (int) $args['offset'];
 }

 // Existing logic preserved: every other `fields` value shares the 'all' entry.
 if ( 'count' !== $args['fields'] && 'all_with_object_id' !== $args['fields'] ) {
     $php_cache_args['fields'] = 'all';
 } else {
     $php_cache_args['fields'] = $args['fields'];
 }

 $key = md5( $sql . serialize( $php_cache_args ) );
 }}}

 For this ticket: both calls resolve `prune_empty_terms = false` (since
 `hide_empty = 0`), `child_of = 0`, `pad_counts = false`, and produce
 identical SQL → same cache key → single DB query.
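
 To check that claim in isolation, the snippet can be wrapped in a standalone function and fed two arg sets that mimic the two callers. This is a sketch only: `demo_term_cache_key()`, the placeholder SQL and the arg values are assumptions for illustration, not the patch itself.

```php
<?php
/**
 * Hypothetical wrapper around the Option C key recipe: SQL plus only the args
 * that shape the result in PHP after the query has run.
 */
function demo_term_cache_key( array $args, string $sql ): string {
    $php_cache_args = array(
        'child_of'          => (int) $args['child_of'],
        'pad_counts'        => (bool) $args['pad_counts'],
        'prune_empty_terms' => (bool) ( $args['hierarchical'] && $args['hide_empty'] ),
    );

    // number/offset matter only when PHP does the slicing.
    if ( $args['hierarchical'] && $args['number'] ) {
        $php_cache_args['number'] = (int) $args['number'];
        $php_cache_args['offset'] = (int) $args['offset'];
    }

    $keeps_own_entry          = in_array( $args['fields'], array( 'count', 'all_with_object_id' ), true );
    $php_cache_args['fields'] = $keeps_own_entry ? $args['fields'] : 'all';

    return md5( $sql . serialize( $php_cache_args ) );
}

$sql = 'SELECT t.term_id FROM wp_terms AS t /* illustrative */';

// Illustrative arg values shared by both callers.
$base = array(
    'child_of'   => 0,
    'pad_counts' => false,
    'hide_empty' => 0,
    'number'     => 0,
    'offset'     => 0,
    'fields'     => 'all',
);

$dropdown  = array_merge( $base, array( 'hierarchical' => 1 ) );     // int 1
$checklist = array_merge( $base, array( 'hierarchical' => false ) ); // bool false
$padded    = array_merge( $checklist, array( 'pad_counts' => true ) );

var_dump( demo_term_cache_key( $dropdown, $sql ) === demo_term_cache_key( $checklist, $sql ) ); // bool(true)
var_dump( demo_term_cache_key( $checklist, $sql ) === demo_term_cache_key( $padded, $sql ) );   // bool(false)
```

 The first comparison corresponds to this ticket's duplicate query; the second shows that a caller with `pad_counts = true` still gets its own entry.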

 This approach keeps the key narrow without giving up safety: the only args
 retained in the key are those known to alter the cached result set after
 the SQL has run.

 == Tests included

 Four new test methods in `Tests_Term_Query` (`@ticket 64038`, `@group
 cache`):

  1. '''`test_equivalent_queries_share_cache_entry`''': asserts that the `wp_dropdown_categories`-style and `wp_terms_checklist`-style calls produce no second DB query (the regression test for this ticket).
  2. '''`test_queries_with_different_prune_empty_terms_get_separate_cache_entries`''': asserts that `hierarchical=true && hide_empty=true` vs. `hide_empty=false` get distinct keys.
  3. '''`test_queries_with_different_child_of_get_separate_cache_entries`''': asserts different `child_of` values get distinct keys.
  4. '''`test_queries_with_different_pad_counts_get_separate_cache_entries`''': asserts different `pad_counts` values get distinct keys.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/64038#comment:6>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

