[wp-trac] [WordPress Trac] #62249: Investigate Theme JSON Data Caching Inconsistencies

WordPress Trac noreply at wordpress.org
Thu Oct 17 21:41:14 UTC 2024


#62249: Investigate Theme JSON Data Caching Inconsistencies
--------------------------+-----------------------------
 Reporter:  mreishus      |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  General       |    Version:
 Severity:  normal        |   Keywords:
  Focuses:                |
--------------------------+-----------------------------
 == Summary ==

 `get_merged_data()` gathers data from up to four sources and merges them
 together. All four of those sources are filterable, but they are not
 always responsive to the latest filters due to improper cache validation.

 However, fixing it is not easy and may not be necessary - given that the
 invalidation has been improper for two years, and no major issues have
 been found.

 == The Four Data Sources ==

 * `get_core_data()`
   * Calls the `wp_theme_json_data_default` filter.
 * `get_block_data()`
   * Calls the `wp_theme_json_data_blocks` filter.
 * `get_theme_data()`
   * Calls the `wp_theme_json_data_theme` filter.
 * `get_user_data()`
   * Calls the `wp_theme_json_data_user` filter.

 == Lack of Invalidation ==

 In these functions, there is no mechanism to detect a changing filter. The
 filter is used when calculating the cached value, and the filter is not
 used when detecting to invalidate. All 4 sources follow the same pattern.
 Here is an outline:

 {{{
 #!php
 public static function get_core_data() {
     if ( null !== static::$core && static::has_same_registered_blocks(
 'core' ) ) {
         return static::$core;
     }

     ...compute $config...
     $theme_json = apply_filters( 'wp_theme_json_data_default', new
 WP_Theme_JSON_Data( $config, 'default' ) );

     static::$core = $theme_json->get_theme_json();
     return static::$core
 }
 }}}

 We invalidate when:
  * the registered blocks change
  * when someone explicitly calls `wp_clean_theme_json_cache()`

 There is no invalidation when a filter is added or removed, or when a
 filter changes behavior.

 == A caching bug can make it appear like invalidation is happening. ==

 To test this out, let's set up a simple function that we can use as a
 filter:

 {{{
 #!php
         public function add_custom_duotone_default( $theme_json ) {
                 $data = $theme_json->get_data();
                 $data['settings']['color']['duotone']['special1'] = array(
                         'name'   => 'Cool Duotone',
                         'colors' => array( '#000000', '#7f7f7f' ),
                         'slug'   => 'cool-duotone',
                 );
                 return new WP_Theme_JSON_Data( $data );
         }
 }}}

 And let's do the simplest unit test possible, calling the function before,
 adding the filter, calling the function again and checking that the data
 the filter added is there:

 {{{
 #!php
         public function test_core_data_filter_responsiveness_1() {
                 wp_clean_theme_json_cache();

                 $data1 = WP_Theme_JSON_Resolver::get_core_data();
                 add_filter( 'wp_theme_json_data_default', array( $this,
 'add_custom_duotone_default' ) );
                 $final_data = WP_Theme_JSON_Resolver::get_core_data();

                 // This test passes, making it appear the function is
 always responsive to the filter changing.
                 $this->assertEquals(
                         array(
                                 'name'   => 'Cool Duotone',
                                 'colors' => array( '#000000', '#7f7f7f' ),
                                 'slug'   => 'cool-duotone',
                         ),
 $final_data->get_settings()['color']['duotone']['special1'],
                         'Default duotone should be added'
                 );
         }
 }}}

 It passes! That certainly makes it look like the function invalidates its
 cache when a filter is added.

 However, making the seemingly inconsequental change of adding one extra
 call to `::get_core_data()` causes the filter to not run:

 {{{
 #!php
         public function test_core_data_filter_responsiveness_2() {
                 wp_clean_theme_json_cache();

                 $data1 = WP_Theme_JSON_Resolver::get_core_data();
                 $data2 = WP_Theme_JSON_Resolver::get_core_data();   //
 <--- ADDED THIS
                 add_filter( 'wp_theme_json_data_default', array( $this,
 'add_custom_duotone_default' ) );
                 $final_data = WP_Theme_JSON_Resolver::get_core_data();

                 // This test fails, revealing that the function is not
 responsive to the filter changing,
                 // and the first test only passed because of a bug causing
 the second call of the function
                 // to never be cached.
                 $this->assertEquals(
                         array(
                                 'name'   => 'Cool Duotone',
                                 'colors' => array( '#000000', '#7f7f7f' ),
                                 'slug'   => 'cool-duotone',
                         ),
 $final_data->get_settings()['color']['duotone']['special1'],
                         'Default duotone should be added'
                 );
         }
 }}}

 This test now fails. What's going on? It's identical to the previous one
 except one extra `$data2 = WP_Theme_JSON_Resolver::get_core_data();` is
 added before adding the filter.

 The cause is: A bug in the caching logic causes the second call to the
 function to never be cached.

 In unchanging conditions:
  * '''What you expect''': Call 1 to the function is computed, and calls 2
 through N quickly return a cached value.
  * '''What actually happens''': Calls 1 and 2 to the function are
 computed, and calls 3 through N quickly return a cached value.

 Because the most common unit test setup is to call the function once, make
 a change, and then call the function again, the bug of "the second call is
 never cached" can make it look like invalidation is happening when there
 isn't.

 In fact, there is a `test_get_core_data()` test that makes sure a newly
 added filter is actually called. This is only passing because of the "the
 second call is never cached" bug, and it can be broken by adding one extra
 call during the setup phase.

 Passing (currently on trunk):

 {{{
 #!php
         public function test_get_core_data( $should_fire_filter,
 $core_is_cached, $blocks_are_cached ) {
                 wp_clean_theme_json_cache();

                 // If should cache core, then fire the method to cache it
 before running the tests.
                 if ( $core_is_cached ) {
                         WP_Theme_JSON_Resolver::get_core_data();
                 }
                 ...rest of test omitted...
     }
 }}}

 Will fail:

 {{{
 #!php
         public function test_get_core_data( $should_fire_filter,
 $core_is_cached, $blocks_are_cached ) {
                 wp_clean_theme_json_cache();

                 // If should cache core, then fire the method to cache it
 before running the tests.
                 if ( $core_is_cached ) {
                         WP_Theme_JSON_Resolver::get_core_data();
                         WP_Theme_JSON_Resolver::get_core_data();
                 }
                 ...rest of test omitted...
     }
 }}}

 == Mechanics of the Caching Bug ==

 The root cause of the caching bug lies in the logic at the beginning of
 the `get_core_data()` function:

 {{{
 #!php
 public static function get_core_data() {
     if ( null !== static::$core && static::has_same_registered_blocks(
 'core' ) ) {
         return static::$core;
     }
     // ... rest of the function
 }
 }}}

 This conditional statement is intended to return cached data if it exists
 and if the registered blocks haven't changed. However, the implementation
 has two issues:

 1. **Short-circuiting of the logical AND operator (`&&`):**
    - In PHP, the `&&` operator uses short-circuit evaluation.
    - If `static::$core` is `null`, the second part of the condition
 (`static::has_same_registered_blocks('core')`) is never evaluated.

 2. **Side effect in `has_same_registered_blocks()`:**
    - The `has_same_registered_blocks()` method updates the stored state of
 registered blocks as a side effect.
    - If this method isn't called due to short-circuiting, the stored state
 isn't updated.

 Here's how this plays out:

 1. First call:
    - `static::$core` is `null`, so `has_same_registered_blocks()` isn't
 called.
    - The function computes and caches the data.

 2. Second call:
    - `static::$core` is not `null`, so `has_same_registered_blocks()` is
 called.
    - This updates the stored state of registered blocks.
    - The function recomputes and caches the data again.

 3. Third and subsequent calls:
    - Both conditions are true, so cached data is returned.

 This explains why the second call always recomputes the data, creating the
 illusion of proper invalidation in some test scenarios.

 == Recap of Current Behavior and Its Implications ==

 The `get_merged_data()` function, gathers data from four sources
 (`get_core_data()`, `get_block_data()`, `get_theme_data()`, and
 `get_user_data()`), each of which exhibits overeager caching behavior.

 - The caching mechanism doesn't properly invalidate when filters change.
 - The second call to these functions always recomputes data, sometimes
 creating an illusion of proper invalidation.
 - `get_merged_data()` is an intensive function, consuming up to 10% of a
 request's time.
 - The function scales poorly with the number of registered and rendered
 blocks, leading to O(m * n) complexity, even when the data-gathering
 functions return cached data.

 {{{
 #!php
 public static function get_core_data() {
     if ( null !== static::$core && static::has_same_registered_blocks(
 'core' ) ) {
         return static::$core;
     }
     // ... computation and caching logic
 }
 }}}

 This pattern is repeated across all four data-gathering functions, leading
 to potential inconsistencies in theme and block data.

 == Why It May Not Be a Critical Issue ==

 Despite the caching behavior not being responsive to filter changes, it
 might not be a critical issue:

 - This caching mechanism has been in place for two years without major
 reported issues.
 - Filters are often used for static changes or additions to settings,
 usually applied during the setup process before rendering begins.
 - There are few scenarios where theme settings would need to change
 dynamically during rendering. Block attributes typically handle dynamic
 behavior at the block level.
 - The current caching strategy, while imperfect, fixed a performance
 regression that may be reintroduced if the filtered functions are called
 more often.

 == Possible Solutions - The Documentation Approach ==

 1. Acknowledge and document the current behavior:
    - Clearly state that theme filters are not always recomputed.
    - Explain when cache clearing occurs (e.g., at the beginning of the
 main render process).

 2. Set expectations for filter usage:
    - Advise developers to register their filters before the main render
 process begins.
    - Discourage reliance on dynamic filter changes during rendering.

 '''Pros:'''
 - Minimal risk of introducing new bugs or performance regressions.
 - Provides clarity for developers working with these functions.

 '''Cons:'''
 - Doesn't address the underlying technical limitations.
 - May be seen as avoiding the problem rather than solving it.

 == Possible Solutions - Filter Change Detection ==

 Develop a subscription mechanism to detect changes in filter hooks:

 {{{
 #!php
 class FilterChangeDetector {
     private static $filter_state = [];

     public static function record_filter_state( $hook_name ) {
         self::$filter_state[$hook_name] = self::get_filter_signature(
 $hook_name );
     }

     public static function has_filter_changed( $hook_name ) {
         $old_signature = self::$filter_state[$hook_name] ?? null;
         $new_signature = self::get_filter_signature( $hook_name );
         return $old_signature !== $new_signature;
     }

     private static function get_filter_signature( $hook_name ) {
         global $wp_filter;
         $hook = $wp_filter[ $hook_name ] ?? null;
         if ( empty( $hook ) ) {
             return '';
         }
         $signature = '';

         foreach ( $hook->callbacks as $priority => $callbacks ) {
             $signature .= $priority . ':';
             foreach ( $callbacks as $idx => $callback_data ) {
                 $signature .= $idx . ',';
             }
             $signature .= '|';
         }

         return md5( $signature );
     }

     // Alternate
     private static function get_filter_signature_alt( $hook_name ) {
         global $wp_filter;
         return md5( serialize( $wp_filter[$hook_name] ?? null ) );
     }
 }
 }}}

 '''Pros:'''
 - Provides more accurate invalidation
 - Maintains most performance benefits

 '''Cons:'''
 - Adds complexity to the codebase
 - May still miss some dynamic changes

 == Possible Solutions - Post-Cache Filtering ==

 Move filtering outside of the caching mechanism:

 {{{
 #!php
 public static function get_core_data() {
     if ( null !== static::$core && static::has_same_registered_blocks(
 'core' ) ) {
         $data = static::$core;
     } else {
         // ... computation logic
         static::$core = $data;
     }
     return apply_filters( 'wp_theme_json_data_default', new
 WP_Theme_JSON_Data( $data, 'default' ) );
 }
 }}}

 '''Pros:'''
 - Ensures filters are always applied
 - Simplifies caching logic

 '''Cons:'''
 - Performance impact
 - May worsen scaling issues with registered blocks
   - Creating new WP_Theme_JSON_Data instances causes schema validation to
 occur which scales with registered blocks.

 == Conclusion ==

 So I've uncovered an interesting situation with the theme JSON caching;
 but it has been working for quite a while with no major issues.

 One more thing: I think I know how to fix `has_same_registered_blocks()`
 to kill the caching bug and make it slightly faster. But... it would break
 some unit tests that are actually relying on the bug. So that's another
 thing to consider.

 What do you all think? I'd love to hear your thoughts on:
   - Any real-world issues you've seen because of this
   - Which solution you like best (or if you have a different idea)
   - Whether we should tackle that `has_same_registered_blocks()` issue

 Or any other ideas for the best way forward here.


 == References ==

 [54493] - From 2022, fixes performance regression, introducing caching
 inconsistency

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/62249>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list