[wp-trac] [WordPress Trac] #62249: Investigate Theme JSON Data Caching Inconsistencies
WordPress Trac
noreply at wordpress.org
Thu Oct 17 21:41:14 UTC 2024
#62249: Investigate Theme JSON Data Caching Inconsistencies
--------------------------+-----------------------------
Reporter: mreishus | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: General | Version:
Severity: normal | Keywords:
Focuses: |
--------------------------+-----------------------------
== Summary ==
`get_merged_data()` gathers data from up to four sources and merges them
together. All four of those sources are filterable, but they are not
always responsive to the latest filters due to improper cache validation.
However, fixing it is not easy and may not be necessary - given that the
invalidation has been improper for two years, and no major issues have
been found.
== The Four Data Sources ==
* `get_core_data()`
* Calls the `wp_theme_json_data_default` filter.
* `get_block_data()`
* Calls the `wp_theme_json_data_blocks` filter.
* `get_theme_data()`
* Calls the `wp_theme_json_data_theme` filter.
* `get_user_data()`
* Calls the `wp_theme_json_data_user` filter.
== Lack of Invalidation ==
In these functions, there is no mechanism to detect a changing filter. The
filter is used when calculating the cached value, and the filter is not
used when detecting to invalidate. All 4 sources follow the same pattern.
Here is an outline:
{{{
#!php
public static function get_core_data() {
if ( null !== static::$core && static::has_same_registered_blocks(
'core' ) ) {
return static::$core;
}
...compute $config...
$theme_json = apply_filters( 'wp_theme_json_data_default', new
WP_Theme_JSON_Data( $config, 'default' ) );
static::$core = $theme_json->get_theme_json();
return static::$core
}
}}}
We invalidate when:
* the registered blocks change
* when someone explicitly calls `wp_clean_theme_json_cache()`
There is no invalidation when a filter is added or removed, or when a
filter changes behavior.
== A caching bug can make it appear like invalidation is happening. ==
To test this out, let's set up a simple function that we can use as a
filter:
{{{
#!php
public function add_custom_duotone_default( $theme_json ) {
$data = $theme_json->get_data();
$data['settings']['color']['duotone']['special1'] = array(
'name' => 'Cool Duotone',
'colors' => array( '#000000', '#7f7f7f' ),
'slug' => 'cool-duotone',
);
return new WP_Theme_JSON_Data( $data );
}
}}}
And let's do the simplest unit test possible, calling the function before,
adding the filter, calling the function again and checking that the data
the filter added is there:
{{{
#!php
public function test_core_data_filter_responsiveness_1() {
wp_clean_theme_json_cache();
$data1 = WP_Theme_JSON_Resolver::get_core_data();
add_filter( 'wp_theme_json_data_default', array( $this,
'add_custom_duotone_default' ) );
$final_data = WP_Theme_JSON_Resolver::get_core_data();
// This test passes, making it appear the function is
always responsive to the filter changing.
$this->assertEquals(
array(
'name' => 'Cool Duotone',
'colors' => array( '#000000', '#7f7f7f' ),
'slug' => 'cool-duotone',
),
$final_data->get_settings()['color']['duotone']['special1'],
'Default duotone should be added'
);
}
}}}
It passes! That certainly makes it look like the function invalidates its
cache when a filter is added.
However, making the seemingly inconsequental change of adding one extra
call to `::get_core_data()` causes the filter to not run:
{{{
#!php
public function test_core_data_filter_responsiveness_2() {
wp_clean_theme_json_cache();
$data1 = WP_Theme_JSON_Resolver::get_core_data();
$data2 = WP_Theme_JSON_Resolver::get_core_data(); //
<--- ADDED THIS
add_filter( 'wp_theme_json_data_default', array( $this,
'add_custom_duotone_default' ) );
$final_data = WP_Theme_JSON_Resolver::get_core_data();
// This test fails, revealing that the function is not
responsive to the filter changing,
// and the first test only passed because of a bug causing
the second call of the function
// to never be cached.
$this->assertEquals(
array(
'name' => 'Cool Duotone',
'colors' => array( '#000000', '#7f7f7f' ),
'slug' => 'cool-duotone',
),
$final_data->get_settings()['color']['duotone']['special1'],
'Default duotone should be added'
);
}
}}}
This test now fails. What's going on? It's identical to the previous one
except one extra `$data2 = WP_Theme_JSON_Resolver::get_core_data();` is
added before adding the filter.
The cause is: A bug in the caching logic causes the second call to the
function to never be cached.
In unchanging conditions:
* '''What you expect''': Call 1 to the function is computed, and calls 2
through N quickly return a cached value.
* '''What actually happens''': Calls 1 and 2 to the function are
computed, and calls 3 through N quickly return a cached value.
Because the most common unit test setup is to call the function once, make
a change, and then call the function again, the bug of "the second call is
never cached" can make it look like invalidation is happening when there
isn't.
In fact, there is a `test_get_core_data()` test that makes sure a newly
added filter is actually called. This is only passing because of the "the
second call is never cached" bug, and it can be broken by adding one extra
call during the setup phase.
Passing (currently on trunk):
{{{
#!php
public function test_get_core_data( $should_fire_filter,
$core_is_cached, $blocks_are_cached ) {
wp_clean_theme_json_cache();
// If should cache core, then fire the method to cache it
before running the tests.
if ( $core_is_cached ) {
WP_Theme_JSON_Resolver::get_core_data();
}
...rest of test omitted...
}
}}}
Will fail:
{{{
#!php
public function test_get_core_data( $should_fire_filter,
$core_is_cached, $blocks_are_cached ) {
wp_clean_theme_json_cache();
// If should cache core, then fire the method to cache it
before running the tests.
if ( $core_is_cached ) {
WP_Theme_JSON_Resolver::get_core_data();
WP_Theme_JSON_Resolver::get_core_data();
}
...rest of test omitted...
}
}}}
== Mechanics of the Caching Bug ==
The root cause of the caching bug lies in the logic at the beginning of
the `get_core_data()` function:
{{{
#!php
public static function get_core_data() {
if ( null !== static::$core && static::has_same_registered_blocks(
'core' ) ) {
return static::$core;
}
// ... rest of the function
}
}}}
This conditional statement is intended to return cached data if it exists
and if the registered blocks haven't changed. However, the implementation
has two issues:
1. **Short-circuiting of the logical AND operator (`&&`):**
- In PHP, the `&&` operator uses short-circuit evaluation.
- If `static::$core` is `null`, the second part of the condition
(`static::has_same_registered_blocks('core')`) is never evaluated.
2. **Side effect in `has_same_registered_blocks()`:**
- The `has_same_registered_blocks()` method updates the stored state of
registered blocks as a side effect.
- If this method isn't called due to short-circuiting, the stored state
isn't updated.
Here's how this plays out:
1. First call:
- `static::$core` is `null`, so `has_same_registered_blocks()` isn't
called.
- The function computes and caches the data.
2. Second call:
- `static::$core` is not `null`, so `has_same_registered_blocks()` is
called.
- This updates the stored state of registered blocks.
- The function recomputes and caches the data again.
3. Third and subsequent calls:
- Both conditions are true, so cached data is returned.
This explains why the second call always recomputes the data, creating the
illusion of proper invalidation in some test scenarios.
== Recap of Current Behavior and Its Implications ==
The `get_merged_data()` function, gathers data from four sources
(`get_core_data()`, `get_block_data()`, `get_theme_data()`, and
`get_user_data()`), each of which exhibits overeager caching behavior.
- The caching mechanism doesn't properly invalidate when filters change.
- The second call to these functions always recomputes data, sometimes
creating an illusion of proper invalidation.
- `get_merged_data()` is an intensive function, consuming up to 10% of a
request's time.
- The function scales poorly with the number of registered and rendered
blocks, leading to O(m * n) complexity, even when the data-gathering
functions return cached data.
{{{
#!php
public static function get_core_data() {
if ( null !== static::$core && static::has_same_registered_blocks(
'core' ) ) {
return static::$core;
}
// ... computation and caching logic
}
}}}
This pattern is repeated across all four data-gathering functions, leading
to potential inconsistencies in theme and block data.
== Why It May Not Be a Critical Issue ==
Despite the caching behavior not being responsive to filter changes, it
might not be a critical issue:
- This caching mechanism has been in place for two years without major
reported issues.
- Filters are often used for static changes or additions to settings,
usually applied during the setup process before rendering begins.
- There are few scenarios where theme settings would need to change
dynamically during rendering. Block attributes typically handle dynamic
behavior at the block level.
- The current caching strategy, while imperfect, fixed a performance
regression that may be reintroduced if the filtered functions are called
more often.
== Possible Solutions - The Documentation Approach ==
1. Acknowledge and document the current behavior:
- Clearly state that theme filters are not always recomputed.
- Explain when cache clearing occurs (e.g., at the beginning of the
main render process).
2. Set expectations for filter usage:
- Advise developers to register their filters before the main render
process begins.
- Discourage reliance on dynamic filter changes during rendering.
'''Pros:'''
- Minimal risk of introducing new bugs or performance regressions.
- Provides clarity for developers working with these functions.
'''Cons:'''
- Doesn't address the underlying technical limitations.
- May be seen as avoiding the problem rather than solving it.
== Possible Solutions - Filter Change Detection ==
Develop a subscription mechanism to detect changes in filter hooks:
{{{
#!php
class FilterChangeDetector {
private static $filter_state = [];
public static function record_filter_state( $hook_name ) {
self::$filter_state[$hook_name] = self::get_filter_signature(
$hook_name );
}
public static function has_filter_changed( $hook_name ) {
$old_signature = self::$filter_state[$hook_name] ?? null;
$new_signature = self::get_filter_signature( $hook_name );
return $old_signature !== $new_signature;
}
private static function get_filter_signature( $hook_name ) {
global $wp_filter;
$hook = $wp_filter[ $hook_name ] ?? null;
if ( empty( $hook ) ) {
return '';
}
$signature = '';
foreach ( $hook->callbacks as $priority => $callbacks ) {
$signature .= $priority . ':';
foreach ( $callbacks as $idx => $callback_data ) {
$signature .= $idx . ',';
}
$signature .= '|';
}
return md5( $signature );
}
// Alternate
private static function get_filter_signature_alt( $hook_name ) {
global $wp_filter;
return md5( serialize( $wp_filter[$hook_name] ?? null ) );
}
}
}}}
'''Pros:'''
- Provides more accurate invalidation
- Maintains most performance benefits
'''Cons:'''
- Adds complexity to the codebase
- May still miss some dynamic changes
== Possible Solutions - Post-Cache Filtering ==
Move filtering outside of the caching mechanism:
{{{
#!php
public static function get_core_data() {
if ( null !== static::$core && static::has_same_registered_blocks(
'core' ) ) {
$data = static::$core;
} else {
// ... computation logic
static::$core = $data;
}
return apply_filters( 'wp_theme_json_data_default', new
WP_Theme_JSON_Data( $data, 'default' ) );
}
}}}
'''Pros:'''
- Ensures filters are always applied
- Simplifies caching logic
'''Cons:'''
- Performance impact
- May worsen scaling issues with registered blocks
- Creating new WP_Theme_JSON_Data instances causes schema validation to
occur which scales with registered blocks.
== Conclusion ==
So I've uncovered an interesting situation with the theme JSON caching;
but it has been working for quite a while with no major issues.
One more thing: I think I know how to fix `has_same_registered_blocks()`
to kill the caching bug and make it slightly faster. But... it would break
some unit tests that are actually relying on the bug. So that's another
thing to consider.
What do you all think? I'd love to hear your thoughts on:
- Any real-world issues you've seen because of this
- Which solution you like best (or if you have a different idea)
- Whether we should tackle that `has_same_registered_blocks()` issue
Or any other ideas for the best way forward here.
== References ==
[54493] - From 2022, fixes performance regression, introducing caching
inconsistency
--
Ticket URL: <https://core.trac.wordpress.org/ticket/62249>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
More information about the wp-trac
mailing list