[wp-trac] [WordPress Trac] #43258: Output buffer template rendering and add filter for post-processing (e.g. caching, optimization)
WordPress Trac
noreply at wordpress.org
Wed Mar 5 23:51:08 UTC 2025
#43258: Output buffer template rendering and add filter for post-processing (e.g.
caching, optimization)
-----------------------------+--------------------------------
Reporter: nextendweb | Owner: westonruter
Type: enhancement | Status: accepted
Priority: normal | Milestone: 6.9
Component: General | Version:
Severity: normal | Resolution:
Keywords: has-patch early | Focuses: docs, performance
-----------------------------+--------------------------------
Comment (by westonruter):
Replying to [comment:36 dmsnell]:
> Thanks everyone for pushing this issue forward. As most of you are
probably aware, Automattic has generally paused contributions to Core, so
I am unable at this time to interact more adequately on this issue. Still,
here are some basic thoughts from my end:
Thank you for taking the time!
> We want to be careful that we only provide semantic HTML filtering to
HTML outputs. That means excluding the filter from JSON outputs and RSS
outputs and XML-RPC/SOAP outputs and any other XML output. There may be
ways to more broadly filter HTML content on its way out of WordPress,
however, with respect to output buffering I don’t believe the primitives
are in place to make this smooth. Likely important is some global
`$content_type` variable indicating the output, as well as new filters in
the right places. I’ll come back to this. More broadly Core has what I
think is a problem with content provenance of various kinds that are
relevant to these designs.
I don't believe introducing a global `$content_type` is necessary because
we can look at the `Content-Type` header that WordPress has sent. For
[https://github.com/WordPress/performance/blob/a5c48c492defdcc0f47fc75139e86b9733994cb9/plugins/optimization-detective/optimization.php#L191-L214 example]:
{{{#!php
<?php
function od_is_response_html_content_type(): bool {
	$is_html_content_type = false;

	$headers_list = array_merge(
		array( 'Content-Type: ' . ini_get( 'default_mimetype' ) ),
		headers_list()
	);
	foreach ( $headers_list as $header ) {
		$header_parts = preg_split( '/\s*[:;]\s*/', strtolower( $header ) );
		if ( is_array( $header_parts ) && count( $header_parts ) >= 2 && 'content-type' === $header_parts[0] ) {
			$is_html_content_type = in_array( $header_parts[1], array( 'text/html', 'application/xhtml+xml' ), true );
		}
	}

	return $is_html_content_type;
}
}}}
In an output buffer, this can be paired with
[https://github.com/WordPress/performance/blob/a5c48c492defdcc0f47fc75139e86b9733994cb9/plugins/optimization-detective/optimization.php#L230-L236 checking] for the first
non-whitespace character being `<`:
{{{#!php
<?php
// If the content-type is not HTML or the output does not start with '<',
// then abort since the buffer is definitely not HTML.
if (
	! od_is_response_html_content_type() ||
	! str_starts_with( ltrim( $buffer ), '<' )
) {
	return $buffer;
}
}}}
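To show how the two checks fit together in one place, here is a minimal sketch that runs outside WordPress. The names `example_is_html_content_type()` and `example_filter_output_buffer()` are hypothetical and are not part of Core or the Performance Lab plugins. The header list and the default MIME type are passed in as arguments (with `text/html` assumed as the default) so the logic can be exercised without sending real headers:

{{{#!php
<?php
/**
 * Hypothetical, parametrized variant of the content-type check above.
 * Takes the header list as an argument instead of calling headers_list().
 */
function example_is_html_content_type( array $headers, string $default_mimetype = 'text/html' ): bool {
	$is_html = false;

	// Seed with the default so that an explicit Content-Type header later in the list overrides it.
	array_unshift( $headers, 'Content-Type: ' . $default_mimetype );

	foreach ( $headers as $header ) {
		$parts = preg_split( '/\s*[:;]\s*/', strtolower( $header ) );
		if ( is_array( $parts ) && count( $parts ) >= 2 && 'content-type' === $parts[0] ) {
			// The last Content-Type header seen decides the result.
			$is_html = in_array( $parts[1], array( 'text/html', 'application/xhtml+xml' ), true );
		}
	}

	return $is_html;
}

/**
 * Hypothetical output buffer callback body: only hands the buffer to
 * $processor when the response looks like HTML, otherwise passes it through.
 */
function example_filter_output_buffer( string $buffer, array $headers, callable $processor ): string {
	$trimmed = ltrim( $buffer );

	if (
		! example_is_html_content_type( $headers ) ||
		'' === $trimmed ||
		'<' !== $trimmed[0]
	) {
		return $buffer; // Not HTML: leave JSON, feeds, and other output untouched.
	}

	return (string) $processor( $buffer );
}
}}}

In a real request this could be wired up with something like `ob_start( fn ( $buffer ) => example_filter_output_buffer( $buffer, headers_list(), $callback ) )`, where `$callback` stands in for whatever post-processing filter ends up being applied.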
> For HTML processing I think it’s likely more important to avoid exposing
the raw HTML. Some plugins will want this, that’s fine. But Core can
likely do a much better job designing an HTML-semantic output buffering
pipeline. That is, perhaps Core exposes things like “when reaching `IMG`
tags let me modify its attributes”. I think this is a reasonable place to
add a //class// as the filter so that we can rely on native methods for
dispatching the potential extension points — something akin to Python’s
`HtmlParser` class instead of exposing numerous specific filters that take
separate functions.
I'd love to see more of what you have in mind here. I know you've advised
against passing around instances of
`WP_HTML_Processor`/`WP_HTML_Tag_Processor` as callbacks for functions, so
I understand you're wanting a higher level abstraction that extensions
interface with. A couple of the use cases I have are for
[https://github.com/WordPress/performance/blob/a5c48c492defdcc0f47fc75139e86b9733994cb9/plugins/image-prioritizer/class-image-prioritizer-img-tag-visitor.php#L207-L306 optimizing PICTURE tags] or
[https://github.com/WordPress/performance/blob/a5c48c492defdcc0f47fc75139e86b9733994cb9/plugins/embed-optimizer/hooks.php#L170-L315 Embed blocks], both of which require
walking over the children. I have a
[https://github.com/WordPress/performance/blob/trunk/plugins/optimization-detective/docs/extensions.md#use-cases-and-examples list] of other such
optimizations built with the HTML Tag Processor.
> If you get curious, you can subclass the HTML API for more direct
control over the kinds of operations you are doing with regexes. The API
offers a hierarchy of opt-in risk based on your tolerance for parsing
issues and exploits and can do way more than it appears; because safety
and reliability were the highest design priorities.
Being able to subclass `WP_HTML_Processor` would seem to conflict with
using a single instance for processing the output buffer. Sure, we could
introduce a filter like `wp_rest_server_class` to let plugins supply
their own subclass for the output buffer processing, but if multiple
plugins each want to use their own subclass, they're out of
luck since only one can win.
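
The "only one can win" problem can be illustrated with a small self-contained sketch. Everything here is hypothetical: `example_apply_filters()` is a simplified stand-in for how a WordPress filter chains callbacks, and the processor classes are empty placeholders rather than real `WP_HTML_Processor` subclasses:

{{{#!php
<?php
// Simplified stand-in for a filter: each callback receives the value
// returned by the previous one, so later callbacks can replace it outright.
function example_apply_filters( array $callbacks, $value ) {
	foreach ( $callbacks as $callback ) {
		$value = $callback( $value );
	}
	return $value;
}

// Hypothetical base class and two plugin-specific subclasses.
class Example_Processor {}
class Plugin_A_Processor extends Example_Processor {}
class Plugin_B_Processor extends Example_Processor {}

// Each plugin filters the class name to point at its own subclass.
$callbacks = array(
	static function ( string $class ): string {
		return Plugin_A_Processor::class;
	},
	static function ( string $class ): string {
		return Plugin_B_Processor::class;
	},
);

// Only a single class name survives the filter, so only one subclass
// is ever instantiated for the output buffer.
$class     = example_apply_filters( $callbacks, Example_Processor::class );
$processor = new $class();
}}}

Here `$processor` ends up as a `Plugin_B_Processor`, and Plugin A's subclass is never used, regardless of what behavior it added.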
--
Ticket URL: <https://core.trac.wordpress.org/ticket/43258#comment:37>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform