[wp-trac] [WordPress Trac] #43258: Output buffer template rendering and add filter for post-processing (e.g. caching, optimization)

Wed Mar 5 23:51:08 UTC 2025

-----------------------------+--------------------------------
 Reporter:  nextendweb       |       Owner:  westonruter
     Type:  enhancement      |      Status:  accepted
 Priority:  normal           |   Milestone:  6.9
Component:  General          |     Version:
 Severity:  normal           |  Resolution:
 Keywords:  has-patch early  |     Focuses:  docs, performance
-----------------------------+--------------------------------

Comment (by westonruter):

 Replying to [comment:36 dmsnell]:
 > Thanks everyone for pushing this issue forward. As most of you are
 probably aware, Automattic has generally paused contributions to Core, so
 I am unable at this time to interact more adequately on this issue. Still,
 here are some basic thoughts from my end:

 Thank you for taking the time!

 > We want to be careful that we only provide semantic HTML filtering to
 HTML outputs. That means excluding the filter from JSON outputs and RSS
 outputs and XML-RPC/SOAP outputs and any other XML output. There may be
 ways to more broadly filter HTML content on its way out of WordPress,
 however, with respect to output buffering I don’t believe the primitives
 are in place to make this smooth. Likely important is some global
 `$content_type` variable indicating the output, as well as new filters in
 the right places.  I’ll come back to this. More broadly Core has what I
 think is a problem with content provenance of various kinds that are
 relevant to these designs.

 I don't believe introducing a global `$content_type` is necessary because
 we can look at the `Content-Type` header that WordPress has sent. For
 [https://github.com/WordPress/performance/blob/a5c48c492defdcc0f47fc75139e86b9733994cb9/plugins/optimization-detective/optimization.php#L191-L214 example]:

 {{{#!php
 <?php
 function od_is_response_html_content_type(): bool {
         $is_html_content_type = false;

         $headers_list = array_merge(
                 array( 'Content-Type: ' . ini_get( 'default_mimetype' ) ),
                 headers_list()
         );
         foreach ( $headers_list as $header ) {
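                 // Splits e.g. "content-type: text/html; charset=utf-8" into its parts.
                 $header_parts = preg_split( '/\s*[:;]\s*/', strtolower( $header ) );
                 if (
                         is_array( $header_parts ) &&
                         count( $header_parts ) >= 2 &&
                         'content-type' === $header_parts[0]
                 ) {
                         // The last Content-Type line wins, so a sent header overrides the default_mimetype fallback.
                         $is_html_content_type = in_array(
                                 $header_parts[1],
                                 array( 'text/html', 'application/xhtml+xml' ),
                                 true
                         );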
                 }
         }

         return $is_html_content_type;
 }
 }}}

 In an output buffer, this can be paired with
 [https://github.com/WordPress/performance/blob/a5c48c492defdcc0f47fc75139e86b9733994cb9/plugins/optimization-detective/optimization.php#L230-L236 checking]
 whether the first non-whitespace character is `<`:

 {{{#!php
 <?php
 // $buffer is the string passed to the output buffer callback.
 // If the content-type is not HTML or the output does not start with '<',
 // then abort since the buffer is definitely not HTML.
 if (
         ! od_is_response_html_content_type() ||
         ! str_starts_with( ltrim( $buffer ), '<' )
 ) {
         return $buffer;
 }
 }}}
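
 Putting both checks together, here is a minimal sketch of where they would sit in an output buffer callback. The `example_*` function names are hypothetical. The header parsing mirrors `od_is_response_html_content_type()`, but it takes the header lines as a parameter so it can be exercised outside a request. `$processor` stands in for whatever post-processing (e.g. a filter) Core might eventually expose. The sketch assumes PHP 8.0+ for `str_starts_with()`.

 {{{#!php
 <?php
 // Hypothetical helper: same parsing as od_is_response_html_content_type(),
 // but with the header lines and default MIME type passed in explicitly.
 function example_is_html_content_type( array $headers, string $default_mimetype ): bool {
         $is_html = false;
         $all     = array_merge( array( 'Content-Type: ' . $default_mimetype ), $headers );
         foreach ( $all as $header ) {
                 $parts = preg_split( '/\s*[:;]\s*/', strtolower( $header ) );
                 if ( is_array( $parts ) && count( $parts ) >= 2 && 'content-type' === $parts[0] ) {
                         // A later Content-Type line overrides earlier ones.
                         $is_html = in_array( $parts[1], array( 'text/html', 'application/xhtml+xml' ), true );
                 }
         }
         return $is_html;
 }

 // Hypothetical callback body: hand the buffer to $processor only when it is HTML.
 function example_process_buffer( string $buffer, array $headers, callable $processor ): string {
         $default_mimetype = (string) ini_get( 'default_mimetype' );
         if (
                 ! example_is_html_content_type( $headers, $default_mimetype ) ||
                 ! str_starts_with( ltrim( $buffer ), '<' )
         ) {
                 return $buffer;
         }
         return $processor( $buffer );
 }
 }}}

 During a real request, the callback passed to `ob_start()` would call `example_process_buffer( $buffer, headers_list(), … )`. Because the callback runs when the buffer is flushed, `headers_list()` then includes any headers sent while the template was rendering.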

 > For HTML processing I think it’s likely more important to avoid exposing
 the raw HTML. Some plugins will want this, that’s fine. But Core can
 likely do a much better job designing an HTML-semantic output buffering
 pipeline. That is, perhaps Core exposes things like “when reaching `IMG`
 tags let me modify its attributes”. I think this is a reasonable place to
 add a //class// as the filter so that we can rely on native methods for
 dispatching the potential extension points — something akin to Python’s
 `HTMLParser` class instead of exposing numerous specific filters that take
 separate functions.
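
 To make the class-based idea concrete, here is a minimal sketch of method-name dispatch. It is loosely modeled on how Python's `ast.NodeVisitor` routes nodes to `visit_<name>()` methods; Python's `html.parser.HTMLParser` itself uses generic `handle_starttag()` callbacks. Everything here is hypothetical and is not a proposed Core API. The class and function names are invented, and the tag stream is a plain array of `[ tag, attributes ]` pairs rather than output from the HTML API.

 {{{#!php
 <?php
 // Hypothetical base class: subclasses declare visit_<tag>() methods, and
 // dispatch() routes each opening tag to the matching method, if one exists.
 abstract class Example_Tag_Visitor {
         /**
          * @param string               $tag        Tag name, e.g. 'IMG'.
          * @param array<string,string> $attributes Attributes of the tag.
          * @return array<string,string> The (possibly modified) attributes.
          */
         public function dispatch( string $tag, array $attributes ): array {
                 $method = 'visit_' . strtolower( $tag );
                 if ( method_exists( $this, $method ) ) {
                         return $this->$method( $attributes );
                 }
                 return $attributes;
         }
 }

 // Example extension that only cares about IMG tags.
 class Example_Lazy_Img_Visitor extends Example_Tag_Visitor {
         public function visit_img( array $attributes ): array {
                 if ( ! isset( $attributes['loading'] ) ) {
                         $attributes['loading'] = 'lazy';
                 }
                 return $attributes;
         }
 }

 // Hypothetical driver: feeds a stream of [ tag, attributes ] pairs through a visitor.
 function example_walk( array $tokens, Example_Tag_Visitor $visitor ): array {
         $result = array();
         foreach ( $tokens as [ $tag, $attributes ] ) {
                 $result[] = array( $tag, $visitor->dispatch( $tag, $attributes ) );
         }
         return $result;
 }
 }}}

 A Core version would presumably drive `dispatch()` from the HTML API while it walks the output buffer. An extension would then only ever see the tags it has declared an interest in.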

 I'd love to see more of what you have in mind here. I know you've advised
 against passing around instances of
 `WP_HTML_Processor`/`WP_HTML_Tag_Processor` as callbacks for functions, so
 I understand you're wanting a higher level abstraction that extensions
 interface with. A couple of the use cases I have are for
 [https://github.com/WordPress/performance/blob/a5c48c492defdcc0f47fc75139e86b9733994cb9/plugins/image-prioritizer/class-image-prioritizer-img-tag-visitor.php#L207-L306 optimizing PICTURE tags]
 and
 [https://github.com/WordPress/performance/blob/a5c48c492defdcc0f47fc75139e86b9733994cb9/plugins/embed-optimizer/hooks.php#L170-L315 Embed blocks],
 both of which require walking over the children. I have a
 [https://github.com/WordPress/performance/blob/trunk/plugins/optimization-detective/docs/extensions.md#use-cases-and-examples list]
 of other such optimizations built with the HTML Tag Processor.

 > If you get curious, you can subclass the HTML API for more direct
 control over the kinds of operations you are doing with regexes. The API
 offers a hierarchy of opt-in risk based on your tolerance for parsing
 issues and exploits, and can do way more than it appears to, because safety
 and reliability were the highest design priorities.

 Being able to subclass `WP_HTML_Processor` would seem to conflict with
 using a single instance for processing the output buffer. Sure, we could
 introduce a filter like `wp_rest_server_class` to allow plugins to
 supply their own subclass for the output buffer processing, but then, if
 multiple plugins each want to use their own subclass, they're out of
 luck since only one can win.
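
 The "only one can win" problem can be shown with a stand-in for a class-name filter such as `wp_rest_server_class`. Each callback receives the previous value and returns a class name, so the last callback's choice replaces all the others. By contrast, a list of independent visitors lets every extension take a turn on the same tag. This is a plain-PHP sketch with hypothetical function, class, and plugin names. It reduces the filter chain to `array_reduce()` rather than using WordPress's `apply_filters()`.

 {{{#!php
 <?php
 // Stand-in for a class-name filter: each callback gets the previous value
 // and returns a (possibly different) class name.
 function example_filter_class_name( string $default, array $callbacks ): string {
         return array_reduce(
                 $callbacks,
                 static fn ( string $class, callable $callback ): string => $callback( $class ),
                 $default
         );
 }

 // Alternative: every registered visitor gets to modify the attributes in turn.
 function example_apply_visitors( string $tag, array $attributes, array $visitors ): array {
         foreach ( $visitors as $visitor ) {
                 $attributes = $visitor( $tag, $attributes );
         }
         return $attributes;
 }

 // Two hypothetical plugins that each want to customize output buffer processing.
 $plugin_a_class = static fn ( string $class ): string => 'Plugin_A_HTML_Processor';
 $plugin_b_class = static fn ( string $class ): string => 'Plugin_B_HTML_Processor';

 $plugin_a_visitor = static function ( string $tag, array $attributes ): array {
         if ( 'IMG' === $tag ) {
                 $attributes['decoding'] = 'async';
         }
         return $attributes;
 };
 $plugin_b_visitor = static function ( string $tag, array $attributes ): array {
         if ( 'IMG' === $tag ) {
                 $attributes['loading'] = 'lazy';
         }
         return $attributes;
 };

 // Only Plugin B's subclass is used; Plugin A's choice is discarded.
 $class = example_filter_class_name( 'WP_HTML_Processor', array( $plugin_a_class, $plugin_b_class ) );

 // Both plugins' changes are applied to the same IMG tag.
 $attributes = example_apply_visitors(
         'IMG',
         array( 'src' => 'a.jpg' ),
         array( $plugin_a_visitor, $plugin_b_visitor )
 );
 }}}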

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/43258#comment:37>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform