[wp-trac] [WordPress Trac] #43258: Output buffer template rendering and add filter for post-processing (e.g. caching, optimization)

Wed Mar 5 23:17:28 UTC 2025

#43258: Output buffer template rendering and add filter for post-processing (e.g.
caching, optimization)
-----------------------------+--------------------------------
 Reporter:  nextendweb       |       Owner:  westonruter
     Type:  enhancement      |      Status:  accepted
 Priority:  normal           |   Milestone:  6.9
Component:  General          |     Version:
 Severity:  normal           |  Resolution:
 Keywords:  has-patch early  |     Focuses:  docs, performance
-----------------------------+--------------------------------

Comment (by dmsnell):

 Thanks everyone for pushing this issue forward. As most of you are
 probably aware, Automattic has generally paused contributions to Core, so
 I am unable at this time to interact more adequately on this issue. Still,
 here are some basic thoughts from my end:

 We want to be careful that we only provide semantic HTML filtering to HTML
 outputs. That means excluding the filter from JSON outputs and RSS outputs
 and XML-RPC/SOAP outputs and any other XML output. There may be ways to
 more broadly filter HTML content on its way out of WordPress; however,
 with respect to output buffering I don’t believe the primitives are in
 place to make this smooth. Likely important is some global `$content_type`
 variable indicating the output, as well as new filters in the right
 places. I’ll come back to this. More broadly Core has what I think is a
 problem with content provenance of various kinds that are relevant to
 these designs.
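
 A minimal sketch of that content-type gate, written in Python rather than
 PHP for brevity. Every name here (`HTML_TYPES`, `is_html_output`,
 `finalize_output`) is hypothetical and not an existing Core API; the only
 point is that HTML filters run when the response is HTML and are skipped
 otherwise.

```python
# Hypothetical sketch: run HTML-only filters only on HTML responses.
# XML-based types (RSS, Atom, XML-RPC) and JSON are deliberately excluded.
HTML_TYPES = {"text/html"}


def is_html_output(content_type_header):
    """Return True only when the media type (ignoring parameters) is HTML."""
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in HTML_TYPES


def finalize_output(buffer, content_type_header, html_filters):
    """Apply each filter to HTML output; pass any other output through."""
    if not is_html_output(content_type_header):
        return buffer
    for apply_filter in html_filters:
        buffer = apply_filter(buffer)
    return buffer
```

 With this shape a JSON or RSS response leaves the buffer untouched, no
 matter which filters plugins have registered.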

 The more I use the HTML API in practice the less concerned I am about
 relying on the full-blown HTML Processor. This is because we need full
 HTML parsing so frequently that we might as well start with it. In other
 words, if we end up with two output buffers: a fast Tag
 Processor pass and a slow HTML Processor pass, then we might as well skip
 the fast one because we’ll be doing the slow one anyway. If we wanted to,
 this same process could normalize the HTML leaving the server to provide
 well-formed documents, though there’s no real need to do this since
 browsers do so anyway. The point is that ten filters on one HTML Processor
 filter pass is going to be faster than six filters on a Tag Processor and
 four on an HTML Processor.

 For HTML processing I think it’s likely more important to avoid exposing
 the raw HTML. Some plugins will want this, that’s fine. But Core can
 likely do a much better job designing an HTML-semantic output buffering
 pipeline. That is, perhaps Core exposes things like “when reaching `IMG`
 tags let me modify their attributes”. I think this is a reasonable place to
 add a //class// as the filter so that we can rely on native methods for
 dispatching the potential extension points — something akin to Python’s
 `HTMLParser` class instead of exposing numerous specific filters that take
 separate functions.
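
 To make the class-as-filter idea concrete, here is a rough sketch using
 Python’s own `html.parser.HTMLParser`. The `TagFilter` base class, its
 `on_<tag>` hook convention, and `LazyImages` are all made up for
 illustration; this is not the HTML API, and Python’s parser is a lenient
 tokenizer rather than a spec-complete HTML5 parser. The extension author
 only writes `on_img` and never touches the raw HTML string.

```python
from html import escape
from html.parser import HTMLParser


class TagFilter(HTMLParser):
    """Hypothetical base class: re-serializes a document and dispatches
    each start tag to an optional `on_<tag>` method on the subclass."""

    def __init__(self):
        # Leave character references in text alone so they pass through.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_tag(self, tag, attrs, self_closing=False):
        hook = getattr(self, "on_" + tag, None)
        if hook is not None:
            # The hook gets a dict of attributes and returns the new dict.
            # (A dict drops duplicate attributes; acceptable for a sketch.)
            attrs = list(hook(dict(attrs)).items())
        parts = [tag]
        for name, value in attrs:
            # Attribute values arrive unescaped, so escape them again.
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join(parts) + (" />" if self_closing else ">"))

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def run(self, html):
        """Parse `html` once and return the re-serialized result."""
        self.feed(html)
        self.close()
        return "".join(self.out)


class LazyImages(TagFilter):
    """Example extension: only the IMG hook is overridden."""

    def on_img(self, attrs):
        attrs.setdefault("loading", "lazy")
        return attrs
```

 Calling `LazyImages().run(html)` parses the document once; any number of
 `on_<tag>` hooks on the same class share that single pass.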

 And this brings us back to the content type. If we expose the right
 filters we won’t have to worry about content since we can run the semantic
 filters on the full output buffer for HTML-output cases — no need to pass
 in HTML as a string — but also we can run it on any HTML destined for
 inclusion inside XML or JSON. My own work has demonstrated that it’s
 possible for us to reliably convert HTML into XML for things like RSS/Atom
 feeds //where XML is able to express the HTML//. This means that these
 same filters could provide extensibility for non-HTML outputs through an
 HTML interface. This is going to be a challenge if we go the semantic
 route, because if we don’t address it then API responses will return
 different content than page renders, for example.
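
 A small sketch of that consistency concern, again in Python with purely
 hypothetical names (`render_page`, `render_json`, and the `post` dict
 layout are assumptions, not Core functions). The same filter callable is
 applied to the HTML fragment before it is wrapped in a page or encoded as
 JSON, so both outputs carry identical markup.

```python
import json


def render_page(post, html_filter):
    """HTML output: filter the content, then place it in the page."""
    return "<article>" + html_filter(post["content"]) + "</article>"


def render_json(post, html_filter):
    """JSON output: filter the same HTML fragment before encoding it."""
    payload = {
        "id": post["id"],
        "content": {"rendered": html_filter(post["content"])},
    }
    return json.dumps(payload)
```

 If only `render_page` applied the filter, the API response would expose
 unfiltered content for the same post.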

 > I would not want to (have to) switch my entire and "battle-hardened"
 regex-based codebase to the HTML API to be very honest

 Your plugin does a lot of HTML stitching and everyone’s invited to do
 their own thing — stitching is still a developing part of the HTML API.
 Reliability is not the concern with the HTML API though. Like your plugin,
 Core is full of examples of “battle hardening,” but these usually cover
 known patterns and fail in an array of common cases. I will not point out
 any specific cases, but I saw the same characteristic regex issues in
 autoptimize as I’ve seen basically everywhere. Regexes are easy, but the
 HTML API will not mis-parse because it was designed around the spec
 instead of input examples.

 If you get curious, you can subclass the HTML API for more direct control
 over the kinds of operations you are doing with regexes. The API offers a
 hierarchy of opt-in risk based on your tolerance for parsing issues and
 exploits, and it can do way more than it appears, because safety and
 reliability were the highest design priorities.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/43258#comment:36>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform