[wp-trac] [WordPress Trac] #43258: Output buffer template rendering and add filter for post-processing (e.g. caching, optimization)
WordPress Trac
noreply at wordpress.org
Wed Mar 5 23:17:28 UTC 2025
#43258: Output buffer template rendering and add filter for post-processing (e.g.
caching, optimization)
-----------------------------+--------------------------------
 Reporter:  nextendweb       |       Owner:  westonruter
     Type:  enhancement      |      Status:  accepted
 Priority:  normal           |   Milestone:  6.9
Component:  General          |     Version:
 Severity:  normal           |  Resolution:
 Keywords:  has-patch early  |     Focuses:  docs, performance
-----------------------------+--------------------------------
Comment (by dmsnell):
Thanks everyone for pushing this issue forward. As most of you are
probably aware, Automattic has generally paused contributions to Core, so
I am unable at this time to interact more adequately on this issue. Still,
here are some basic thoughts from my end:
We want to be careful that we only provide semantic HTML filtering to HTML
outputs. That means excluding the filter from JSON outputs and RSS outputs
and XML-RPC/SOAP outputs and any other XML output. There may be ways to
more broadly filter HTML content on its way out of WordPress; with respect
to output buffering, however, I don’t believe the primitives are in place
to make this smooth. What is likely needed is some global `$content_type`
variable indicating the type of output, as well as new filters in the
right places. I’ll come back to this. More broadly, I think Core has a
problem with content provenance of various kinds, which is relevant to
these designs.
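As a rough illustration of gating on content type, here is a minimal sketch, written in Python rather than PHP purely for illustration. The names `is_html_response` and `filter_output` are hypothetical and are not Core APIs; the only point is that HTML filters are skipped unless the response is declared as `text/html`.

```python
from typing import Callable, Iterable


def is_html_response(content_type_header: str) -> bool:
    """Return True only when the declared media type is text/html."""
    # Drop parameters such as "; charset=UTF-8" and normalize case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


def filter_output(
    buffer: str,
    content_type_header: str,
    html_filters: Iterable[Callable[[str], str]],
) -> str:
    """Apply HTML filters to the buffer, but only for HTML responses."""
    if not is_html_response(content_type_header):
        # JSON, RSS, XML-RPC and other non-HTML outputs pass through untouched.
        return buffer
    for html_filter in html_filters:
        buffer = html_filter(buffer)
    return buffer
```

With this gate, a JSON or feed response is returned byte-for-byte, while an HTML page runs through every registered filter in order.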
The more I use the HTML API in practice, the less concerned I am about
relying on the full-blown HTML Processor. We need full HTML parsing so
frequently that we might as well start with it. In other words, if we end
up with two output-buffer passes, a fast Tag Processor pass and a slow
HTML Processor pass, then we might as well skip the fast one because we’ll
be doing the slow one anyway. If we wanted to, this same pass could
normalize the HTML as it leaves the server so that we send well-formed
documents, though there’s no real need since browsers do this anyway. The
point is that ten filters on one HTML Processor pass are going to be
faster than six filters on a Tag Processor pass plus four on an HTML
Processor pass.
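To illustrate the shared-pass idea, here is a sketch that uses Python’s standard `html.parser` module as a stand-in for the HTML Processor. It is not the WordPress API and it is not a spec-complete parser. `SharedPass` and `run_shared_pass` are hypothetical names. The document is tokenized once, and every registered filter is offered each start tag.

```python
from html.parser import HTMLParser


class SharedPass(HTMLParser):
    """Tokenizes the document once and offers each start tag to every filter."""

    def __init__(self, filters):
        super().__init__()
        self.filters = filters
        self.tags_seen = 0

    def handle_starttag(self, tag, attrs):
        self.tags_seen += 1
        attributes = dict(attrs)
        for visit in self.filters:
            visit(tag, attributes)


def run_shared_pass(markup, filters):
    """Run all filters in a single parse; return how many start tags were seen."""
    parser = SharedPass(filters)
    parser.feed(markup)
    parser.close()
    return parser.tags_seen


# Two independent filters share one parse of the same document.
images, links = [], []
run_shared_pass(
    '<a href="/x"><img src="a.png"></a>',
    [
        lambda tag, attrs: images.append(attrs.get("src")) if tag == "img" else None,
        lambda tag, attrs: links.append(attrs.get("href")) if tag == "a" else None,
    ],
)
```

Adding a third or a tenth filter adds callbacks per tag, not another parse of the document.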
For HTML processing I think it’s likely more important to avoid exposing
the raw HTML. Some plugins will want this, and that’s fine. But Core can
likely do a much better job designing an HTML-semantic output buffering
pipeline. That is, perhaps Core exposes things like “when reaching `IMG`
tags let me modify their attributes”. I think this is a reasonable place
to add a //class// as the filter so that we can rely on native methods for
dispatching the potential extension points, something akin to Python’s
`HTMLParser` class, instead of exposing numerous specific filters that
take separate functions.
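For readers unfamiliar with the Python class mentioned above, here is a sketch of the class-as-filter shape built on `html.parser.HTMLParser`. `OutputFilter`, `on_img`, `LazyImages`, and `apply_filter` are hypothetical names invented for this illustration, not a proposed Core interface. The sketch does not round-trip every construct: self-closing syntax is emitted as a plain start tag, and processing instructions are dropped.

```python
from html import escape
from html.parser import HTMLParser


class OutputFilter(HTMLParser):
    """Re-serializes a document and exposes an overridable hook for IMG tags."""

    def __init__(self):
        # Leave character references alone so text passes through verbatim.
        super().__init__(convert_charrefs=False)
        self.chunks = []

    def on_img(self, attrs):
        """Extension point: return the (possibly modified) IMG attributes."""
        return attrs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            attrs = self.on_img(attrs)
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs.items()
        )
        self.chunks.append(f"<{tag}{rendered}>")

    def handle_startendtag(self, tag, attrs):
        # Simplification: treat <img/> the same as <img>.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.chunks.append(f"</{tag}>")

    def handle_data(self, data):
        self.chunks.append(data)

    def handle_entityref(self, name):
        self.chunks.append(f"&{name};")

    def handle_charref(self, name):
        self.chunks.append(f"&#{name};")

    def handle_comment(self, data):
        self.chunks.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.chunks.append(f"<!{decl}>")


class LazyImages(OutputFilter):
    """Example extension: add loading="lazy" unless the tag already sets it."""

    def on_img(self, attrs):
        attrs.setdefault("loading", "lazy")
        return attrs


def apply_filter(filter_class, markup):
    parser = filter_class()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)
```

An extension only overrides `on_img`; it never sees or edits the raw HTML string, which is the property argued for above.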
And this brings us back to the content type. If we expose the right
filters we won’t have to worry about content since we can run the semantic
filters on the full output buffer for HTML-output cases (no need to pass
in HTML as a string), but we can also run them on any HTML destined for
inclusion inside XML or JSON. My own work has demonstrated that it’s
possible for us to reliably convert HTML into XML for things like RSS/Atom
feeds //where XML is able to express the HTML//. This means that these
same filters could provide extensibility for non-HTML outputs through an
HTML interface. This is going to be a challenge if we go the semantic
route, because if we don’t address it then API responses will return
different content than page renders, for example.
> I would not want to (have to) switch my entire and "battle-hardened"
> regex-based codebase to the HTML API to be very honest
Your plugin does a lot of HTML stitching and everyone’s invited to do
their own thing — stitching is still a developing part of the HTML API.
Reliability is not the concern with the HTML API though. Like your plugin,
Core is full of examples of “battle hardening,” but these usually cover
known patterns and fail in an array of common cases. I will not point out
any specific cases, but I saw the same characteristic regex issues in
autoptimize as I’ve seen basically everywhere. Regexes are easy, but the
HTML API will not mis-parse because it was designed around the spec
instead of input examples.
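A generic illustration of the kind of failure meant here, not taken from any particular plugin: the sketch below compares a common pattern-based approach with a tokenizer. Python’s `html.parser` stands in for the HTML API only to keep the example self-contained; it is not spec-complete itself. The function names and the sample input are made up for this example.

```python
import re
from html.parser import HTMLParser

# A typical pattern-based approach: find each IMG tag, then pull out src.
IMG_TAG = re.compile(r"<img[^>]*>", re.IGNORECASE)
SRC_ATTR = re.compile(r'src="([^"]*)"', re.IGNORECASE)


def image_sources_regex(markup):
    sources = []
    for tag in IMG_TAG.findall(markup):
        match = SRC_ATTR.search(tag)
        if match:
            sources.append(match.group(1))
    return sources


class ImageSources(HTMLParser):
    """Collects the src attribute of every IMG start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


def image_sources_parser(markup):
    parser = ImageSources()
    parser.feed(markup)
    parser.close()
    return parser.sources


# Valid HTML that is outside the "known patterns": a ">" inside a quoted
# attribute value, and an unquoted attribute value.
sample = '<img alt="a > b" src="one.png"><img src=two.png>'

print(image_sources_regex(sample))   # []
print(image_sources_parser(sample))  # ['one.png', 'two.png']
```

The regex cuts the first tag off at the `>` inside `alt` and never matches the unquoted `src`, so it finds nothing; the tokenizer reads both attributes because it follows the attribute syntax rather than a sample of inputs.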
If you get curious, you can subclass the HTML API for more direct control
over the kinds of operations you are doing with regexes. The API offers a
hierarchy of opt-in risk based on your tolerance for parsing issues and
exploits, and it can do way more than it appears to: the default interface
is conservative because safety and reliability were the highest design
priorities.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/43258#comment:36>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform