[wp-trac] [WordPress Trac] #63020: HTML API: Breadcrumbs should include element indices and attributes
WordPress Trac
noreply at wordpress.org
Tue Feb 25 22:42:32 UTC 2025
#63020: HTML API: Breadcrumbs should include element indices and attributes
-------------------------+----------------------------
Reporter: westonruter | Owner: (none)
Type: enhancement | Status: new
Priority: normal | Milestone: Future Release
Component: HTML API | Version:
Severity: normal | Keywords: needs-patch
Focuses: |
-------------------------+----------------------------
The [https://wordpress.org/plugins/optimization-detective/ Optimization
Detective] plugin from the Core Performance Team extends
`WP_HTML_Tag_Processor` with some features from `WP_HTML_Processor` like
`get_breadcrumbs()` and `get_current_depth()`. It also introduces its own
method `get_xpath()` which computes an XPath expression to uniquely
identify the element, for example:
{{{
/HTML/BODY/DIV[@class='wp-site-blocks']/*[1][self::HEADER]/*[1][self::DIV]/*[2][self::IMG]
}}}
See the
[https://github.com/WordPress/performance/blob/trunk/plugins/optimization-detective/docs/introduction.md#:~:text=The%20format%20of%20the%20XPath%20expression%20warrants%20further%20discussion. full documentation]
for why the XPath is constructed like this. In short,
`/HTML` and `/HTML/BODY` lack any node indices since there is no
possibility for ambiguity. For children of the `BODY`, using node indices
is not stable since arbitrary HTML may be printed at `wp_body_open()`, and
for this reason it uses the `id`, `role`, or `class` attribute to add a
disambiguating XPath predicate. For levels below this, elements are
referenced as `*[1][self::IMG]` to target an element that occurs at a
specific position. If this were instead `/IMG[1]`, it would select the
first `IMG` among other `IMG` elements, not the first `IMG` among all
siblings. The `*[1][self::IMG]` form ensures the XPath only matches an
`IMG` when it is the first child, and it will no longer match if a `P` is
inserted before it, for example.
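The format described above can be sketched in a few lines. This is a language-neutral illustration in Python, not the plugin's PHP implementation: the function name `build_xpath`, the `(tag, index, attributes)` tuple shape, and the `id`, then `role`, then `class` priority order are all assumptions made for the example.

```python
def build_xpath(breadcrumbs):
    """Build an XPath from root-to-target breadcrumbs.

    Each breadcrumb is a hypothetical (tag_name, element_index, attributes)
    tuple, where element_index is the 1-based position among all element
    siblings. This is an illustrative shape, not an HTML API structure.
    """
    parts = []
    for depth, (tag, index, attrs) in enumerate(breadcrumbs, start=1):
        if depth <= 2:
            # /HTML and /HTML/BODY: no index, since there is no ambiguity.
            parts.append(tag)
        elif depth == 3 and breadcrumbs[1][0] == "BODY":
            # Children of BODY: indices are unstable, so disambiguate by
            # attribute instead. The priority order here is an assumption.
            predicate = ""
            for name in ("id", "role", "class"):
                if attrs.get(name):
                    # Note: values containing a single quote are not escaped.
                    predicate = f"[@{name}='{attrs[name]}']"
                    break
            parts.append(tag + predicate)
        else:
            # Deeper levels: match the tag only at this exact sibling position.
            parts.append(f"*[{index}][self::{tag}]")
    return "/" + "/".join(parts)


example = [
    ("HTML", 1, {}),
    ("BODY", 2, {}),
    ("DIV", 1, {"class": "wp-site-blocks"}),
    ("HEADER", 1, {}),
    ("DIV", 1, {}),
    ("IMG", 2, {}),
]
print(build_xpath(example))
```

With these breadcrumbs the function reproduces the example expression shown earlier. The inputs it needs are exactly the per-level element index and attributes.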
All this to say, `WP_HTML_Processor` does not keep track of element node
indices, and it doesn't expose the attributes for the tags in the open
stack (e.g. to get the `id`, `role`, or `class`). This seems to make
`get_xpath()` more difficult to implement than it should be.
Ideally computing the XPath wouldn't require subclassing at all, and the
information could be obtained from existing public methods. In
Optimization Detective, the `WP_HTML_Tag_Processor` class is extended and
the `next_token()` method is overridden so it can construct its own
breadcrumbs and then also compute the node indices and capture the
attributes at a given depth.
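To make the bookkeeping concrete, here is a minimal sketch of tracking element indices and attributes alongside breadcrumbs while streaming through tokens. It uses Python's standard `html.parser` as a stand-in and is not the Optimization Detective code. The class name `IndexedBreadcrumbs` and its `seen` list are hypothetical, and unlike `WP_HTML_Processor`, this parser does not insert implied elements or apply HTML5 tree-construction rules.

```python
from html.parser import HTMLParser

# Void elements never have children, so they are not pushed onto the stack.
VOID_ELEMENTS = {
    "AREA", "BASE", "BR", "COL", "EMBED", "HR", "IMG",
    "INPUT", "LINK", "META", "SOURCE", "TRACK", "WBR",
}


class IndexedBreadcrumbs(HTMLParser):
    """Records (tag, element_index, attributes) breadcrumbs per start tag."""

    def __init__(self):
        super().__init__()
        self.stack = []         # open elements: [tag, index, attrs, child_count]
        self.root_children = 0  # number of top-level elements seen so far
        self.seen = []          # one breadcrumb list per visited element

    def handle_starttag(self, tag, attrs):
        tag = tag.upper()
        if self.stack:
            # Bump the parent's element-child counter; text nodes are ignored.
            self.stack[-1][3] += 1
            index = self.stack[-1][3]
        else:
            self.root_children += 1
            index = self.root_children
        entry = [tag, index, dict(attrs), 0]
        self.seen.append([(t, i, a) for t, i, a, _ in self.stack + [entry]])
        if tag not in VOID_ELEMENTS:
            self.stack.append(entry)

    def handle_endtag(self, tag):
        tag = tag.upper()
        if tag in VOID_ELEMENTS:
            return
        # Pop back to the nearest matching open element, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


parser = IndexedBreadcrumbs()
parser.feed(
    '<html><body><div class="wp-site-blocks"><header><div>'
    '<p>Hi</p><img src="a.png"></div></header></div></body></html>'
)
img_crumbs = next(c for c in parser.seen if c[-1][0] == "IMG")
print(img_crumbs)
```

Each entry on the stack carries its own child counter, so the index of a new element is known the moment its start tag is seen. If breadcrumbs exposed this information directly, the subclass and the `next_token()` override would not be needed.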
--
Ticket URL: <https://core.trac.wordpress.org/ticket/63020>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform