[wp-trac] [WordPress Trac] #63020: HTML API: Breadcrumbs should include element indices and attributes
WordPress Trac
noreply at wordpress.org
Wed Feb 26 19:39:34 UTC 2025
-------------------------+---------------------
Reporter: westonruter | Owner: (none)
Type: enhancement | Status: new
Priority: normal | Milestone: 6.9
Component: HTML API | Version:
Severity: normal | Resolution:
Keywords: needs-patch | Focuses:
-------------------------+---------------------
Description changed by westonruter:
New description:
The [https://wordpress.org/plugins/optimization-detective/ Optimization
Detective] plugin from the Core Performance Team extends
`WP_HTML_Tag_Processor` with some features from `WP_HTML_Processor` like
`get_breadcrumbs()` and `get_current_depth()`. It also introduces its own
method `get_xpath()` which computes an XPath expression to uniquely
identify the element, for example:
{{{
/HTML/BODY/DIV[@class='wp-site-blocks']/*[1][self::HEADER]/*[1][self::DIV]/*[2][self::IMG]
}}}
See [https://github.com/WordPress/performance/blob/trunk/plugins/optimization-detective/docs/introduction.md#:~:text=The%20format%20of%20the%20XPath%20expression%20warrants%20further%20discussion. full documentation] for why the XPath is constructed like this. In short,
`/HTML` and `/HTML/BODY` lack node indices since there is no possibility
of ambiguity. For children of the `BODY`, node indices are not stable
because arbitrary HTML may be printed at `wp_body_open()`, so it instead
uses the `id`, `role`, or `class` attribute to add a disambiguating XPath
predicate. For levels below this, elements are referenced as
`*[1][self::IMG]` to target an element that occurs at a specific position.
If this were instead `IMG[1]`, it would select the first `IMG` among its
`IMG` siblings rather than requiring the `IMG` to be the first of all
siblings. Using `*[1][self::IMG]` ensures the XPath only matches an `IMG`
when it is the first child, so it will no longer match if, for example, a
`P` is inserted before it.
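To make the difference between the two positional forms concrete, the following minimal Python sketch emulates how each predicate selects among a list of element siblings, each represented only by its tag name. Python is used purely for illustration, and the helper names `match_typed_index` and `match_any_index` are hypothetical. Text nodes are ignored, which matches XPath, where `*` only counts element nodes.

```python
# Hypothetical helpers emulating two XPath positional predicates.
# Siblings are modeled as a list of element tag names (text nodes omitted).


def match_typed_index(siblings, tag, n):
    """Emulate `TAG[n]`: the n-th sibling among those named TAG (1-based).

    Returns the 0-based position in `siblings`, or None if there is no match.
    """
    same_tag = [i for i, name in enumerate(siblings) if name == tag]
    return same_tag[n - 1] if len(same_tag) >= n else None


def match_any_index(siblings, tag, n):
    """Emulate `*[n][self::TAG]`: the n-th element sibling, only if it is a TAG."""
    if len(siblings) >= n and siblings[n - 1] == tag:
        return n - 1
    return None


before = ["IMG", "P"]
after = ["P", "IMG", "P"]  # a P was inserted before the IMG

print(match_typed_index(before, "IMG", 1), match_any_index(before, "IMG", 1))  # 0 0
print(match_typed_index(after, "IMG", 1), match_any_index(after, "IMG", 1))    # 1 None
```

After the `P` is inserted, `IMG[1]` silently moves on to match the same `IMG` at its new position, while `*[1][self::IMG]` stops matching, which is the stricter behavior the ticket describes.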
However, `WP_HTML_Processor` does not keep track of element node indices,
and it doesn't expose the attributes of the tags in the open stack (e.g.
to get the `id`, `role`, or `class`). This seems to make `get_xpath()`
harder to implement than it should be. Ideally, computing the XPath
wouldn't require subclassing at all, and the information could be
obtained from existing public methods. Instead, Optimization Detective
extends the `WP_HTML_Tag_Processor` class and overrides its `next_token()`
method so that it can construct its own breadcrumbs, compute the node
indices, and capture the attributes at each depth.
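The bookkeeping such a `next_token()` override has to perform can be sketched roughly as follows. This is a simplified Python model, not Optimization Detective's actual code, and it rests on stated assumptions: tokens arrive as hypothetical `("open", TAG, attrs)` / `("close", TAG)` events, void elements such as `IMG` never receive a close event, and none of the HTML tree-construction rules (implied end tags, foster parenting, and so on) are applied. Each open element records its position among its element siblings using a per-depth counter that starts fresh whenever a new parent is opened.

```python
from dataclasses import dataclass, field

# Assumption: this simplified model only knows that these elements never get a close event.
VOID_ELEMENTS = {"AREA", "BASE", "BR", "COL", "EMBED", "HR", "IMG",
                 "INPUT", "LINK", "META", "SOURCE", "TRACK", "WBR"}


@dataclass
class Crumb:
    tag: str
    index: int  # 0-based position among element siblings
    attributes: dict = field(default_factory=dict)


def walk(events):
    """Yield a snapshot of the open stack (list of Crumb) each time a tag opens.

    `events` is a hypothetical sequence of ("open", TAG, attrs) or ("close", TAG)
    tuples; no HTML tree-construction rules are applied.
    """
    stack = []          # open elements, outermost first
    child_counts = [0]  # child_counts[d]: element children seen so far at depth d
    for event in events:
        if event[0] == "open":
            _, tag, attrs = event
            depth = len(stack)
            crumb = Crumb(tag, child_counts[depth], dict(attrs))
            child_counts[depth] += 1
            stack.append(crumb)
            child_counts.append(0)  # fresh counter for this element's children
            yield list(stack)
            if tag in VOID_ELEMENTS:  # void elements close immediately
                stack.pop()
                child_counts.pop()
        else:
            stack.pop()
            child_counts.pop()


events = [
    ("open", "HTML", {}),
    ("open", "BODY", {}),
    ("open", "DIV", {"class": "wp-site-blocks"}),
    ("open", "HEADER", {}),
    ("open", "IMG", {"src": "a.jpg"}),
    ("open", "IMG", {"src": "b.jpg"}),
    ("close", "HEADER"),
    ("close", "DIV"),
    ("close", "BODY"),
    ("close", "HTML"),
]

for crumbs in walk(events):
    print([(c.tag, c.index) for c in crumbs])
```

The last snapshot ends with `('IMG', 1)`: the second `IMG` inside the `HEADER`. Keeping one counter per depth is what lets the index and attributes travel with each breadcrumb rather than being recomputed later.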
I therefore suggest that, in addition to `get_breadcrumbs()`, there be a
way to get more information from the open stack of tags, including the
attributes and the node index of each tag.
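If the open stack exposed each element's tag, sibling index, and attributes, building the XPath would reduce to a pure function over the breadcrumbs, with no subclassing. The following Python sketch assumes a hypothetical breadcrumb entry shape (`Crumb` with `tag`, a 0-based `index`, and `attributes`); this is not an existing `WP_HTML_Processor` API. It applies the rules described in the ticket, assumes the second breadcrumb is `BODY`, and omits the predicate when a `BODY` child has none of `id`, `role`, or `class`, which is an assumption of this sketch rather than documented Optimization Detective behavior.

```python
from typing import Mapping, NamedTuple, Sequence


class Crumb(NamedTuple):
    """Hypothetical enriched breadcrumb entry (not an existing WordPress API)."""
    tag: str
    index: int                     # 0-based position among element siblings
    attributes: Mapping[str, str]  # attributes of the open element


DISAMBIGUATING_ATTRIBUTES = ("id", "role", "class")


def xpath_literal(value: str) -> str:
    """Quote a string for XPath 1.0, which has no escape sequences."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    raise ValueError("values with both quote types need concat(); not handled here")


def build_xpath(crumbs: Sequence[Crumb]) -> str:
    parts = []
    for depth, crumb in enumerate(crumbs):
        if depth < 2:
            # /HTML and /HTML/BODY are unambiguous, so no index is needed.
            parts.append(crumb.tag)
        elif depth == 2:
            # Assumes crumbs[1] is BODY: sibling positions here are unstable
            # (wp_body_open() output), so use the first identifying attribute.
            predicate = ""
            for name in DISAMBIGUATING_ATTRIBUTES:
                if name in crumb.attributes:
                    predicate = f"[@{name}={xpath_literal(crumb.attributes[name])}]"
                    break
            parts.append(crumb.tag + predicate)
        else:
            # Deeper levels: position among all element siblings, then the tag.
            parts.append(f"*[{crumb.index + 1}][self::{crumb.tag}]")
    return "/" + "/".join(parts)


crumbs = [
    Crumb("HTML", 0, {}),
    Crumb("BODY", 1, {}),
    Crumb("DIV", 3, {"class": "wp-site-blocks"}),  # index ignored at this level
    Crumb("HEADER", 0, {}),
    Crumb("DIV", 0, {}),
    Crumb("IMG", 1, {"src": "photo.jpg"}),
]
print(build_xpath(crumbs))
# /HTML/BODY/DIV[@class='wp-site-blocks']/*[1][self::HEADER]/*[1][self::DIV]/*[2][self::IMG]
```

This reproduces the example XPath from the ticket. Because the function only reads the breadcrumbs, the same logic could run against any source of enriched breadcrumbs, whether a subclass or a future public method.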
--
Ticket URL: <https://core.trac.wordpress.org/ticket/63020#comment:2>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform