[wp-trac] [WordPress Trac] #63020: HTML API: Breadcrumbs should include element indices and attributes
WordPress Trac
noreply at wordpress.org
Wed Feb 26 19:48:17 UTC 2025
#63020: HTML API: Breadcrumbs should include element indices and attributes
-------------------------+---------------------
Reporter: westonruter | Owner: (none)
Type: enhancement | Status: new
Priority: normal | Milestone: 6.9
Component: HTML API | Version:
Severity: normal | Resolution:
Keywords: needs-patch | Focuses:
-------------------------+---------------------
Description changed by westonruter:
New description:
The [https://wordpress.org/plugins/optimization-detective/ Optimization
Detective] plugin from the Core Performance Team extends
`WP_HTML_Tag_Processor` with some features from `WP_HTML_Processor` like
`get_breadcrumbs()` and `get_current_depth()`. It also introduces its own
method `get_xpath()` which computes an XPath expression to uniquely
identify the element, for example:
{{{
/HTML/BODY/DIV[@class='wp-site-blocks']/*[1][self::HEADER]/*[1][self::DIV]/*[2][self::IMG]
}}}
See the
[https://github.com/WordPress/performance/blob/trunk/plugins/optimization-detective/docs/introduction.md#:~:text=The%20format%20of%20the%20XPath%20expression%20warrants%20further%20discussion. full documentation]
for why the XPath is constructed like this. In short,
`/HTML` and `/HTML/BODY` lack any node indices since there is no
possibility for ambiguity. For children of the `BODY`, using node indices
is not stable since arbitrary HTML may be printed at `wp_body_open()`, and
for this reason it uses the `id`, `role`, or `class` attribute to add a
disambiguating XPath predicate. For levels below this, elements are
referenced as `*[1][self::IMG]` to target an element that occurs at a
specific position. If this were instead `/IMG[1]`, it would select the
first `IMG` among other `IMG` elements, not the first `IMG` among all
siblings. The `*[1][self::IMG]` form ensures the XPath only matches an
`IMG` when it is the first child, and it will no longer match if a `P` is
inserted before it, for example.
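The difference between indexing by tag name and indexing by position can be checked with PHP's DOM extension. The following is a minimal sketch, assuming `ext-dom` is available; the XML fragment and variable names are made up for illustration and are not taken from the plugin.

```php
<?php
// Hypothetical markup: an IMG preceded by a P inside a DIV.
$xml = '<HTML><BODY><DIV><P/><IMG/></DIV></BODY></HTML>';

$document = new DOMDocument();
$document->loadXML( $xml );
$xpath = new DOMXPath( $document );

// IMG[1] picks the first IMG among IMG siblings, so it still matches.
$by_tag_index = $xpath->query( '/HTML/BODY/DIV/IMG[1]' )->length;

// *[1][self::IMG] requires the first child of any type to be an IMG.
$by_position = $xpath->query( '/HTML/BODY/DIV/*[1][self::IMG]' )->length;

// *[2][self::IMG] matches, because the IMG is the second child here.
$shifted = $xpath->query( '/HTML/BODY/DIV/*[2][self::IMG]' )->length;

printf( "%d %d %d\n", $by_tag_index, $by_position, $shifted ); // prints "1 0 1"
```

The positional form stops matching as soon as a sibling is inserted ahead of the `IMG`, which is the behavior described above.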
The problem is that `WP_HTML_Processor` does not keep track of element
node indices, and it doesn't expose the attributes for the tags in the
open stack (e.g. to get the `id`, `role`, or `class`). This seems to make
it more difficult to implement `get_xpath()` than it should be.
Ideally computing the XPath wouldn't require subclassing at all, and the
information could be obtained from existing public methods. In
Optimization Detective, the `WP_HTML_Tag_Processor` class is extended and
the `next_token()` method is overridden so it can construct its own
breadcrumbs and then also compute the node indices and capture the
attributes at a given depth.
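To illustrate the kind of bookkeeping this involves, here is a rough sketch of computing node indices while walking a token stream. It is not the plugin's implementation: the function name `sketch_indexed_paths()` and the `open`/`close`/`void` token format are assumptions made for this example, and real HTML parsing rules (implied closers, special elements) are ignored.

```php
<?php
/**
 * Assigns each element its 1-based position among all of its siblings.
 *
 * @param array $tokens Pairs of ( 'open'|'close'|'void', tag name ). Hypothetical format.
 * @return string[] One positional path per opened or void element.
 */
function sketch_indexed_paths( array $tokens ): array {
    $stack       = array();    // Open elements as array( tag, index ).
    $child_count = array( 0 ); // Children seen so far at each depth.
    $paths       = array();

    foreach ( $tokens as list( $type, $tag ) ) {
        if ( 'close' === $type ) {
            // Leaving an element discards its child counter.
            array_pop( $stack );
            array_pop( $child_count );
            continue;
        }

        // A new element is the next child of the current parent.
        $depth = count( $stack );
        $index = ++$child_count[ $depth ];

        $segments = array();
        foreach ( $stack as list( $open_tag, $open_index ) ) {
            $segments[] = "*[{$open_index}][self::{$open_tag}]";
        }
        $segments[] = "*[{$index}][self::{$tag}]";
        $paths[]    = '/' . implode( '/', $segments );

        if ( 'open' === $type ) {
            $stack[]       = array( $tag, $index );
            $child_count[] = 0; // Start counting this element's children.
        }
    }

    return $paths;
}

$paths = sketch_indexed_paths(
    array(
        array( 'open', 'HEADER' ),
        array( 'open', 'DIV' ),
        array( 'open', 'P' ),
        array( 'close', 'P' ),
        array( 'void', 'IMG' ),
        array( 'close', 'DIV' ),
        array( 'close', 'HEADER' ),
    )
);

echo end( $paths ), "\n"; // prints "/*[1][self::HEADER]/*[1][self::DIV]/*[2][self::IMG]"
```

Keeping one counter per depth is what lets the index reflect the position among all siblings and not only among siblings with the same tag name.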
I therefore suggest that, in addition to `get_breadcrumbs()`, there be a
way to get more information from the open stack of tags, including the
attributes and the node index for each tag.
In other words, it's currently possible to construct an XPath like
`/HTML/BODY` as follows:
{{{#!php
<?php
// array_map() returns an array, so join the segments into one string.
$xpath = implode(
    '',
    array_map(
        function ( string $breadcrumb ): string {
            return "/$breadcrumb";
        },
        $processor->get_breadcrumbs()
    )
);
}}}
But I'm proposing something like `get_element_breadcrumbs()` which would
return objects for each open tag on the stack instead of just the tag
name. So then you could construct a full unambiguous XPath:
{{{#!php
<?php
$xpath = implode(
    '',
    array_map(
        function ( WP_Element $breadcrumb ): string {
            $expression = '/*';
            // Restrict the wildcard step to this element's tag name.
            $expression .= sprintf( '[self::%s]', $breadcrumb->get_tag() );
            foreach ( array( 'id', 'role', 'class' ) as $attribute_name ) {
                $attribute = $breadcrumb->get_attribute( $attribute_name );
                if ( is_string( $attribute ) ) {
                    // Use the first available attribute as the predicate.
                    $expression .= sprintf(
                        '[@%s="%s"]',
                        $attribute_name,
                        addcslashes( $attribute, '\\"' )
                    );
                    break;
                }
            }
            return $expression;
        },
        $processor->get_element_breadcrumbs()
    )
);
}}}
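The snippet above leaves out node indices. As a rough sketch of how the full format described earlier might be assembled once the tag, index, and attributes are available for each open element, the following uses plain arrays as stand-ins for the proposed breadcrumb objects. The function name `sketch_build_xpath()` and the array keys are hypothetical, and attribute values are not escaped.

```php
<?php
/**
 * Builds an XPath from breadcrumb records, outermost element first.
 *
 * Each record is a plain array standing in for a hypothetical breadcrumb
 * object: array( 'tag' => string, 'index' => int, 'attributes' => array ).
 */
function sketch_build_xpath( array $breadcrumbs ): string {
    $xpath = '';
    foreach ( array_values( $breadcrumbs ) as $depth => $breadcrumb ) {
        $tag = $breadcrumb['tag'];

        if ( $depth < 2 ) {
            // /HTML and /HTML/BODY are unambiguous, so no index is needed.
            $xpath .= "/{$tag}";
        } elseif ( 2 === $depth ) {
            // Children of BODY: prefer an attribute predicate over an index.
            $xpath .= "/{$tag}";
            foreach ( array( 'id', 'role', 'class' ) as $name ) {
                if ( isset( $breadcrumb['attributes'][ $name ] ) ) {
                    // Assumption: the value contains no single quotes.
                    $xpath .= sprintf( "[@%s='%s']", $name, $breadcrumb['attributes'][ $name ] );
                    break;
                }
            }
        } else {
            // Deeper levels: position among all siblings, then the tag check.
            $xpath .= sprintf( '/*[%d][self::%s]', $breadcrumb['index'], $tag );
        }
    }
    return $xpath;
}

$xpath = sketch_build_xpath(
    array(
        array( 'tag' => 'HTML', 'index' => 1, 'attributes' => array() ),
        array( 'tag' => 'BODY', 'index' => 2, 'attributes' => array() ),
        array( 'tag' => 'DIV', 'index' => 1, 'attributes' => array( 'class' => 'wp-site-blocks' ) ),
        array( 'tag' => 'HEADER', 'index' => 1, 'attributes' => array() ),
        array( 'tag' => 'DIV', 'index' => 1, 'attributes' => array() ),
        array( 'tag' => 'IMG', 'index' => 2, 'attributes' => array() ),
    )
);

echo $xpath, "\n";
// prints "/HTML/BODY/DIV[@class='wp-site-blocks']/*[1][self::HEADER]/*[1][self::DIV]/*[2][self::IMG]"
```

With the stack exposing this information directly, no subclassing would be needed to produce the expression.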
--
Ticket URL: <https://core.trac.wordpress.org/ticket/63020#comment:3>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform