[wp-trac] [WordPress Trac] #63020: HTML API: Breadcrumbs should include element indices and attributes

Tue Feb 25 22:42:32 UTC 2025

#63020: HTML API: Breadcrumbs should include element indices and attributes
-------------------------+----------------------------
 Reporter:  westonruter  |      Owner:  (none)
     Type:  enhancement  |     Status:  new
 Priority:  normal       |  Milestone:  Future Release
Component:  HTML API     |    Version:
 Severity:  normal       |   Keywords:  needs-patch
  Focuses:               |
-------------------------+----------------------------
 The [https://wordpress.org/plugins/optimization-detective/ Optimization
 Detective] plugin from the Core Performance Team extends
 `WP_HTML_Tag_Processor` with some features from `WP_HTML_Processor` like
 `get_breadcrumbs()` and `get_current_depth()`. It also introduces its own
 method `get_xpath()` which computes an XPath expression to uniquely
 identify the element, for example:

 {{{
 /HTML/BODY/DIV[@class='wp-site-
 blocks']/*[1][self::HEADER]/*[1][self::DIV]/*[2][self::IMG]
 }}}

 See [https://github.com/WordPress/performance/blob/trunk/plugins
 /optimization-
 detective/docs/introduction.md#:~:text=The%20format%20of%20the%20XPath%20expression%20warrants%20further%20discussion.
 full documentation] for why the XPath is constructed like this. In short,
 `/HTML` and `/HTML/BODY` lack any node indices since there is no
 possibility for ambiguity. For children of the `BODY`, using node indices
 is not stable since arbitrary HTML may be printed at `wp_body_open()`, and
 for this reason it uses the `id`, `role`, or `class` attribute to add a
 disambiguating XPath predicate. For levels below this, elements are
 referenced as `*[1][self::IMG]` to target the an element that occurs at a
 specific position. If this were instead `/HEADER[1]` it would select the
 first `IMG` among other `IMG` elements, not the first `IMG` among all
 siblings. This ensures the XPath only matches an `IMG` when it is the
 first child, and it will no longer match if a `P` is inserted before it,
 for example.

 All this to say, `WP_HTML_Processor` does not keep track of element node
 indices, and it doesn't expose the attributes for the tags in the open
 stack (e.g. to get the `id`, `role`, or `class`). This would seem to make
 it more difficult to implement `get_xpath()` than maybe it should be.
 Ideally computing the XPath wouldn't require subclassing at all, and the
 information could be obtained from existing public methods. In
 Optimization Detective, the `WP_HTML_Tag_Processor` class is extended and
 the `next_token()` method is overridden so it can construct its own
 breadcrumbs and then also compute the node indices and capture the
 attributes at a given depth.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/63020>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform