[wp-trac] [WordPress Trac] #62269: WP_HTML_Processor::next_token() cannot be extended in subclasses to keep track of state
WordPress Trac
noreply at wordpress.org
Mon Oct 21 23:21:59 UTC 2024
#62269: WP_HTML_Processor::next_token() cannot be extended in subclasses to keep
track of state
--------------------------+--------------------
 Reporter:  westonruter   |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  6.7
Component:  HTML API      |    Version:  6.5
 Severity:  normal        |   Keywords:
  Focuses:                |
--------------------------+--------------------
In the Optimization Detective plugin from the WordPress Performance team
there is a need to compute a precise locator for each tag in a document,
beyond what `get_breadcrumbs()` provides. In particular, tags that are
siblings of each other need to be told apart: for sibling `IMG` tags, the
breadcrumbs `array( 'html', 'body', 'img' )` are the same for every one of
them and are therefore ambiguous.

Currently we're using XPaths for this purpose. For example, if three `IMG`
tags appear as siblings at the beginning of the `BODY`, their XPaths are
computed as:
* `/*[1][self::HTML]/*[2][self::BODY]/*[1][self::IMG]`
* `/*[1][self::HTML]/*[2][self::BODY]/*[2][self::IMG]`
* `/*[1][self::HTML]/*[2][self::BODY]/*[3][self::IMG]`

To compute these XPaths with the HTML Tag Processor, the plugin extends
the `WP_HTML_Tag_Processor` class with a wrapped version of `next_token()`
that keeps track of each new tag encountered, building up the array
structure from which the XPath is computed.
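
The bookkeeping described above can be sketched roughly as follows. This
is only an illustration, not the plugin's actual code: the class
`Example_XPath_Tracker` and its `open()`/`close()` methods are hypothetical
names, and the caller is assumed to report each opening tag (flagging void
elements such as `IMG`) and each closing tag in document order.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress or Optimization Detective).
 * Tracks open elements and per-depth child counts to build XPaths such as
 * /*[1][self::HTML]/*[2][self::BODY]/*[1][self::IMG].
 */
final class Example_XPath_Tracker {
	/** @var string[] Names of the currently open elements, outermost first. */
	private $open_names = array();

	/** @var int[] Child-element counts per depth; index 0 is the document. */
	private $child_counts = array( 0 );

	/** Records an opening tag and returns the XPath of that element. */
	public function open( string $tag_name, bool $is_void = false ): string {
		$depth = count( $this->open_names );
		$this->child_counts[ $depth ]++;

		$xpath = '';
		foreach ( $this->open_names as $i => $name ) {
			$xpath .= sprintf( '/*[%d][self::%s]', $this->child_counts[ $i ], $name );
		}
		$xpath .= sprintf( '/*[%d][self::%s]', $this->child_counts[ $depth ], $tag_name );

		if ( ! $is_void ) {
			// Non-void elements stay open and start a fresh child counter.
			$this->open_names[]               = $tag_name;
			$this->child_counts[ $depth + 1 ] = 0;
		}
		return $xpath;
	}

	/** Records the closing tag of the innermost open element. */
	public function close(): void {
		array_pop( $this->open_names );
		array_pop( $this->child_counts );
	}
}

$tracker = new Example_XPath_Tracker();
$tracker->open( 'HTML' );
$tracker->open( 'HEAD' );
$tracker->close();
$tracker->open( 'BODY' );
$first_img  = $tracker->open( 'IMG', true );
$second_img = $tracker->open( 'IMG', true );
$third_img  = $tracker->open( 'IMG', true );
```

The sketch only produces correct indices if `open()` is called exactly
once per tag, which is why the wrapper around `next_token()` has to see
each token exactly once.
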

This approach does not work when extending `WP_HTML_Processor`, because
`WP_HTML_Processor::next_token()` often calls itself recursively. Each
recursive call passes through the subclass's wrapper again, so a single
token can be counted more than once and the computed XPath indices are
wrong. For example, `next_token()` is called twice when processing
`<html>` and three times when processing `<body>`, at least in my sample
doc.
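
The over-counting can be reproduced with a toy model. The classes below
are hypothetical and do not reflect the real `WP_HTML_Processor`
internals; an empty string in the queue merely stands in for an internal
step that makes `next_token()` call itself before a token is surfaced.

```php
<?php
// Toy stand-in for a processor whose public next_token() re-enters itself.
class Toy_Recursive_Processor {
	/** @var string[] Remaining steps; '' models a step that triggers a retry. */
	protected $queue;

	public function __construct( array $queue ) {
		$this->queue = $queue;
	}

	public function next_token(): bool {
		if ( empty( $this->queue ) ) {
			return false;
		}
		if ( '' === array_shift( $this->queue ) ) {
			// Re-entrant call: this dispatches to the subclass override again.
			return $this->next_token();
		}
		return true;
	}
}

// Subclass that wraps next_token() to keep state, as the plugin does.
class Toy_Counting_Processor extends Toy_Recursive_Processor {
	public $calls = 0;

	public function next_token(): bool {
		$this->calls++;
		return parent::next_token();
	}
}

$recursive    = new Toy_Counting_Processor( array( '', 'HTML', '', '', 'BODY' ) );
$tokens_found = 0;
while ( $recursive->next_token() ) {
	$tokens_found++;
}
// Only two tokens are surfaced, but the wrapper ran six times:
// twice for HTML, three times for BODY, and once for the final false.
```
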

The fix seems simple: move the logic from
`WP_HTML_Processor::next_token()` into a private method such as
`WP_HTML_Processor::_next_token()`, and update any recursive references to
call `WP_HTML_Processor::_next_token()` as well. Then
`WP_HTML_Processor::next_token()` can simply call
`WP_HTML_Processor::_next_token()`, and extending classes will be able to
rely on each invocation of `next_token()` corresponding to a new token.
This would also be similar to what `WP_HTML_Tag_Processor::next_token()`
does, in that it simply wraps a call to
`WP_HTML_Tag_Processor::base_class_next_token()`.
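
A minimal sketch of the proposed shape, again using hypothetical toy
classes rather than the real core code: the public method is a thin
wrapper, and all re-entrant calls go to the private method, so a subclass
override runs once per public invocation.

```php
<?php
// Toy model of the proposed structure; not the actual WP_HTML_Processor.
class Toy_Fixed_Processor {
	/** @var string[] Remaining steps; '' models a step that triggers a retry. */
	protected $queue;

	public function __construct( array $queue ) {
		$this->queue = $queue;
	}

	// Public entry point: a thin wrapper that subclasses may override.
	public function next_token(): bool {
		return $this->_next_token();
	}

	// All real work and all recursion live here, out of reach of overrides.
	private function _next_token(): bool {
		if ( empty( $this->queue ) ) {
			return false;
		}
		if ( '' === array_shift( $this->queue ) ) {
			return $this->_next_token();
		}
		return true;
	}
}

class Toy_Fixed_Counting_Processor extends Toy_Fixed_Processor {
	public $calls = 0;

	public function next_token(): bool {
		$this->calls++;
		return parent::next_token();
	}
}

$fixed         = new Toy_Fixed_Counting_Processor( array( '', 'HTML', '', '', 'BODY' ) );
$tokens_fixed = 0;
while ( $fixed->next_token() ) {
	$tokens_fixed++;
}
// Two tokens surfaced and three wrapper calls: one per token,
// plus the final call that returns false.
```
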
--
Ticket URL: <https://core.trac.wordpress.org/ticket/62269>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform