[wp-trac] [WordPress Trac] #62269: WP_HTML_Processor::next_token() cannot be extended in subclasses to keep track of state
WordPress Trac
noreply at wordpress.org
Mon Oct 21 23:28:51 UTC 2024
#62269: WP_HTML_Processor::next_token() cannot be extended in subclasses to keep
track of state
--------------------------+---------------------
Reporter: westonruter | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: 6.7
Component: HTML API | Version: 6.5
Severity: normal | Resolution:
Keywords: | Focuses:
--------------------------+---------------------
Changes (by westonruter):
* keywords: has-patch has-unit-tests =>
Old description:
> In the Optimization Detective plugin from the WordPress Performance team
> there is a need to compute a precise locator for each tag in a document
> beyond just what `get_breadcrumbs()` provides. In particular, there is a
> need to disambiguate between two tags which may be siblings of each other
> in which case `array( 'html', 'body', 'img' )` will be ambiguous.
> Currently we're using XPaths for this purpose, for example if there are
> three `IMG` tags appearing as siblings at the beginning of the `BODY`,
> their XPaths are computed as:
>
> * `/*[1][self::HTML]/*[2][self::BODY]/*[1][self::IMG]`
> * `/*[1][self::HTML]/*[2][self::BODY]/*[2][self::IMG]`
> * `/*[1][self::HTML]/*[2][self::BODY]/*[3][self::IMG]`
>
> In order to compute these XPaths with the HTML Tag Processor, the plugin
> extends the `WP_HTML_Tag_Processor` class with a wrapped version of
> `next_token()` so it can keep track of each new tag encountered and build
> up the array structure from which the XPath is computed.
>
> This turns out not to work when extending `WP_HTML_Processor` because
> `WP_HTML_Processor::next_token()` often calls itself recursively, so a
> subclass override runs more than once per token and the computed XPath
> indices come out wrong. For example, `next_token()` is called twice when
> processing `<html>` and three times when processing `<body>`, at least in
> my sample doc.
>
> The fix seems simple: move the logic from
> `WP_HTML_Processor::next_token()` into another private method like
> `WP_HTML_Processor::_next_token()` and update any recursive references to
> call `WP_HTML_Processor::_next_token()` as well. Then
> `WP_HTML_Processor::next_token()` can simply call
> `WP_HTML_Processor::_next_token()`, and extending classes will be able to
> rely on each invocation of `next_token()` corresponding to a new token.
> This would also be similar to what `WP_HTML_Tag_Processor::next_token()`
> does in that it simply wraps a call to
> `WP_HTML_Tag_Processor::base_class_next_token()`.
New description:
In the Optimization Detective plugin from the WordPress Core Performance
Team there is a need to compute a precise locator for each tag in a
document beyond just what `get_breadcrumbs()` provides. In particular,
there is a need to disambiguate between two tags which may be siblings of
each other in which case `array( 'html', 'body', 'img' )` will be
ambiguous. Currently we're using XPaths for this purpose, for example if
there are three `IMG` tags appearing as siblings at the beginning of the
`BODY`, their XPaths are computed as:

* `/*[1][self::HTML]/*[2][self::BODY]/*[1][self::IMG]`
* `/*[1][self::HTML]/*[2][self::BODY]/*[2][self::IMG]`
* `/*[1][self::HTML]/*[2][self::BODY]/*[3][self::IMG]`

In order to compute these XPaths with the HTML Tag Processor, the plugin
extends the `WP_HTML_Tag_Processor` class with a wrapped version of
`next_token()` so it can keep track of each new tag encountered and build
up the array structure from which the XPath is computed.
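
To make the bookkeeping concrete, here is a minimal sketch of how index-based XPaths like the ones above could be derived from a stack of open elements plus one sibling counter per depth. The class `XPath_Tracker` and its methods are hypothetical names for illustration and are not the plugin's actual code; the sketch also assumes the caller reports every opening and closing tag, including an immediate close for void elements.

```php
<?php
/**
 * Hypothetical helper (not the plugin's code): builds an index-based XPath
 * for each opening tag from a stack of open elements and sibling counters.
 */
final class XPath_Tracker {
	/** @var array[] Stack of array( TAG_NAME, 1-based index among siblings ). */
	private $open_stack = array();

	/** @var int[] One counter per depth: element children seen so far at that depth. */
	private $child_counts = array( 0 );

	/** Registers an opening tag and returns its XPath. */
	public function open_tag( string $tag_name ): string {
		$depth = count( $this->open_stack );
		$index = ++$this->child_counts[ $depth ];

		$this->open_stack[]               = array( strtoupper( $tag_name ), $index );
		$this->child_counts[ $depth + 1 ] = 0; // Fresh counter for this element's children.

		return $this->get_xpath();
	}

	/** Closes the most recently opened element; call right after open_tag() for void elements. */
	public function close_tag(): void {
		if ( empty( $this->open_stack ) ) {
			return;
		}
		array_pop( $this->open_stack );
		array_pop( $this->child_counts ); // Discard the closed element's child counter.
	}

	/** Returns the XPath of the innermost open element. */
	public function get_xpath(): string {
		$xpath = '';
		foreach ( $this->open_stack as list( $tag_name, $index ) ) {
			$xpath .= sprintf( '/*[%d][self::%s]', $index, $tag_name );
		}
		return $xpath;
	}
}

// Usage: HTML > HEAD, then BODY with three sibling IMG tags.
$tracker = new XPath_Tracker();
$tracker->open_tag( 'html' );
$tracker->open_tag( 'head' );
$tracker->close_tag();
$tracker->open_tag( 'body' );
for ( $i = 0; $i < 3; $i++ ) {
	echo $tracker->open_tag( 'img' ), "\n";
	$tracker->close_tag(); // IMG is a void element.
}
```

The usage at the end prints the three `IMG` XPaths listed above. The counters are only correct if `open_tag()` runs exactly once per tag, which is the assumption that matters for the rest of this ticket.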

This turns out not to work when extending `WP_HTML_Processor` because
`WP_HTML_Processor::next_token()` often calls itself recursively, so a
subclass override runs more than once per token and the computed XPath
indices come out wrong. For example, `next_token()` is called twice when
processing `<html>` and three times when processing `<body>`, at least in
my sample doc.
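
The effect can be reproduced without WordPress. The sketch below uses a toy stand-in, `Toy_Processor`, which is not the real `WP_HTML_Processor`; the `'reprocess'` marker is a made-up trigger for the self-call, and the token list is contrived so that the call counts match the ones observed above. It only shows why a public method that re-enters itself breaks a subclass override.

```php
<?php
/** Toy stand-in, NOT the real WP_HTML_Processor: the public method re-enters itself. */
class Toy_Processor {
	protected $tokens;
	protected $position = -1;

	public function __construct( array $tokens ) {
		$this->tokens = $tokens;
	}

	public function next_token(): bool {
		if ( ++$this->position >= count( $this->tokens ) ) {
			return false;
		}
		// Hypothetical rule: some internal steps make the method call itself again.
		if ( 'reprocess' === $this->tokens[ $this->position ] ) {
			return $this->next_token(); // Dynamic dispatch lands in the subclass override.
		}
		return true;
	}

	public function get_token(): string {
		return $this->tokens[ $this->position ];
	}
}

/** Subclass that records a tag each time its next_token() override succeeds. */
class Recording_Processor extends Toy_Processor {
	public $seen = array();

	public function next_token(): bool {
		$result = parent::next_token();
		if ( $result ) {
			$this->seen[] = $this->get_token();
		}
		return $result;
	}
}

$processor = new Recording_Processor(
	array( 'reprocess', 'HTML', 'reprocess', 'reprocess', 'BODY', 'IMG' )
);
while ( $processor->next_token() ) {
	// Walk the whole document.
}
echo implode( ', ', $processor->seen ), "\n";
// Prints: HTML, HTML, BODY, BODY, BODY, IMG
```

Three tags are in the document, but the override records six, because every nested self-call passes through the subclass again on its way back out. Any sibling counter driven by that override would be inflated in the same way.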

The fix seems simple: move the logic from
`WP_HTML_Processor::next_token()` into another private method like
`WP_HTML_Processor::_next_token()` and update any recursive references to
call `WP_HTML_Processor::_next_token()` as well. Then
`WP_HTML_Processor::next_token()` can simply call
`WP_HTML_Processor::_next_token()`, and extending classes will be able to
rely on each invocation of `next_token()` corresponding to a new token. This
would also be similar to what `WP_HTML_Tag_Processor::next_token()` does
in that it simply wraps a call to
`WP_HTML_Tag_Processor::base_class_next_token()`.
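
As a rough illustration of the proposed shape, the sketch below applies it to a toy class. `Toy_Processor_Fixed` and the `'reprocess'` marker are hypothetical and only stand in for the real internals; an actual patch to `WP_HTML_Processor` may look different. The point is that the self-call targets the private method, so it never passes through a subclass override.

```php
<?php
/** Toy stand-in for the proposed structure: a public wrapper around a private worker. */
class Toy_Processor_Fixed {
	private $tokens;
	private $position = -1;

	public function __construct( array $tokens ) {
		$this->tokens = $tokens;
	}

	/** Public entry point: one call per token, safe to override. */
	public function next_token(): bool {
		return $this->_next_token();
	}

	/** Private worker: recursion stays here, out of reach of subclass overrides. */
	private function _next_token(): bool {
		if ( ++$this->position >= count( $this->tokens ) ) {
			return false;
		}
		// Hypothetical rule, as before: some internal steps need another pass.
		if ( 'reprocess' === $this->tokens[ $this->position ] ) {
			return $this->_next_token();
		}
		return true;
	}

	public function get_token(): string {
		return $this->tokens[ $this->position ];
	}
}

/** Subclass that records a tag each time its next_token() override succeeds. */
class Recording_Processor_Fixed extends Toy_Processor_Fixed {
	public $seen = array();

	public function next_token(): bool {
		$result = parent::next_token();
		if ( $result ) {
			$this->seen[] = $this->get_token();
		}
		return $result;
	}
}

$processor = new Recording_Processor_Fixed(
	array( 'reprocess', 'HTML', 'reprocess', 'reprocess', 'BODY', 'IMG' )
);
while ( $processor->next_token() ) {
	// Walk the whole document.
}
echo implode( ', ', $processor->seen ), "\n";
// Prints: HTML, BODY, IMG
```

With the same contrived input that inflates the counts when the public method calls itself, the override now fires once per token.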
--
Ticket URL: <https://core.trac.wordpress.org/ticket/62269#comment:2>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform