[wp-trac] [WordPress Trac] #62270: Unable to set bookmark on </body> in WP_HTML_Processor
WordPress Trac
noreply at wordpress.org
Wed Oct 23 12:32:17 UTC 2024
#62270: Unable to set bookmark on </body> in WP_HTML_Processor
--------------------------------------+------------------------------
Reporter: westonruter | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: HTML API | Version: 6.4
Severity: normal | Resolution:
Keywords: has-patch has-unit-tests | Focuses:
--------------------------------------+------------------------------
Comment (by jonsurrell):
Ah, this is fascinating.
[https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-
inbody:~:text=An%20end%20tag%20whose%20tag%20name%20is%20%22body%22 The
BODY tag closer doesn't remove BODY from the stack of open elements], as
most tag closers do. It simply changes the insertion mode:
> **An end tag whose tag name is "body"**
> If the stack of open elements does not have a body element in scope,
this is a parse error; ignore the token.
> …
> Switch the insertion mode to "after body".
Later insertion modes ("after body" and "after after body") may switch back
to the "in body" insertion mode.
[https://software.hixie.ch/utilities/js/live-dom-
viewer/?%3Cbody%3E%3C%2Fbody%3E%0Aback%20in%20body.%0A%3C%2Fhtml%3E%0A%3Ci%3Estill%3C%2Fi%3E%20in%20body!%0A%3C%2Fbody%3E%0A%3C!--%20Body%20is%20%22closed%22%2C%20but%20only%20moves%20back%20into%20it.%20--%3E
Here's an interesting example:]
{{{
<body></body>
back in body.
</html>
<i>still</i> in body!
</body>
<!-- Body is "closed" because nothing re-enters. -->
}}}
{{{
HTML
├── HEAD
├── BODY
│ ├── #text: back in body.
│ ├── I
│ │ └── #text: still
│ └── #text: in body!
└── #comment: Body is "closed" because nothing re-enters.
}}}
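The mode switches behind that tree can be traced with a toy model. This is only a sketch of my reading of the spec's "in body", "after body", and "after after body" rules, not the HTML API's implementation: `place_tokens` and the `(kind, value)` token tuples are hypothetical, BODY is assumed to always be in scope, and text is assumed to arrive pre-split into whitespace-only and non-whitespace tokens.

```python
def place_tokens(tokens):
    """Trace a toy token stream through three insertion modes.

    Returns (kind, value, destination, mode_after) tuples. Assumes BODY is
    always in scope and ignores every other insertion mode.
    """
    mode = "in body"
    placed = []
    for kind, value in tokens:
        dest = None
        while True:
            if mode == "in body":
                if kind == "end" and value == "body":
                    mode = "after body"  # BODY stays on the stack
                elif kind == "end" and value == "html":
                    mode = "after body"  # act like </body>, then reprocess
                    continue
                elif kind != "end":
                    dest = "body"  # lands in BODY or one of its descendants
            elif kind == "text" and not value.strip():
                dest = "body"  # whitespace uses "in body" rules, mode unchanged
            elif kind == "comment":
                dest = "html" if mode == "after body" else "document"
            elif mode == "after body" and kind == "end" and value == "html":
                mode = "after after body"
            else:
                mode = "in body"  # parse error: re-enter BODY and reprocess
                continue
            break
        placed.append((kind, value, dest, mode))
    return placed


# The example above, starting right after the opening <body> tag.
tokens = [
    ("end", "body"),
    ("text", "back in body."),
    ("end", "html"),
    ("start", "i"),
    ("text", "still"),
    ("end", "i"),
    ("text", " in body!"),
    ("end", "body"),
    ("comment", 'Body is "closed" because nothing re-enters.'),
]

for row in place_tokens(tokens):
    print(row)
```

Every text and element token is reported as landing in BODY, and only the trailing comment is attached to HTML, which agrees with the tree shown above.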
The HTML processor attempts to hide these surprising details and present
an idealized representation closer to the DOM tree. In that regard, it's
doing the right thing. It only pops `BODY` and `HTML` off the stack when
it reaches the end of the HTML.
(Right now the HTML API errors when adding content outside of HTML or
BODY, [https://github.com/WordPress/wordpress-develop/pull/7312 see PR
7312] for work to improve that.)
(**Aside:** Upon review, I think the handling of BODY and HTML elements
that don't impact the "tree" is buggy and is one reason
[https://github.com/WordPress/wordpress-
develop/blob/63b94d9b46fd7594b6c6bea8d2aea2a07ea2ee3e/src/wp-includes
/html-api/class-wp-html-processor.php#L627-L639 that this block is
needed.])
-----
I think it's correct that the closing `BODY` tag does not have a bookmark
because it does not correspond to the removal of the `BODY` element. It
_may_, but there are no guarantees and this would require peeking ahead in
the HTML, something that the HTML API does not do at this time.
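To make the look-ahead problem concrete, here is a rough sketch of the check a parser would need in order to know whether a given `</body>` really ends the element. It is a hypothetical helper over made-up `(kind, value)` token tuples, not anything the HTML API provides, and it assumes my reading of the "after body" rules: comments, whitespace-only text, and `</html>` do not add content to BODY, while anything else does or may.

```python
def closer_is_final(tokens, index):
    """Return True if nothing after tokens[index] re-enters "in body".

    tokens[index] is assumed to be a </body> end tag. The answer depends on
    every token that follows it, which is exactly the look-ahead problem.
    """
    for kind, value in tokens[index + 1:]:
        if kind == "comment":
            continue  # attached to HTML or the document, not to BODY
        if kind == "text" and not value.strip():
            continue  # whitespace does not switch the insertion mode
        if kind == "end" and value == "html":
            continue  # adds no content to BODY
        return False  # anything else switches back to "in body"
    return True


tokens = [
    ("end", "body"),
    ("text", "back in body."),
    ("end", "html"),
    ("start", "i"),
    ("text", "still"),
    ("end", "i"),
    ("text", " in body!"),
    ("end", "body"),
    ("comment", 'Body is "closed" because nothing re-enters.'),
]

first_is_final = closer_is_final(tokens, 0)  # text follows and re-enters BODY
last_is_final = closer_is_final(tokens, 7)   # only a comment follows
```

The check cannot finish until the rest of the input has been scanned, so a processor that reads strictly forward cannot tell at the `</body>` token which case it is in.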
I don't have a solution to offer at this time; I need to think about it. I
don't think the answer is to bookmark that location.
It's interesting to consider that when the processor actually takes the
`BODY` tag off the stack (and pauses on the `BODY` closer) it's at the end
of the document:
{{{
<html><body></body></html>
<!-- When we see </BODY> and </HTML> the parser is actually here! -->
}}}
If we were to inject HTML at this location, the result would be the
following:
{{{
<html><body></body></html>
<div>injected html</div>
}}}
In most cases (except whitespace text or comments) the injected HTML would
actually be inside the BODY element in the tree, but the resulting markup,
with content after `</html>`, is undesirable.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/62270#comment:4>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform