[wp-trac] [WordPress Trac] #62270: Unable to set bookmark on </body> in WP_HTML_Processor

WordPress Trac noreply at wordpress.org
Wed Oct 23 12:32:17 UTC 2024


#62270: Unable to set bookmark on </body> in WP_HTML_Processor
--------------------------------------+------------------------------
 Reporter:  westonruter               |       Owner:  (none)
     Type:  defect (bug)              |      Status:  new
 Priority:  normal                    |   Milestone:  Awaiting Review
Component:  HTML API                  |     Version:  6.4
 Severity:  normal                    |  Resolution:
 Keywords:  has-patch has-unit-tests  |     Focuses:
--------------------------------------+------------------------------

Comment (by jonsurrell):

 Ah, this is fascinating.

 [https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-
 inbody:~:text=An%20end%20tag%20whose%20tag%20name%20is%20%22body%22 The
 BODY tag closer doesn't remove BODY from the stack of open elements],
 unlike most tag closers. It simply changes the insertion mode:

 > **An end tag whose tag name is "body"**
 > If the stack of open elements does not have a body element in scope,
 this is a parse error; ignore the token.
 > …
 > Switch the insertion mode to "after body".

 Tokens handled in the subsequent insertion modes ("after body" and "after
 after body") may switch the parser back to the "in body" insertion mode.
 [https://software.hixie.ch/utilities/js/live-dom-
 viewer/?%3Cbody%3E%3C%2Fbody%3E%0Aback%20in%20body.%0A%3C%2Fhtml%3E%0A%3Ci%3Estill%3C%2Fi%3E%20in%20body!%0A%3C%2Fbody%3E%0A%3C!--%20Body%20is%20%22closed%22%2C%20but%20only%20moves%20back%20into%20it.%20--%3E
 Here's an interesting example:]

 {{{
 <body></body>
 back in body.
 </html>
 <i>still</i> in body!
 </body>
 <!-- Body is "closed", but only moves back into it. -->
 }}}

 {{{
 HTML
 ├── HEAD
 ├── BODY
 │   ├── #text: back in body.
 │   ├── I
 │   │   └── #text: still
 │   └── #text: in body!
 └── #comment: Body is "closed", but only moves back into it.
 }}}
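
 To make the mode switching concrete, here is a rough sketch of a toy tree
 builder in Python. It is not the HTML API and not a real parser: it models
 only the "in body", "after body", and "after after body" rules discussed
 above, assumes `<html><head></head><body>` has already been seen, and
 assumes whitespace-only text has been dropped from the token list. The
 names `Node`, `build_tree`, and `outline` are made up for this
 illustration, and the comment text is abbreviated.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def build_tree(tokens):
    """Toy tree builder; assumes <html><head></head><body> was already seen."""
    html, head, body = Node("HTML"), Node("HEAD"), Node("BODY")
    document = Node("#document", [html])
    html.children.extend([head, body])
    stack = [html, body]  # stack of open elements
    mode = "in body"

    for kind, value in tokens:
        if kind == "comment":
            # Comments never change the insertion mode; only the parent varies.
            parent = {"in body": stack[-1], "after body": html,
                      "after after body": document}[mode]
            parent.children.append(Node("#comment: " + value))
        elif kind == "end" and value == "body":
            mode = "after body"  # BODY is NOT popped off the stack
        elif kind == "end" and value == "html":
            mode = "after after body"  # HTML is not popped either
        else:
            # Any other token is handled by the "in body" rules again.
            mode = "in body"
            if kind == "text":
                stack[-1].children.append(Node("#text: " + value))
            elif kind == "start":
                node = Node(value.upper())
                stack[-1].children.append(node)
                stack.append(node)
            elif kind == "end":
                # Only close elements opened inside BODY (toy simplification).
                if value.upper() in [n.name for n in stack[2:]]:
                    while stack.pop().name != value.upper():
                        pass
    return document, stack


def outline(node, depth=0):
    """Render the tree as indented lines."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Tokens approximating the example above, starting right after <body>.
tokens = [
    ("end", "body"), ("text", "back in body."), ("end", "html"),
    ("start", "i"), ("text", "still"), ("end", "i"), ("text", " in body!"),
    ("end", "body"),
    ("comment", 'Body is "closed"'),
]
document, stack = build_tree(tokens)
print("\n".join(outline(document)))
print([n.name for n in stack])
```

 In this sketch the text and the `I` element land inside `BODY`, the
 comment becomes a child of `HTML`, and the stack still holds `HTML` and
 `BODY` after the last token.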

 The HTML processor attempts to hide these surprising details and present
 an idealized representation closer to the DOM tree. In that regard, it's
 doing the right thing. It only pops `BODY` and `HTML` off the stack when
 it reaches the end of the HTML.

 (Right now the HTML API errors when adding content outside of HTML or
 BODY, [https://github.com/WordPress/wordpress-develop/pull/7312 see PR
 7312] for work to improve that.)

 (**Aside:** Upon review, I think the handling of BODY and HTML elements
 that don't impact the "tree" is buggy and is one reason
 [https://github.com/WordPress/wordpress-
 develop/blob/63b94d9b46fd7594b6c6bea8d2aea2a07ea2ee3e/src/wp-includes
 /html-api/class-wp-html-processor.php#L627-L639 that this block is
 needed.])

 -----

 I think it's correct that the closing `BODY` tag does not have a bookmark
 because it does not correspond to the removal of the `BODY` element. It
 _may_, but there are no guarantees and this would require peeking ahead in
 the HTML, something that the HTML API does not do at this time.

 I don't have a solution to offer at this time; I need to think about it. I
 don't think the answer is to bookmark that location.

 It's interesting to consider that when the processor actually takes the
 `BODY` tag off the stack (and pauses on the `BODY` closer) it's at the end
 of the document:

 {{{
 <html><body></body></html>
 <!-- When we see </BODY> and </HTML> the parser is actually here! -->
 }}}

 If we were to inject HTML at this location, the result would be the
 following:

 {{{
 <html><body></body></html>
 <div>injected html</div>
 }}}

 In most cases (except whitespace text or comments) the injected HTML would
 actually be inside the BODY element in the tree, but the resulting HTML
 source is undesirable.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/62270#comment:4>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

