[wp-trac] [WordPress Trac] #62270: Unable to set bookmark on </body> in WP_HTML_Processor
WordPress Trac
noreply at wordpress.org
Wed Oct 23 19:25:36 UTC 2024
#62270: Unable to set bookmark on </body> in WP_HTML_Processor
----------------------------+------------------------------
Reporter: westonruter | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: HTML API | Version: 6.4
Severity: normal | Resolution:
Keywords: has-unit-tests | Focuses:
----------------------------+------------------------------
Comment (by jonsurrell):
> If there is a comment preceding the SCRIPT tag then what? The comment
> would be left after </body> but then the SCRIPT tag and any other
> remaining content would be moved up to the end of the </body>?
That's exactly right, except that if we're assuming the original HTML
ended with `</body></html>` then the comment would be outside the HTML
element.
-----
To elaborate with a few examples:
Let's assume the original HTML ends as expected: `</body></html>`. This
would put the parser into the wonderfully named
[https://html.spec.whatwg.org/multipage/parsing.html#the-after-after-body-insertion-mode "after after body" insertion mode].
- A comment token is inserted as the last child of the `Document`, outside
the `HTML` element.
- Whitespace text is inserted as a child of `BODY` (but does NOT change
the insertion mode).
- Anything else (that isn't ignored) switches the insertion mode to "in
body" and is reprocessed.
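The three rules above can be sketched as a tiny simulation. This is only an illustration, not how `WP_HTML_Processor` or any browser is implemented: the function name `place_trailing_tokens` and the `(kind, data)` token pairs are hypothetical, and the "in body" rules are reduced to "append to `BODY`".

```python
HTML_WHITESPACE = " \t\n\f\r"


def place_trailing_tokens(tokens):
    """Place tokens that follow `</body></html>` (simplified model).

    `tokens` is a list of (kind, data) pairs, where kind is "comment",
    "text" or "element". Returns (body, document): the nodes appended to
    BODY and the nodes appended to the Document after the HTML element.
    """
    body = []
    document = []
    mode = "after after body"

    def append_to_body(kind, data):
        # Adjacent text is merged into a single text node, as in the DOM.
        if kind == "text" and body and body[-1][0] == "text":
            body[-1] = ("text", body[-1][1] + data)
        else:
            body.append((kind, data))

    for kind, data in tokens:
        if mode == "after after body":
            if kind == "comment":
                # Rule 1: the comment lands on the Document, outside HTML.
                document.append((kind, data))
                continue
            if kind == "text" and data.strip(HTML_WHITESPACE) == "":
                # Rule 2: whitespace goes into BODY, the mode is unchanged.
                append_to_body(kind, data)
                continue
            # Rule 3: anything else switches to "in body" and is reprocessed.
            mode = "in body"
        append_to_body(kind, data)

    return body, document


# Comment first: it escapes BODY.
body, document = place_trailing_tokens(
    [("text", "\n\t"), ("comment", " A comment "), ("text", "\nText\n")]
)
print(body)      # [('text', '\n\t\nText\n')]
print(document)  # [('comment', ' A comment ')]
```

Once a non-whitespace token has flipped the mode, every later token (comments included) is appended to `BODY` in source order.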
We can look at a few cases.
If the HTML (with appended HTML) looks like this:
{{{
</body></html>
<!-- A comment -->
Text
}}}
[https://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%2Fbody%3E%3C%2Fhtml%3E%0A%09%3C!--%20A%20comment%20--%3E%0AText We get this:]
{{{
#document
├── HTML
│   ├── HEAD
│   └── BODY
│       ├── (… whatever was originally here …)
│       └── #text: \n\t\nText\n
└── #comment: A comment
}}}
The `BODY` element ends with `\n\t\nText\n`, while the `Document` (outside
of `HTML`) ends with the comment. Most cases of getting this wrong look
like this one: a comment does not become a child of `BODY` ''if'' nothing
before it has triggered the switch back to the "in body" insertion mode.
However, if the HTML looks something like this:
{{{
</body></html>
<div>We want to append this to <code>BODY</code></div>
<!-- A comment -->
Text
}}}
[https://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%2Fbody%3E%3C%2Fhtml%3E%0A%3Cdiv%3EWe%20want%20to%20append%20this%20to%20%3Ccode%3EBODY%3C%2Fcode%3E%3C%2Fdiv%3E%0A%09%3C!--%20A%20comment%20--%3E%0AText This is the result:]
{{{
HTML
├── HEAD
└── BODY
    ├── (… whatever was originally here …)
    ├── #text: \n
    ├── DIV
    │   ├── #text: We want to append this to
    │   └── CODE
    │       └── #text: BODY
    ├── #text: \n\t
    ├── #comment: A comment
    └── #text: \nText\n
}}}
Then the DOM is exactly how we wanted it. Everything is under `BODY` and in
the same order. This is because, before the comment can appear out of
place, the DIV caused the insertion mode to switch to "in body". As long
as there's any non-whitespace text or another element before any comments,
this should always hold. And I believe comments are the only thing that
can experience this problem.
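As a rough way to spot the risky case, one could check whether the appended HTML begins (after optional whitespace) with a comment. The helper below is a hypothetical sketch, not part of the HTML API: it only recognizes ordinary `<!--` comments and ignores other inputs that also tokenize as comments or that are ignored in this mode.

```python
HTML_WHITESPACE = " \t\n\f\r"


def starts_with_comment(appended_html: str) -> bool:
    """Return True if a comment precedes any other content.

    Simplified: only `<!--` comments are detected.
    """
    return appended_html.lstrip(HTML_WHITESPACE).startswith("<!--")


# The two cases from this comment:
print(starts_with_comment("\n\t<!-- A comment -->\nText"))                 # True
print(starts_with_comment("\n<div>...</div>\n\t<!-- A comment -->\nText"))  # False
```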
--
Ticket URL: <https://core.trac.wordpress.org/ticket/62270#comment:8>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform