[wp-trac] [WordPress Trac] #61009: HTML API: Fix some existing bugs in `kses` comment detection, enable Bits storage. (was: HTML API: Preserve some additional invalid HTML comment syntaxes.)

WordPress Trac noreply at wordpress.org
Wed May 22 22:17:04 UTC 2024


#61009: HTML API: Fix some existing bugs in `kses` comment detection, enable Bits
storage.
-----------------------------------+------------------------------
 Reporter:  dmsnell                |       Owner:  (none)
     Type:  defect (bug)           |      Status:  new
 Priority:  normal                 |   Milestone:  Awaiting Review
Component:  HTML API               |     Version:  trunk
 Severity:  normal                 |  Resolution:
 Keywords:  has-patch 2nd-opinion  |     Focuses:
-----------------------------------+------------------------------
Description changed by dmsnell:

Old description:

> When `wp_kses_split` processes a document it attempts to leave HTML
> comments relatively alone. It makes minor adjustments, but leaves the
> comments in the document in its output.
>
> Unfortunately it only recognizes one kind of HTML comment and rejects
> many other kinds which appear as the result of various invalid HTML
> markup.
>
> This patch makes a minor adjustment to the algorithm in `wp_kses_split`
> to allow two additional kinds of HTML comments:
>
>  - HTML comments with the incorrect closer `--!>`.
>  - Closing tags with an invalid tag name, e.g. `</%dolly>`.
>
> In an HTML parser these all become comments, and so leaving them in the
> document should be a benign operation, improving the reliability of
> detecting comments in Core. These invalid closing tags, which in a
> browser are interpreted as comments, are one proposal for a placeholder
> mechanism in the HTML API unlocking HTML templating, a new kind of
> shortcode, and more. Having these persist in Core is a requirement for
> exploring and utilizing the new syntax.

New description:

 When `wp_kses_split` processes a document it attempts to leave HTML
 comments alone. It makes minor adjustments, but leaves the comments in the
 document in its output. Unfortunately it only recognizes one kind of HTML
 comment and rejects many others.

 In HTML there are many kinds of invalid markup which, according to the
 specification, are to be interpreted as an HTML comment. These include,
 but are not limited to:

  - HTML comments with invalid syntax: `<!-->`, `<!-- --!>`, etc…
  - HTML closing tags whose tag name is invalid: `</3>`, `</%happy>`, etc…
  - Things that look like XML CDATA sections: `<![CDATA[…]]>`
  - Things that look like XML Processing Instruction nodes:
 `<?include "blarg">`
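
 As a rough illustration of the list above, the following Python sketch
 labels a snippet according to which of these comment-producing patterns it
 starts with. The function name and labels are made up for this example; it
 is not the code in `wp_kses_split`, it is not exhaustive, and it assumes
 the markup appears in ordinary HTML content rather than inside SVG or
 MathML.

```python
import string

def classify_comment_like(markup):
    """Label markup that an HTML parser turns into a comment.

    Returns None when none of the patterns from the list match.
    Hypothetical helper for illustration only.
    """
    if markup.startswith("<!--"):
        # `<!-->` and `<!--->` close the comment before it has any content.
        if markup.startswith(("<!-->", "<!--->")):
            return "abruptly closed comment"
        normal = markup.find("-->", 4)
        bang = markup.find("--!>", 4)
        ends = [i for i in (normal, bang) if i != -1]
        if not ends:
            return "unclosed comment"
        # Whichever closer appears first ends the comment.
        return "comment" if min(ends) == normal else "incorrectly closed comment"
    if markup.startswith("<![CDATA["):
        # A real CDATA section exists only in foreign content (SVG, MathML).
        return "CDATA lookalike"
    if markup.startswith("<?"):
        return "processing instruction lookalike"
    if markup.startswith("</") and len(markup) > 2:
        first = markup[2]
        # A letter starts a real closing tag; `</>` is dropped entirely.
        if first not in string.ascii_letters and first != ">":
            return "invalid closing tag"
    return None

for sample in ["<!-->", "<!-- --!>", "</3>", "</%happy>",
               "<![CDATA[x]]>", '<?include "blarg">', "</p>"]:
    print(sample, "=>", classify_comment_like(sample))
```

 Here `</p>` yields `None` because it is a well-formed closing tag, while
 every other sample is reported as some form of comment.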

 This patch makes a minor adjustment to the algorithm in `wp_kses_split` to
 allow two additional kinds of HTML comments:

  - HTML comments with the incorrect closer `--!>`, because this was a
 simple change.
  - Closing tags with an invalid tag name, e.g. `</%dolly>`, because these
 are required to open up explorations in Gutenberg on Bits, a new iteration
 of dynamic tokens for externally-sourced data, or "Shortcodes 2.0".

 These invalid closing tags, which in a browser are interpreted as
 comments, are one proposal for a placeholder mechanism in the HTML API
 unlocking HTML templating, a new kind of shortcode, and more. Having these
 persist in Core is a requirement for exploring and utilizing the new
 syntax because as long as Core removes them, there's no way to load
 content from the database and experiment on the full life cycle of
 potential Bits systems.
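
 To make the "interpreted as comments" point concrete: for an invalid
 closing tag, a browser keeps everything between `</` and the first `>` as
 the comment text. The Python sketch below extracts that text. The helper
 name is hypothetical, this is not the Bits syntax or any Core API, and it
 skips the parser's replacement of NUL characters.

```python
import string

def invalid_closer_comment_data(markup):
    """Return the comment text a browser derives from an invalid closing tag.

    Returns None for real closing tags, for `</>`, and for anything
    that does not start with `</`. Hypothetical helper for illustration.
    """
    if not markup.startswith("</") or len(markup) < 3:
        return None
    first = markup[2]
    if first in string.ascii_letters or first == ">":
        return None
    # The comment runs to the first `>` or to the end of the input.
    end = markup.find(">", 2)
    return markup[2:] if end == -1 else markup[2:end]

print(invalid_closer_comment_data("</%dolly>"))
print(invalid_closer_comment_data("</3>"))
print(invalid_closer_comment_data("</p>"))
```

 A placeholder such as `</%dolly>` therefore survives in the parsed document
 as a comment whose text is `%dolly`, which is what makes it usable as a
 token that later processing can find.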

 On its own, however, this represents a kind of bug fix for Core, making
 the implementation of `wp_kses_split()` more closely align with its stated
 goal of leaving HTML comments as comments. It doesn't attempt to fully fix
 the mis-parsed comments (because that is a much deeper issue and involves
 many more questions about existing expectations), but it does propose a
 couple of minor fixes that hopefully won't break any existing code or
 projects.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/61009#comment:10>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

