[wp-trac] [WordPress Trac] #64054: HTML API: Attribute escaping should escape all HTML entities

WordPress Trac noreply at wordpress.org
Tue Sep 30 11:01:02 UTC 2025


#64054: HTML API: Attribute escaping should escape all HTML entities
--------------------------+-----------------------------
 Reporter:  jonsurrell    |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  HTML API      |    Version:  6.2
 Severity:  normal        |   Keywords:
  Focuses:                |
--------------------------+-----------------------------
 Attribute values set with the HTML API method `set_attribute()`
 [https://core.trac.wordpress.org/browser/tags/6.8.2/src/wp-includes/html-
 api/class-wp-html-tag-processor.php#L3884 are escaped with] `esc_attr()`.
 That function avoids "double encoding" things that look like HTML
 character references.

 The HTML API should encode whatever it receives, and apply "double
 encoding." The HTML API expects to receive plain string inputs and manage
 any necessary encoding itself. The fact that "double encoding" is disabled
 violates this expectation and makes it difficult to correctly set
 attribute values that contain sequences that look like HTML character
 references.

 By contrast, `set_modifiable_text()` does not rely on `esc_html()`
 [https://core.trac.wordpress.org/browser/tags/6.8.2/src/wp-includes/html-
 api/class-wp-html-tag-processor.php#L3697 and uses] `htmlspecialchars()`
 directly. It will encode HTML character references as expected.
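
 The two behaviors can be approximated in plain PHP with the
 `$double_encode` parameter of `htmlspecialchars()`. This is a minimal
 sketch, not the HTML API or `esc_attr()` itself: it assumes that passing
 `false` is a fair stand-in for the "no double encoding" behavior described
 above, and the variable names are illustrative only.

```php
<?php
// Plain string input that happens to look like a character reference.
$input = '&amp;';

// Stand-in for the attribute path: existing references are left alone.
$kept = htmlspecialchars( $input, ENT_QUOTES, 'UTF-8', false );

// Stand-in for the text path: every "&" is encoded, including this one.
$doubled = htmlspecialchars( $input, ENT_QUOTES, 'UTF-8', true );

echo $kept, "\n";    // &amp;
echo $doubled, "\n"; // &amp;amp;
```

 Only the second form decodes back to the original five-character input.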

 The text `&amp;` appears to be an encoded character reference:

 {{{#!php
 <?php
 $amp_text = '&amp;';
 $p = WP_HTML_Processor::create_fragment( '<p>x</p>' );
 $p->next_tag();
 $p->set_attribute( 'data-attr', $amp_text );
 $p->next_token();
 $p->set_modifiable_text( $amp_text );
 echo $p->get_updated_html();
 }}}

 This prints the following HTML:

 {{{#!xml
 <p data-attr="&amp;">&amp;amp;</p>
 }}}


 Notice how the same input text is treated differently in an attribute and
 in text. The HTML encoding of the `&` character is the same in both contexts.
 The attribute has the ''value'' `&` instead of the expected `&amp;`. The
 text node in the P element correctly renders `&amp;` as expected.
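
 One way to check the expectation is to decode each encoded form once, as
 an HTML parser would, and compare the result with the input. The sketch
 below is plain PHP and makes two assumptions: `html_entity_decode()`
 stands in for the parser's decoding, and the `$double_encode` flag of
 `htmlspecialchars()` stands in for the two escaping behaviors. The
 `round_trip()` helper is hypothetical and not part of WordPress.

```php
<?php
// Hypothetical helper: encode a plain string, then decode it once.
function round_trip( string $plain, bool $double_encode ): string {
    $encoded = htmlspecialchars( $plain, ENT_QUOTES, 'UTF-8', $double_encode );
    return html_entity_decode( $encoded, ENT_QUOTES, 'UTF-8' );
}

$input = '&amp;';

// Without double encoding the reader sees "&", so the input is lost.
$attr_like = round_trip( $input, false );

// With double encoding the reader sees "&amp;", matching the input.
$text_like = round_trip( $input, true );

var_dump( $attr_like === $input ); // bool(false)
var_dump( $text_like === $input ); // bool(true)
```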

 [https://playground.wordpress.net/php-
 playground.html#eyJjb2RlIjoiPHN0eWxlPlxucFtkYXRhLWF0dHJdIHtcbiAgd2hpdGUtc3BhY2U6IHByZTtcbiAgJjo6YWZ0ZXIgeyBjb250ZW50OiAnXFxBIEF0dHJpYnV0ZSB2YWx1ZTogXCInIGF0dHIoIGRhdGEtYXR0ciApICdcIic7IH1cbn1cbjwvc3R5bGU+XG48cD5UaGUgZm9sbG93aW5nIGlzIGV4cGVjdGVkIHRvIGRpc3BsYXkgdGhlIHRleHRcbjxiPjxjb2RlPiZhbXA7YW1wOzwvY29kZT48L2I+IGluIGJvdGggY2FzZXMuPC9wPlxuXG48cD5GaXJzdCwgdGhlIEhUTUwgcHJvY2Vzc29yOjwvcD5cblxuPD9waHBcbnJlcXVpcmUgJy93b3JkcHJlc3Mvd3AtbG9hZC5waHAnO1xuXG4kYW1wX3RleHQgPSAnJmFtcDsnO1xuJHAgPSBXUF9IVE1MX1Byb2Nlc3Nvcjo6Y3JlYXRlX2ZyYWdtZW50KCc8cD54PC9wPicpO1xuJHAtPm5leHRfdGFnKCk7XG4kcC0+c2V0X2F0dHJpYnV0ZSgnZGF0YS1hdHRyJywgJGFtcF90ZXh0KTtcbiRwLT5uZXh0X3Rva2VuKCk7XG4kcC0+c2V0X21vZGlmaWFibGVfdGV4dChcIkhUTUwgdGV4dDogXFxcInskYW1wX3RleHR9XFxcIlwiKTtcbmVjaG8gJHAtPmdldF91cGRhdGVkX2h0bWwoKTtcbj8+XG48YnI+XG5BbmQgYWdhaW4gd2l0aCB0aGUgdGFnIHByb2Nlc3Nvcjpcbjw/cGhwXG4kcCA9IG5ldyBXUF9IVE1MX1RhZ19Qcm9jZXNzb3IoJzxwPng8L3A+Jyk7XG4kcC0+bmV4dF90YWcoKTtcbiRwLT5zZXRfYXR0cmlidXRlKCdkYXRhLWF0dHInLCAkYW1wX3RleHQpO1xuJHAtP
 m5leHRfdG9rZW4oKTtcbiRwLT5zZXRfbW9kaWZpYWJsZV90ZXh0KFwiSFRNTCB0ZXh0OiBcXFwieyRhbXBfdGV4dH1cXFwiXCIpO1xuZWNobyAkcC0+Z2V0X3VwZGF0ZWRfaHRtbCgpO1xuIiwicGhwIjoiOC40Iiwid3AiOiI2LjgifQ==
 Here's a demo of the difference in behavior between setting attributes and
 modifiable text.]

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/64054>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

