[wp-trac] [WordPress Trac] #64419: HTML API: Escape JavaScript, JSON script tag contents automatically

WordPress Trac noreply at wordpress.org
Mon Dec 15 18:18:05 UTC 2025


#64419: HTML API: Escape JavaScript, JSON script tag contents automatically
-------------------------+-----------------------------
 Reporter:  jonsurrell   |      Owner:  jonsurrell
     Type:  enhancement  |     Status:  assigned
 Priority:  normal       |  Milestone:  Awaiting Review
Component:  HTML API     |    Version:
 Severity:  normal       |   Keywords:
  Focuses:  javascript   |
-------------------------+-----------------------------
 The HTML API prevents setting `SCRIPT` tag contents that could modify the
 document tree, either by closing the `SCRIPT` element prematurely or by
 preventing the `SCRIPT` element from closing at the expected close tag.

 This is handled by rejecting any potentially dangerous script tag
 contents. That approach is safe, but there are some improvements that
 could be made.
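 To illustrate the "reject if potentially dangerous" approach, here is a
 minimal, conservative sketch in JavaScript. It is an assumption, not the
 HTML API's actual (PHP) logic, and the function name
 `isPotentiallyUnsafeScriptText` is hypothetical. It flags any
 case-insensitive `<script` or `</script`: the latter can close the element
 early, and the former, after `<!--`, can keep the element from closing at
 the expected close tag.

```javascript
// Hypothetical, conservative check (not the HTML API's actual PHP logic).
// "</script" can close the SCRIPT element prematurely.
// "<script" (after "<!--") can enter the double-escaped state and swallow
// the expected "</script>" close tag.
function isPotentiallyUnsafeScriptText(text) {
  return /<\/?script/i.test(text);
}

console.log(isPotentiallyUnsafeScriptText('console.log("hello");'));   // false
console.log(isPotentiallyUnsafeScriptText('const s = "</SCRIPT>";'));   // true
console.log(isPotentiallyUnsafeScriptText("<!-- <script> document.write(1)")); // true
```

 The check over-rejects by design: `</scripts` is not an end tag for
 `SCRIPT`, but it is still flagged.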

 If the contents are found to be unsafe and the type of the script tag is
 JSON or JavaScript [https://html.spec.whatwg.org/#the-script-element (this
 is well specified in the HTML standard)], it should be possible to apply a
 syntactic transformation to the contents in such a way that the script
 contents become safe ''without'' semantically altering the script.

 If the HTML API can safely and automatically escape the majority of
 `SCRIPT` tag contents, it can then be used for `SCRIPT` tag creation
 and has the potential to ''eliminate'' the class of problems seen in
 #40737, #62797, and #63851. It may also address part of #51159, where
 `SCRIPT` tag escaping would become less of an issue.

 ----

 **JSON**

 In JSON `SCRIPT` tags, the transformation is a simple replacement of `<`
 with its Unicode escape sequence `\u003C`. This can be applied to the
 entire contents of the script or specifically in case-insensitive matches
 for `<script` and `</script`.
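 A minimal sketch of both JSON variants, assuming the contents are valid
 JSON text such as the output of `JSON.stringify`. The helper names
 `escapeJsonAll` and `escapeJsonScriptTags` are hypothetical. Valid JSON can
 only contain `<` inside strings, where `\u003C` decodes to the same
 character, so parsing the escaped text yields the same value.

```javascript
// Hypothetical helpers (not a WordPress API) for JSON SCRIPT contents.

// Variant 1: escape every "<" in the JSON text.
function escapeJsonAll(json) {
  return json.replace(/</g, "\\u003C");
}

// Variant 2: escape "<" only where it starts a case-insensitive
// "<script" or "</script".
function escapeJsonScriptTags(json) {
  return json.replace(/<(?=\/?script)/gi, "\\u003C");
}

const value = { html: "</script><SCRIPT>alert(1)</script>", cmp: "a < b" };
const json = JSON.stringify(value);

const all = escapeJsonAll(json);
const targeted = escapeJsonScriptTags(json);

console.log(all);      // {"html":"\u003C/script>\u003CSCRIPT>alert(1)\u003C/script>","cmp":"a \u003C b"}
console.log(targeted); // {"html":"\u003C/script>\u003CSCRIPT>alert(1)\u003C/script>","cmp":"a < b"}

// Both variants parse back to the original value.
console.log(JSON.parse(all).html === value.html);      // true
console.log(JSON.parse(targeted).html === value.html); // true
```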

 ----

 **JavaScript**

 JavaScript `SCRIPT` tags are more difficult because the language has
 vastly more syntax. Fortunately, there is prior art described in this
 [https://sophiebits.com/2012/08/03/preventing-xss-json 2012 blog post
 (external)] from React team member Sophie Alpert.
 [https://github.com/facebook/react/blob/ae74234eae6ebd62f19190731278e20bc1c37d51/packages/react-dom-bindings/src/server/ReactFizzConfigDOM.js#L328-L350
 It's the same JavaScript `SCRIPT` tag contents escaping strategy that
 React continues to employ today.] In summary, wherever the problematic
 text `<script` or `</script` can appear in JavaScript syntax (strings,
 identifiers, and RegExp literals), the `script` part may be written with
 Unicode escape sequences. React takes the approach of replacing the `s`
 character, resulting in `<\u0073cript` or `</\u0073cript`, which is
 completely safe in a `SCRIPT` tag.
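 A minimal sketch of the `s`-replacement strategy, operating on JavaScript
 source text. The helper name `escapeJavaScriptScriptContents` is
 hypothetical; this is not React's or WordPress's code. It keeps the case
 of the matched `s` (`\u0073` for `s`, `\u0053` for `S`) so that `<SCRIPT`
 inside a string keeps its meaning, and it exercises both a string literal
 and an identifier named `script`.

```javascript
// Hypothetical helper (not a WordPress or React API): in every
// case-insensitive "<script" or "</script" match, write the "s" as a
// Unicode escape sequence, preserving its case.
function escapeJavaScriptScriptContents(js) {
  return js.replace(/(<\/?)(s)(cript)/gi, (match, prefix, s, rest) =>
    prefix + (s === "s" ? "\\u0073" : "\\u0053") + rest
  );
}

// A string literal containing the problematic sequences.
const stringSource = 'return "</script><SCRIPT>";';
const escapedString = escapeJavaScriptScriptContents(stringSource);
console.log(escapedString); // return "</\u0073cript><\u0053CRIPT>";
console.log(new Function(escapedString)()); // </script><SCRIPT>

// An identifier named "script" right after a less-than operator.
const identifierSource = "const script = 2; return 1<script;";
const escapedIdentifier = escapeJavaScriptScriptContents(identifierSource);
console.log(escapedIdentifier); // const script = 2; return 1<\u0073cript;
console.log(new Function(escapedIdentifier)()); // true
```

 Because the replacement is purely textual, it also rewrites matches inside
 comments, which has no runtime effect. It cannot avoid the cases where
 escape sequences are observable, discussed next.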

 There are a few notable exceptions where the transformed JavaScript has
 observably different runtime behavior. These are the only examples I'm
 aware of. They involve more esoteric parts of the language, and the low
 likelihood of them appearing in inline JavaScript alongside the
 problematic text sequences seems to me an acceptable tradeoff for cheap,
 automatic JavaScript escaping.

 [https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw#building_an_identity_tag
 String.raw] does not process escape sequences.

 {{{#!js
 '<script>' === '<\u0073cript>'; // true
 String.raw`<script>` === String.raw`<\u0073cript>`; // false
 }}}


 [https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#raw_strings
 Tagged templates can also access the raw strings], which likewise are not
 processed for escape sequences.

 {{{#!js
 function taggedCooked( strings ) {
     return strings[0];
 }
 taggedCooked`<script>` === taggedCooked`<\u0073cript>`; // true

 function taggedRaw( strings ) {
     return strings.raw[0];
 }
 taggedRaw`<script>` === taggedRaw`<\u0073cript>`; // false
 }}}

 [https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/source
 The source property of RegExp] contains a string representation of the
 pattern. JavaScript RegExp patterns support Unicode escape sequences, but
 the escape sequence is preserved verbatim in `source` rather than decoded.

 {{{#!js
 const rPlain = /<script>/;
 const rEscaped = /<\u0073cript>/;

 rPlain.test('<script>'); // true
 rEscaped.test('<script>'); // true

 rPlain.source === rEscaped.source; // false
 rPlain.source; // '<script>'
 rEscaped.source; // '<\\u0073cript>'
 }}}

 Any better JavaScript escaping would likely require a complete JavaScript
 parser and much more invasive changes, and it would be much more costly
 to perform. Even then, I'm not sure that the escaping could be done
 faithfully.

 A `String.raw` template could instead be split into two templates and
 concatenated:

 {{{#!js
 String.raw`<script>` === String.raw`<s` + String.raw`cript>`; // true ✅
 }}}

 Preserving tagged template raw strings and RegExp `source` seems much more
 challenging.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/64419>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

