[wp-trac] [WordPress Trac] #64419: HTML API: Escape JavaScript, JSON script tag contents automatically
WordPress Trac
noreply at wordpress.org
Mon Dec 15 18:18:05 UTC 2025
#64419: HTML API: Escape JavaScript, JSON script tag contents automatically
-------------------------+-----------------------------
Reporter: jonsurrell | Owner: jonsurrell
Type: enhancement | Status: assigned
Priority: normal | Milestone: Awaiting Review
Component: HTML API | Version:
Severity: normal | Keywords:
Focuses: javascript |
-------------------------+-----------------------------
The HTML API prevents setting `SCRIPT` tag contents that could modify the
tree, either by closing the `SCRIPT` element prematurely or by preventing
the `SCRIPT` element from closing at the expected close tag.
This is handled by rejecting any potentially dangerous script tag
contents. That approach is safe, but there are some improvements that
could be made.
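As an illustration only, a conservative check of this kind might look like the sketch below. It is not the HTML API's actual rule, which lives in PHP, and the name `isScriptContentSafe` is hypothetical. The sketch assumes that contents are unsafe whenever `<script` or `</script` appears in any letter case. That rule rejects some harmless inputs, such as `<scripts`.

```javascript
// Hypothetical, conservative safety check (a sketch, not the HTML API's code).
// "</script" can close the element early; "<script" (after "<!--") can keep
// the element from closing at the expected close tag. Reject both, any case.
function isScriptContentSafe(contents) {
  return !/<\/?script/i.test(contents);
}

console.log(isScriptContentSafe('if (a < b) { run(); }'));       // true
console.log(isScriptContentSafe('document.write("</script>");')); // false
console.log(isScriptContentSafe('<!-- <SCRIPT>'));                // false
```

Matching on the `<script` / `</script` prefix alone, without checking the character that follows, keeps the check simple. The cost is the occasional false positive.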
If the contents are found to be unsafe and the type of the script tag is
JSON or JavaScript [https://html.spec.whatwg.org/#the-script-element (this
is well specified in the HTML standard)], it should be possible to apply a
syntactic transformation to the contents in such a way that the script
contents become safe ''without'' semantically altering the script.
If the HTML API can safely and automatically escape the majority of
`SCRIPT` tag contents, it can then be used for `SCRIPT` tag creation
and has the potential to ''eliminate'' the class of problems behind #40737,
#62797, and #63851. It also has the potential to address part of #51159,
where `SCRIPT` tag escaping would become less of an issue.
----
**JSON**
In JSON `SCRIPT` tags, the transformation is a simple replacement of `<`
with its Unicode escape sequence `\u003C`. This works because `<` can only
appear inside JSON string literals, where `\u003C` decodes to the same
character. The replacement can be applied to the entire contents of the
script, or only to case-insensitive matches of `<script` and `</script`.
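A minimal JavaScript sketch of both variants follows. The function names `escapeJsonForScript` and `escapeJsonScriptTagsOnly` are hypothetical, and an HTML API implementation would do the equivalent in PHP.

```javascript
// Variant 1: replace every "<" with the JSON escape "\u003C".
// In valid JSON, "<" only occurs inside strings, so parsing is unaffected.
function escapeJsonForScript(json) {
  return json.replace(/</g, '\\u003C');
}

// Variant 2: only escape the "<" that begins "<script" or "</script" (any case).
function escapeJsonScriptTagsOnly(json) {
  return json.replace(/<(\/?script)/gi, '\\u003C$1');
}

const data = { html: '</script><script>alert(1)</script>' };
const escaped = escapeJsonForScript(JSON.stringify(data));

console.log(escaped.includes('<'));                  // false
console.log(JSON.parse(escaped).html === data.html); // true
```

The first variant is simpler and leaves no `<` in the output at all. The second changes less text but relies on the regular expression covering every dangerous sequence.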
----
**JavaScript**
JavaScript `SCRIPT` tags are more difficult because the language has
vastly more syntax. Fortunately, there is prior art described in this
[https://sophiebits.com/2012/08/03/preventing-xss-json 2012 blog post
(external)] from React team member Sophie Alpert.
[https://github.com/facebook/react/blob/ae74234eae6ebd62f19190731278e20bc1c37d51/packages/react-dom-bindings/src/server/ReactFizzConfigDOM.js#L328-L350 It's the
same JavaScript `SCRIPT` tag contents escaping strategy that React
continues to employ today.] In summary, the problematic text `<script` and
`</script` syntactically appears in places where the `script` part can be
written with Unicode escape sequences (strings, identifiers, and RegExp
literals). React replaces the `s` character with its escape sequence,
resulting in `<\u0073cript` or `</\u0073cript`, which is completely safe
in a `SCRIPT` tag.
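The following is a minimal sketch of the same idea. It is not React's actual source, and the name `escapeJsForScript` is hypothetical. The sketch rewrites every case-insensitive `<script` or `</script` so that the `s`/`S` becomes `\u0073`/`\u0053`. It then evaluates the original and escaped sources to show that they produce the same value in an ordinary case that involves strings and an identifier.

```javascript
// Sketch of React-style escaping: keep the "<" or "</" prefix, swap the
// "s"/"S" for its Unicode escape (preserving case), keep the rest as is.
function escapeJsForScript(js) {
  return js.replace(/(<\/?)(s)(cript)/gi, (match, prefix, s, rest) =>
    prefix + (s === 's' ? '\\u0073' : '\\u0053') + rest
  );
}

// "1<script" compares against the identifier `script`; after escaping it
// reads "1<\u0073cript", which names the same variable.
const source = 'var script = 2; return ["</script>", "<SCRIPT>", 1<script];';
const escapedSource = escapeJsForScript(source);

console.log(/<\/?script/i.test(escapedSource));             // false
console.log(JSON.stringify(new Function(source)()));        // ["</script>","<SCRIPT>",true]
console.log(JSON.stringify(new Function(escapedSource)())); // ["</script>","<SCRIPT>",true]
```

Because the replacement works on raw text without parsing, it is cheap. The tradeoff is the handful of constructs that observe source text directly.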
There are a few notable exceptions where the transformed JavaScript has
observably different runtime behavior; these are the only ones I'm
aware of. They involve esoteric parts of the language, and they seem
unlikely to be used in inline JavaScript together with the problematic
text sequences. That makes this an acceptable tradeoff to me in exchange
for cheap, automatic JavaScript escaping.
[https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw#building_an_identity_tag String.raw]
does not process escape sequences.
{{{#!js
'<script>' === '<\u0073cript>'; // true
String.raw`<script>` === String.raw`<\u0073cript>`; // false
}}}
[https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#raw_strings Tagged templates can also access the raw strings],
again without processing escape sequences.
{{{#!js
function taggedCooked( strings ) {
	return strings[0];
}
taggedCooked`<script>` === taggedCooked`<\u0073cript>`; // true

function taggedRaw( strings ) {
	return strings.raw[0];
}
taggedRaw`<script>` === taggedRaw`<\u0073cript>`; // false
}}}
[https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/source The source property of RegExp]
contains a string representation of the pattern. JavaScript RegExps
support Unicode escape sequences, but the escape sequence is kept as
written in `source` rather than being transformed.
{{{#!js
const rPlain = /<script>/;
const rEscaped = /<\u0073cript>/;
rPlain.test('<script>'); // true
rEscaped.test('<script>'); // true
rPlain.source === rEscaped.source; // false
rPlain.source; // '<script>'
rEscaped.source; // '<\\u0073cript>'
}}}
More faithful JavaScript escaping would likely require a complete
JavaScript parser and much more invasive changes, and it would be much
more costly to perform. Even then, I'm not sure the escaping could be
done faithfully in every case.
A `String.raw` template could be split into pieces that are then concatenated:
{{{#!js
String.raw`<script>` === String.raw`<s` + String.raw`cript>`; // true ✅
}}}
Tagged template raw strings and RegExp `source` seem much more challenging.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/64419>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform