[wp-trac] [WordPress Trac] #57190: public-api OEmbed effectively double-encodes entities in title

WordPress Trac noreply at wordpress.org
Wed Nov 23 22:20:53 UTC 2022


#57190: public-api OEmbed effectively double-encodes entities in title
--------------------------+-----------------------------
 Reporter:  stiiin        |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  Embeds        |    Version:
 Severity:  minor         |   Keywords:
  Focuses:                |
--------------------------+-----------------------------
 I just came across this post on Mastodon:
 https://infosec.exchange/@ellent@mastodon.nl/109395089902785785 . The post
 contains a link, for which Mastodon generated a link preview. As you may
 note, the title of the link preview shows two double-encoded entities: one
 for the right single quotation mark (’) and one for what appears to be a
 non-breaking space.

 I believe Mastodon ultimately consulted
 [https://public-api.wordpress.com/oembed/?format=xml&url=https%3A%2F%2Fellentimmer.com%2F2022%2F11%2F23%2Fvan-wie-is-die-website%2F&for=wpcom-auto-discovery the public-api.wordpress.com endpoint to generate OEmbed for the linked article].
 The OEmbed data contains the following title element:

 {{{
 <title><![CDATA[Van wie is die website? Wat bv’s moeten vermelden op
 hun website]]></title>
 }}}

 Per the [https://www.w3.org/TR/REC-xml/#dt-cdsection definition of CDATA
 in the XML syntax]: "Within a CDATA section, only the CDEnd string is
 recognized as markup, so that left angle brackets and ampersands may occur
 in their literal form[.]"
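
 A minimal sketch of that rule, using Python's standard-library XML parser.
 The numeric reference `&#8217;` is an assumption chosen for illustration;
 it is not copied from the actual API response. The same reference is
 resolved when it appears as ordinary character data, but is passed through
 untouched when it sits inside a CDATA section:

```python
import xml.etree.ElementTree as ET

# Illustrative inputs only: the reference &#8217; (right single quotation
# mark) is an assumed example, not the verbatim public-api output.
in_cdata = "<title><![CDATA[bv&#8217;s]]></title>"
as_pcdata = "<title>bv&#8217;s</title>"

# Inside CDATA the ampersand is literal text, so nothing is decoded.
cdata_text = ET.fromstring(in_cdata).text

# Outside CDATA the parser resolves the character reference.
pcdata_text = ET.fromstring(as_pcdata).text

print(repr(cdata_text))
print(repr(pcdata_text))
```

 A consumer that then HTML-escapes the first value for display would show
 the reference as visible text, which would match what the link preview
 shows.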

 As such, I believe that the public-api should have either broken those
 character entities out of the CDATA section or decoded them into their
 corresponding Unicode code points.
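
 The decoding approach could look roughly like the sketch below. This is
 not the public-api's actual code; `decode_title` and `title_element` are
 hypothetical helper names, and the sample input is an assumed title string.
 The idea is to resolve the HTML character references first and then let the
 XML serializer apply whatever escaping XML itself requires:

```python
import html
import xml.etree.ElementTree as ET


def decode_title(raw_title: str) -> str:
    """Resolve HTML character references into Unicode code points."""
    return html.unescape(raw_title)


def title_element(raw_title: str) -> str:
    """Serialize a <title> element from an entity-encoded title string."""
    title = ET.Element("title")
    title.text = decode_title(raw_title)
    # The serializer escapes &, < and > itself, so no CDATA is needed.
    return ET.tostring(title, encoding="unicode")


# Hypothetical input resembling an entity-encoded post title.
raw = "bv&#8217;s op&nbsp;hun website"
serialized = title_element(raw)
print(serialized)
```

 Because the serializer handles escaping, a title such as `Q&amp;A` comes
 out as well-formed XML and parses back to the text `Q&A` in a consumer.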

 Aside from security considerations for the public-api itself (for example,
 XML injection), note that either approach may negatively affect the
 security of applications that depend on the output of the public-api. The
 former approach may trip up broken XML parsers (or broken uses of them) and
 negatively affect availability, and the latter may expose CWE-174
 (Double Decoding of the Same Data) vulnerabilities in applications to
 exploitation.
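
 To illustrate the double-decoding concern, here is a small sketch of a
 hypothetical consumer. The function names and the payload are assumptions
 made for this example. A consumer that decodes the title twice, for
 instance to compensate for the current behaviour, turns text that is inert
 after one decode into live markup:

```python
import html


def decode_once(value: str) -> str:
    """Decode HTML character references a single time."""
    return html.unescape(value)


def decode_twice(value: str) -> str:
    """Flawed pattern: decoding the same data twice (CWE-174)."""
    return html.unescape(html.unescape(value))


# Hypothetical title text as a consumer might receive it.
payload = "&amp;lt;b&amp;gt;"

safe = decode_once(payload)     # still entity-encoded, inert as HTML
unsafe = decode_twice(payload)  # now contains a literal tag

print(repr(safe))
print(repr(unsafe))
```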

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/57190>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

