[wp-trac] [WordPress Trac] #64463: XML Escape Codes Applied to RSS and Changes Post Content

WordPress Trac noreply at wordpress.org
Wed Dec 31 22:41:27 UTC 2025


#64463: XML Escape Codes Applied to RSS and Changes Post Content
--------------------------+------------------------------
 Reporter:  chiarella86   |       Owner:  (none)
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  General       |     Version:  6.8.3
 Severity:  normal        |  Resolution:
 Keywords:                |     Focuses:
--------------------------+------------------------------
Description changed by sabernhardt:

Old description:

> WordPress incorrectly second-guesses the user and changes characters from
> the posts when they go into RSS feeds. This creates a problem for dummy
> apostrophe, dummy quotation mark, and three dots. You want these to
> display as is in all situations. Publishing will use “these” instead of
> "these," but more importantly, no American English style guide ever uses
> the ellipsis as its own character in publishing. The Associated Press
> (AP) and most newspapers use three dots or three periods. They do *not*
> use the dedicated ellipsis character. Some fonts render these the same,
> but some do not. (Chicago and most books use three dots separated by non-
> breaking spaces.)
>
> Many people type on ASCII keyboards and have read much less in books than
> on screens, which means that the differences between various punctuation
> marks are lost in informal contexts—but not the formal ones.
>
> ASCII  Non-ASCII
> ""      “”
> ''      ‘’
> ...     … (combined ellipsis as one character)
> -       –—−
>
> Correct Incorrect
> “”      ""
> ‘’      ''
> ...     … (combined ellipsis as one character)
> -–—−    *Dependent on context*
>
> WordPress *does* handle the various dashes correctly. (I.e., WordPress
> does not change from one dash to another. The correct usage is up to the
> author. WordPress does *not* second-guess you.) I only include the
> various dashes to make a point about how these differences are subtle or
> invisible to some and glaring to others.
>
> The differences between the four dashes are tricky to spot unless you
> have lots of editing experience, it can be difficult to tell the
> difference between the hyphen -, the en dash –, the em dash —, and the
> negative/minus sign −. All four are valid in different contexts.
>
> Hyphen:  *4-6* is “four-six.”
> En dash: *4–6* is “(from) four to six.”
> Em dash: *4—6* is “(I think that we have) four; six (is also possible).”
> Minus:   *4—6* is “four minus six,” and —6 is “negative six.”
>
> Obviously, these look interchangeable across different fonts. If I see
> *4-6* on a page of radio slang, I assume *four-six*. If I see *4-6 p.m.*,
> then I assume from *four to six p.m.* However, if I am reading something
> longer, with sentences, and I see *4-6 p.m.* with a noticeably small
> hyphen that is the same length as the hyphen in *dot-less* or the breaks
> at the end of a line, then I know what is meant, but I roll my eyes and
> think less of the editor and publisher. It is little different from
> seeing a book’s title read as *From See to Shining See*.
>
> Similarly, ... and … are not the same thing. If I see … in an American
> newspaper, then it is simply a punctuation error.
>
> When it comes to the dummy quotation marks, it looks bad enough to write
> "word" instead of “word,” but WordPress converts this to ”word,” which is
> ridiculous. Similarly, someone may hastily write the following: *She
> said, "Get yourself out of that 'funk' you are in.”* which becomes *She
> said, ”Get yourself out of that ’funk’ you are in.”*
>
> I will never have a reason to compare "" and “” or '' and ‘’ in a post,
> but the behavior is buggy, to say the least.
>
> To make the difference clear, I have attached a PDF of a page with a font
> that illustrates the differences.
>
> This affects the display on podcast readers that parse the information in
> the tags. The incorrectness is fairly objective. The system takes it upon
> itself to substitute characters that may look alike, but that would be
> like a system converting every capital A to capital alpha (αλφα), just
> because *A* and *Α* look similar.

New description:

 WordPress incorrectly second-guesses the user and changes characters in
 posts when they go into RSS feeds. This creates a problem for the dummy
 (straight) apostrophe, the dummy (straight) quotation mark, and three
 dots. You want these to display as is in all situations. Publishing will
 use “these” instead of
 "these," but more importantly, no American English style guide ever uses
 the ellipsis as its own character in publishing. The Associated Press (AP)
 and most newspapers use three dots or three periods. They do *not* use the
 dedicated ellipsis character. Some fonts render these the same, but some
 do not. (Chicago and most books use three dots separated by non-breaking
 spaces.)

 Many people type on ASCII keyboards and have read much less in books than
 on screens, which means that the differences between various punctuation
 marks are lost in informal contexts—but not the formal ones.
 {{{
 ASCII  Non-ASCII
 ""      “”
 ''      ‘’
 ...     … (combined ellipsis as one character)
 -       –—−

 Correct Incorrect
 “”      ""
 ‘’      ''
 ...     … (combined ellipsis as one character)
 -–—−    *Dependent on context*
 }}}
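 The characters in the table are separate Unicode code points, even where a
 font draws them alike. A minimal Python sketch that lists them by code
 point and official name (the grouping labels and the describe helper are
 illustrative, not part of the report):

```python
import unicodedata

# Characters from the table above, grouped the way the ticket groups them.
GROUPS = {
    "double quotes": ['"', "\u201c", "\u201d"],
    "single quotes": ["'", "\u2018", "\u2019"],
    "ellipsis": [".", "\u2026"],
    "dashes": ["-", "\u2013", "\u2014", "\u2212"],
}


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for label, chars in GROUPS.items():
    print(label)
    for ch in chars:
        print("  ", ch, describe(ch))
```

 Three typed periods are three characters, while the combined ellipsis is
 one, so any substitution also changes the length of the text.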
 WordPress *does* handle the various dashes correctly. (i.e., WordPress
 does not change from one dash to another. The correct usage is up to the
 author. WordPress does *not* second-guess you.) I only include the various
 dashes to make a point about how these differences are subtle or invisible
 to some and glaring to others.

 The differences between the four dashes are tricky to spot. Unless you
 have lots of editing experience, it can be difficult to tell the
 difference between the hyphen -, the en dash –, the em dash —, and the
 negative/minus sign −. All four are valid in different contexts.

 Hyphen:  *4-6* is “four-six.”
 En dash: *4–6* is “(from) four to six.”
 Em dash: *4—6* is “(I think that we have) four; six (is also possible).”
 Minus:   *4−6* is “four minus six,” and −6 is “negative six.”

 Obviously, these look interchangeable across different fonts. If I see
 *4-6* on a page of radio slang, I assume *four-six*. If I see *4-6 p.m.*,
 then I assume from *four to six p.m.* However, if I am reading something
 longer, with sentences, and I see *4-6 p.m.* with a noticeably small
 hyphen that is the same length as the hyphen in *dot-less* or the breaks
 at the end of a line, then I know what is meant, but I roll my eyes and
 think less of the editor and publisher. It is little different from seeing
 a book’s title read as *From See to Shining See*.

 Similarly, ... and … are not the same thing. If I see … in an American
 newspaper, then it is simply a punctuation error.

 When it comes to the dummy quotation marks, it looks bad enough to write
 "word" instead of “word,” but WordPress converts this to ”word,” which is
 ridiculous. Similarly, someone may hastily write the following: *She said,
 "Get yourself out of that 'funk' you are in."* which becomes *She said,
 ”Get yourself out of that ’funk’ you are in.”*
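 One way to see exactly which characters were swapped is to diff the typed
 text against the text that comes out. This is a sketch only: it assumes
 the typed sentence uses straight quotes throughout, and the function name
 report_substitutions is made up for illustration. It does not reproduce
 WordPress's own conversion code.

```python
import difflib


def report_substitutions(original, rendered):
    """Return (typed, rendered) fragment pairs where the two texts differ."""
    matcher = difflib.SequenceMatcher(None, original, rendered, autojunk=False)
    return [
        (original[i1:i2], rendered[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# The sentence from the report, as typed and as it reportedly comes out.
typed = "She said, \"Get yourself out of that 'funk' you are in.\""
shown = "She said, \u201dGet yourself out of that \u2019funk\u2019 you are in.\u201d"

for before, after in report_substitutions(typed, shown):
    print(repr(before), "->", repr(after))
```

 Each reported pair is a straight mark replaced by a closing curly mark,
 including the two positions where an opening mark would be expected.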

 I will never have a reason to compare "" and “” or '' and ‘’ in a post,
 but the behavior is buggy, to say the least.

 To make the difference clear, I have attached a PDF of a page with a font
 that illustrates the differences.

 This affects the display on podcast readers that parse the information in
 the tags. The incorrectness is fairly objective. The system takes it upon
 itself to substitute characters that may look alike, but that would be
 like a system converting every capital A to capital alpha (αλφα), just
 because *A* and *Α* look similar.
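 A feed consumer such as a podcast reader sees the substituted characters
 once it parses the tags, because an XML parser decodes numeric character
 references into the characters themselves. The sketch below assumes a
 hypothetical feed fragment (the ticket does not include one) and a made-up
 helper name; it only shows how to detect the typographic characters in
 title elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment; not taken from the ticket or from a real site.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>Wait&#8230; it&#8217;s &#8220;here&#8221;</title></item>
</channel></rss>"""

# Curly quotes and the combined ellipsis discussed in the report.
TYPOGRAPHIC = {"\u2018", "\u2019", "\u201c", "\u201d", "\u2026"}


def typographic_chars_in_titles(feed_xml):
    """Map each title's text to the typographic characters found in it."""
    root = ET.fromstring(feed_xml)
    found = {}
    for title in root.iter("title"):
        text = title.text or ""
        hits = sorted(ch for ch in set(text) if ch in TYPOGRAPHIC)
        if hits:
            found[text] = hits
    return found


for text, hits in typographic_chars_in_titles(SAMPLE_FEED).items():
    print(text, [f"U+{ord(ch):04X}" for ch in hits])
```

 Running the same check against the text as typed in the editor would show
 whether the characters were present before the feed was generated.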

--

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/64463#comment:3>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

