[wp-trac] [WordPress Trac] #64463: XML Escape Codes Applied to RSS and Changes Post Content
WordPress Trac
noreply at wordpress.org
Wed Dec 31 22:41:27 UTC 2025
#64463: XML Escape Codes Applied to RSS and Changes Post Content
--------------------------+------------------------------
Reporter: chiarella86 | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: General | Version: 6.8.3
Severity: normal | Resolution:
Keywords: | Focuses:
--------------------------+------------------------------
Description changed by sabernhardt:
Old description:
> WordPress incorrectly second-guesses the user and changes characters from
> the posts when they go into RSS feeds. This creates a problem for dummy
> apostrophe, dummy quotation mark, and three dots. You want these to
> display as is in all situations. Publishing will use “these” instead of
> "these," but more importantly, no American English style guide ever uses
> the ellipsis as its own character in publishing. The Associated Press
> (AP) and most newspapers use three dots or three periods. They do *not*
> use the dedicated ellipsis character. Some fonts render these the same,
> but some do not. (Chicago and most books use three dots separated by non-
> breaking spaces.)
>
> Many people type on ASCII keyboards and have read much less in books than
> on screens, which means that the differences between various punctuation
> marks are lost in informal contexts—but not the formal ones.
>
> ASCII Non-ASCII
> "" “”
> '' ‘’
> ... … (combined ellipsis as one character)
> - –—−
>
> Correct Incorrect
> “” ""
> ‘’ ''
> ... … (combined ellipsis as one character)
> -–—− *Dependent on context*
>
> WordPress *does* handle the various dashes correctly. (I.e., WordPress
> does not change from one dash to another. The correct usage is up to the
> author. WordPress does *not* second-guess you.) I only include the
> various dashes to make a point about how these differences are subtle or
> invisible to some and glaring to others.
>
> The differences between the four dashes are tricky to spot unless you
> have lots of editing experience, it can be difficult to tell the
> difference between the hyphen -, the en dash –, the em dash —, and the
> negative/minus sign −. All four are valid in different contexts.
>
> Hyphen: *4-6* is “four-six.”
> En dash: *4–6* is “(from) four to six.”
> Em dash: *4—6* is “(I think that we have) four; six (is also possible).”
> Minus: *4—6* is “four minus six,” and —6 is “negative six.”
>
> Obviously, these look interchangeable across different fonts. If I see
> *4-6* on a page of radio slang, I assume *four-six*. If I see *4-6 p.m.*,
> then I assume from *four to six p.m.* However, if I am reading something
> longer, with sentences, and I see *4-6 p.m.* with a noticeably small
> hyphen that is the same length as the hyphen in *dot-less* or the breaks
> at the end of a line, then I know what is meant, but I roll my eyes and
> think less of the editor and publisher. It is little different from
> seeing a book’s title read as *From See to Shining See*.
>
> Similarly, ... and … are not the same thing. If I see … in an American
> newspaper, then it is simply a punctuation error.
>
> When it comes to the dummy quotation marks, it looks bad enough to write
> "word" instead of “word,” but WordPress converts this to ”word,” which is
> ridiculous. Similarly, someone may hastily write the following: *She
> said, "Get yourself out of that 'funk' you are in.”* which becomes *She
> said, ”Get yourself out of that ’funk’ you are in.”*
>
> I will never have a reason to compare "" and “” or '' and ‘’ in a post,
> but the behavior is buggy, to say the least.
>
> To make the difference clear, I have attached a PDF of a page with a font
> that illustrates the differences.
>
> This affects the display on podcast readers that parse the information in
> the tags. The incorrectness is fairly objective. The system takes it upon
> itself to substitute characters that may look alike, but that would be
> like a system converting every capital A to capital alpha (αλφα), just
> because *A* and *Α* look similar.
New description:
WordPress incorrectly second-guesses the user and changes characters from
the posts when they go into RSS feeds. This creates a problem for the dummy
apostrophe, the dummy quotation mark, and three dots. You want these to
display as is in all situations. Publishing will use “these” instead of
"these," but more importantly, no American English style guide ever uses
the ellipsis as its own character in publishing. The Associated Press (AP)
and most newspapers use three dots or three periods. They do *not* use the
dedicated ellipsis character. Some fonts render these the same, but some
do not. (Chicago and most books use three dots separated by non-breaking
spaces.)
Many people type on ASCII keyboards and have read much less in books than
on screens, which means that the differences between various punctuation
marks are lost in informal contexts—but not the formal ones.
{{{
ASCII    Non-ASCII
""       “”
''       ‘’
...      … (combined ellipsis as one character)
-        –—−

Correct  Incorrect
“”       ""
‘’       ''
...      … (combined ellipsis as one character)
-–—−     *Dependent on context*
}}}
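The characters in the table can be told apart regardless of font by looking at their Unicode code points. The following is only an illustrative sketch using Python's standard `unicodedata` module; the `PAIRS` list and the `describe` helper are hypothetical names introduced here, not part of WordPress.

```python
import unicodedata

# Pairs from the table above: the ASCII spelling and its typographic counterparts.
PAIRS = [
    ('"', "\u201c\u201d"),        # straight double quote vs. curly double quotes
    ("'", "\u2018\u2019"),        # straight apostrophe vs. curly single quotes
    ("...", "\u2026"),            # three full stops vs. single ellipsis character
    ("-", "\u2013\u2014\u2212"),  # hyphen-minus vs. en dash, em dash, minus sign
]


def describe(text):
    """Return a (code point, Unicode name) tuple for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for ascii_form, typographic in PAIRS:
    print(describe(ascii_form), "vs", describe(typographic))
```

Each row prints different code points on the two sides, which is why a substitution is a real change to the text even when a particular font draws the glyphs alike.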
WordPress *does* handle the various dashes correctly. (i.e., WordPress
does not change from one dash to another. The correct usage is up to the
author. WordPress does *not* second-guess you.) I only include the various
dashes to make a point about how these differences are subtle or invisible
to some and glaring to others.
The differences between the four dashes are tricky to spot. Unless you have
lots of editing experience, it can be difficult to tell the difference
between the hyphen -, the en dash –, the em dash —, and the negative/minus
sign −. All four are valid in different contexts.
Hyphen: *4-6* is “four-six.”
En dash: *4–6* is “(from) four to six.”
Em dash: *4—6* is “(I think that we have) four; six (is also possible).”
Minus: *4−6* is “four minus six,” and −6 is “negative six.”
Obviously, these look interchangeable across different fonts. If I see
*4-6* on a page of radio slang, I assume *four-six*. If I see *4-6 p.m.*,
then I assume from *four to six p.m.* However, if I am reading something
longer, with sentences, and I see *4-6 p.m.* with a noticeably small
hyphen that is the same length as the hyphen in *dot-less* or the breaks
at the end of a line, then I know what is meant, but I roll my eyes and
think less of the editor and publisher. It is little different from seeing
a book’s title read as *From See to Shining See*.
Similarly, ... and … are not the same thing. If I see … in an American
newspaper, then it is simply a punctuation error.
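That three dots and the ellipsis character are different text can be checked directly. This is a small sketch in Python, assuming only the standard library. The NFKC step is shown as a diagnostic, not as a recommended fix, because compatibility normalization also folds many other characters.

```python
import unicodedata

three_dots = "..."     # three U+002E FULL STOP characters
ellipsis = "\u2026"    # one U+2026 HORIZONTAL ELLIPSIS character

# The two strings are not equal and do not have the same length.
print(three_dots == ellipsis)
print(len(three_dots), len(ellipsis))

# NFKC compatibility normalization expands U+2026 into three full stops.
folded = unicodedata.normalize("NFKC", ellipsis)
print(folded == three_dots)
```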
When it comes to the dummy quotation marks, it looks bad enough to write
"word" instead of “word,” but WordPress converts this to ”word,” which is
ridiculous. Similarly, someone may hastily write the following: *She said,
"Get yourself out of that 'funk' you are in."* which becomes *She said,
”Get yourself out of that ’funk’ you are in.”*
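One way to see exactly which characters a feed changed is to diff the text as written against the text as served. The sketch below uses Python's standard `difflib`, with the sentence from the example above typed in straight quotes throughout. The function name `punctuation_changes` is hypothetical, and the code only compares two strings; it makes no claim about how WordPress performs the conversion.

```python
import difflib


def punctuation_changes(written, served):
    """Return (written_fragment, served_fragment) pairs where the two texts differ."""
    matcher = difflib.SequenceMatcher(None, written, served, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((written[i1:i2], served[j1:j2]))
    return changes


# The post text as typed, and the feed text as reported in this ticket.
written = "She said, \"Get yourself out of that 'funk' you are in.\""
served = "She said, \u201dGet yourself out of that \u2019funk\u2019 you are in.\u201d"

for before, after in punctuation_changes(written, served):
    print(repr(before), "->", repr(after))
```

Every reported change maps a straight quote to a right-hand (closing) curly quote, including the positions where an opening quote would be expected.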
I will never have a reason to compare "" and “” or '' and ‘’ in a post,
but the behavior is buggy, to say the least.
To make the difference clear, I have attached a PDF of a page with a font
that illustrates the differences.
This affects the display on podcast readers that parse the information in
the tags. The incorrectness is fairly objective. The system takes it upon
itself to substitute characters that may look alike, but that would be
like a system converting every capital A to capital alpha (αλφα), just
because *A* and *Α* look similar.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/64463#comment:3>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform