[wp-trac] [WordPress Trac] #11311: kses converts ampersands to &amp; in post titles, post content, and more

WordPress Trac noreply at wordpress.org
Tue Sep 29 20:14:22 UTC 2015


#11311: kses converts ampersands to &amp; in post titles, post content, and more
-------------------------------+-----------------------------
 Reporter:  Viper007Bond       |       Owner:
     Type:  defect (bug)       |      Status:  new
 Priority:  normal             |   Milestone:  Future Release
Component:  Posts, Post Types  |     Version:  2.9
 Severity:  normal             |  Resolution:
 Keywords:  needs-patch        |     Focuses:  administration
-------------------------------+-----------------------------

Comment (by boonebgorges):

 > The best solution would be to switch to storing this data in unencoded
 > form and run an upgrade routine to decode existing data when the change
 > happens, but I realise that this is potentially an expensive upgrade. I'm
 > not sure how to address that problem.

 I agree that the best system would be to store the data unencoded, and
 then encode as necessary for display. However:
 a. The upgrade routine is going to be expensive (as you note)
 b. The upgrade routine is going to be plagued with false positives
 c. It's going to break queries for anyone who is currently working around
 this bug by encoding their query terms
 d. In what contexts would we need to encode for display? Currently, all
 content is being encoded on the way out (by virtue of its being encoded in
 the database). Presumably, you'd want to return unencoded content from
 low-level functions like `get_post_meta()` and `get_term_by()`. But this
 is going to cause compatibility problems with people who are expecting
 encoded data. And maintaining a whitelist of places where output should be
 encoded is going to be most unfun.
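
 One way the false positives in point (b) could arise is sketched below in
 Python. This is an assumption for illustration, not something stated in the
 ticket: it supposes the encode-on-save step leaves existing entities alone,
 so a typed `&` and a typed `&amp;` end up as the same stored value. The
 `encode_on_save()` helper is hypothetical and only loosely models that
 behaviour.

```python
import html
import re

# Matches an "&" that does not already start a named or numeric entity.
BARE_AMPERSAND = re.compile(r"&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)")

def encode_on_save(value: str) -> str:
    """Hypothetical stand-in for the filter applied on the way in."""
    return BARE_AMPERSAND.sub("&amp;", value)

# Two different user inputs collapse to one stored value ...
typed_ampersand = encode_on_save("AT&T")
typed_entity = encode_on_save("AT&amp;T")
same_stored_value = typed_ampersand == typed_entity

# ... so a blanket decode during an upgrade cannot tell them apart.
decoded = html.unescape(typed_entity)
```

 Under these assumptions the upgrade routine would turn both rows into
 `AT&T`, including the one whose author meant the literal text `&amp;`.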

 I'm all for fixing this stuff over the long run, but in the short to
 medium term, we should face facts: we store encoded data, so our query
 functions ought to match encoded data too. In the appropriate low-level
 functions (`get_metadata()`, `get_term_by()`, and so forth), we should
 encode using the same filters used to encode on the way in. For example,
 for terms, we'd use `sanitize_term_field()` (see #24354). I don't think
 that this will cause backward compatibility problems: for all the affected
 areas, there's no way to get an unencoded `&` into the database, so
 existing queries for `Foo & Bar` are failing anyway.
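
 The short-term proposal above can be sketched in Python with a toy lookup.
 Everything here is hypothetical: `encode_on_save()` stands in for whatever
 filter encodes data on the way in (it is not `sanitize_term_field()`), and
 a plain list stands in for the database table.

```python
import re

# Matches an "&" that does not already start a named or numeric entity.
BARE_AMPERSAND = re.compile(r"&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)")

def encode_on_save(value: str) -> str:
    """Hypothetical stand-in for the filter applied on the way in."""
    return BARE_AMPERSAND.sub("&amp;", value)

# Toy "table" of term names, stored encoded as the comment describes.
stored_terms = [encode_on_save(name) for name in ["Foo & Bar", "Baz"]]

def find_term_raw(name: str) -> bool:
    """Current behaviour: compare the caller's string as-is."""
    return name in stored_terms

def find_term_encoded(name: str) -> bool:
    """Proposed behaviour: encode the query with the same filter first."""
    return encode_on_save(name) in stored_terms
```

 With this stand-in, `find_term_raw("Foo & Bar")` misses while
 `find_term_encoded("Foo & Bar")` matches. A caller who already works around
 the bug by passing `Foo &amp; Bar` still matches, but only because this
 particular stand-in leaves existing entities untouched.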

 What do johnbillion and others think?

--
Ticket URL: <https://core.trac.wordpress.org/ticket/11311#comment:19>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

