[wp-trac] [WordPress Trac] #22530: garbage query strings on URLs are not sanitized or removed

WordPress Trac noreply at wordpress.org
Wed Nov 21 15:52:50 UTC 2012


#22530: garbage query strings on URLs are not sanitized or removed
-----------------------------+--------------------------
 Reporter:  rawalex          |       Type:  defect (bug)
   Status:  new              |   Priority:  normal
Milestone:  Awaiting Review  |  Component:  General
  Version:  3.4.2            |   Severity:  critical
 Keywords:  needs-patch      |
-----------------------------+--------------------------
 Here is an interesting problem I ran into: a bug (or unintended feature)
 that malicious people appear to be using to make Google see your site as
 full of duplicate content.

 If you visit a WordPress site and add a garbage query string to the end
 of the URL, that garbage gets carried forward.  Example:

 yourblog.here/page/2?ssdlfkjsdlkfjsdfs

 When you scroll down, the "previous" and "next" links will automatically
 carry that query string forward.

 Normally, this would not be a big issue.  However, some people appear to
 be deliberately creating links of this kind to WordPress sites, and
 Googlebot finds those links on remote sites.  It follows them, and the
 "previous" / "next" links then propagate the junk query string through
 every paginated page on the site.  If you have 1000 posts at 10 per page,
 Google has just indexed 100 duplicate-content pages.

 So the bug is the following:

 Passed query strings need to be sanitized and the junk removed; there is
 no reason to pass it on.  When a junk query string is passed, the reply
 should be an HTTP 301 or 302, redirecting the user or bot to the proper
 page without the query string.
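
 As a rough illustration of the behavior requested above (not WordPress
 code, and written in Python only for brevity), the sketch below drops
 every query parameter that is not on an allow-list and reports whether a
 redirect would be needed.  The function name strip_unknown_query and the
 allow-list contents are assumptions made for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allow-list of query keys the site actually understands.
# A real site would derive this from its own registered query variables.
ALLOWED_KEYS = {"s", "p", "paged"}


def strip_unknown_query(url, allowed=ALLOWED_KEYS):
    """Return (clean_url, needs_redirect) for the requested URL.

    Parameters whose key is not in `allowed` are dropped. If nothing was
    dropped, the original URL is returned untouched and no redirect is needed.
    """
    parts = urlsplit(url)
    # keep_blank_values=True so a bare "?garbage" is seen as a key and removed.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key in allowed]
    if len(kept) == len(pairs):
        return url, False
    clean = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
    return clean, True


print(strip_unknown_query("http://yourblog.here/page/2?ssdlfkjsdlkfjsdfs"))
# -> ('http://yourblog.here/page/2', True)
print(strip_unknown_query("http://yourblog.here/?s=cats"))
# -> ('http://yourblog.here/?s=cats', False)
```

 When needs_redirect is true, the caller would answer with a 301 or 302
 pointing at clean_url; which of the two status codes to use is left open
 here, as it is in the ticket.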

 Further, query strings should not be carried forward through the
 "previous" / "next" links on the pages unless they are relevant to that
 page change.  For example, a valid search string is probably worth
 carrying forward; other passed items may not be.
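
 A minimal sketch of that idea, again in Python rather than WordPress
 code: when building a "previous" or "next" link, copy over only the
 parameters known to matter for paging.  The function name page_link, the
 CARRY_FORWARD set (here just the search key "s"), and the /page/N path
 layout taken from the example URL above are all assumptions made for this
 illustration.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query keys that are relevant when changing pages.
CARRY_FORWARD = {"s"}


def page_link(current_url, page):
    """Build the URL for `page`, keeping only CARRY_FORWARD parameters."""
    parts = urlsplit(current_url)
    # Remove an existing "/page/N" suffix so it is not duplicated.
    base = re.sub(r"/page/\d+/?$", "", parts.path).rstrip("/")
    path = f"{base}/page/{page}"
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in CARRY_FORWARD
    ]
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(kept), ""))


print(page_link("http://yourblog.here/page/2?ssdlfkjsdlkfjsdfs", 3))
# -> http://yourblog.here/page/3
print(page_link("http://yourblog.here/page/2?s=cats&junk=1", 3))
# -> http://yourblog.here/page/3?s=cats
```

 With link generation done this way, a junk query string on one page would
 stop at that page instead of spreading to every paginated URL.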

 Potentially, any unsanitized input accepted in a query string is a vector
 for other attacks, and having it carried forward is a real issue.  For
 example, a full "select * from" query passed this way is neither rejected
 nor cleaned up; it is simply carried forward.  No, such strings are not
 currently causing anything to happen, but the failure to sanitize these
 inputs suggests a vector for a future attack, such as an input overflow
 or similar.

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/22530>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

