[wp-edu] Uploads folder content indexed in Google?
joseph.ugoretz at mhc.cuny.edu
Tue Sep 15 22:09:07 UTC 2015
Sorry, I can’t offer anything useful on the Google questions, except that Google ALWAYS knows where everything is and can return it in a search result if they want to! :-)
But on the more basic question of protecting uploaded files, I blogged about this very problem a couple of years ago.
Daniel Bachhuber (who may still be on this list?) directed me to a good solution, the WP Document Revisions plugin.
That at least gives you the option of protecting uploaded files. It doesn’t protect all files by default, but for the files that do need to stay private, it gives you something more reliable than security by obscurity.
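This thread doesn't describe WP Document Revisions' internals, so the sketch below only illustrates the general idea behind that kind of protection: keep private files outside the publicly served uploads path and hand them out through code that checks who is asking, so that knowing the URL is no longer enough. `PrivateFileStore`, its `acl` mapping and the file names are hypothetical, not the plugin's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class PrivateFileStore:
    """Hypothetical gatekeeper for files kept OUTSIDE the public web root."""

    root: Path  # directory the web server does not serve directly
    acl: dict[str, set[str]] = field(default_factory=dict)  # file name -> allowed users

    def fetch(self, user: Optional[str], name: str) -> bytes:
        # Every request is checked against the ACL; an unguessable URL is not the defense.
        if user is None or user not in self.acl.get(name, set()):
            raise PermissionError(f"{user!r} may not read {name!r}")
        base = self.root.resolve()
        path = (base / name).resolve()
        # Refuse names such as "../other.doc" that would escape the store.
        if base not in path.parents:
            raise PermissionError(f"{name!r} is outside the private store")
        return path.read_bytes()
```

Anonymous requests (`user=None`), such as a crawler following a leaked link, are refused by default under this design.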
Joseph Ugoretz, PhD
Teaching, Learning and Technology
Macaulay Honors College
City University of New York
On September 15, 2015 at 5:59:08 PM, Ben Bakelaar (bakelaar at rutgers.edu) wrote:
Hello all, it appears some of the files on our WordPress network have been indexed in Google search results. I had been relying on security through obscurity here, but it appears I was wrong.
Our network runs sites as sub-directories, and we also use domain mapping for some of them. I haven’t quite figured out how yet, but one of the mapped domains (xyz, not root.url.com), which points to site A, has shown up in search results with absolute paths to files in a completely different site B (which is actually a sub-directory site, not masked). And those files load just fine; this must be an unanticipated quirk of the DNS records plus the WordPress code that routes requests.
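One plausible explanation for the cross-site loading, offered as an assumption rather than a diagnosis: if every mapped domain reaches the same Apache DocumentRoot, the stock WordPress multisite rewrite rules pass any request for an existing file straight to Apache, so WordPress (and its domain-mapping logic) never runs and the host name plays no part. The Python model below is a simplification, not WordPress's actual routing code; the host names and the `sites/2/2015/09/report.doc` path are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlsplit


def resolve_request(url: str, docroot: Path) -> str:
    """Simplified model (assumption): all mapped domains share one docroot, and
    existing files are served directly while everything else goes to index.php."""
    parts = urlsplit(url)
    # parts.netloc (the host name) is deliberately never consulted.
    candidate = docroot / parts.path.lstrip("/")
    if candidate.is_file():
        return str(candidate)  # static file; WordPress code never runs
    return "index.php"  # WordPress decides what this host should show
```

If this model matches the setup, the behavior follows from the shared document root rather than from DNS alone: any host name that reaches it can serve any site's uploads, so a fix has to be path-based rather than per-domain.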
So we have URLs like xyz.domain/wp-content/uploads/sites/x/xxxx/xx/filename.doc coming up in results! Eek! I have already started removal requests via Google Webmaster Tools. Again, I have no explanation yet for how the search engines found these URLs, but I’m working on some theories.
Aside from getting to the bottom of this, I’m trying to figure out the best way to keep it from happening in the future. Apache .htaccess rules are one option; could robots.txt be another? Has anyone run into this issue before, and what did you do about it? I’m a little surprised this isn’t addressed “in code”. Many plugins allow uploads, and uploading is a desired, supported user behavior by default. Yet I can’t think of a single use case where those uploads should be indexable by bots.
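The robots.txt option can be checked offline with Python's standard `urllib.robotparser`. The rule below targets `/wp-content/uploads/` rather than all of `/wp-content`, because blocking theme CSS and JavaScript can hurt how Google renders pages; that choice and the example URLs are assumptions. A crawler fetches robots.txt separately for each host, but a physical robots.txt in the WordPress root would be served for every mapped domain that shares that document root (replacing the virtual robots.txt WordPress otherwise generates), so the same path rule would apply on each of them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt placed in the shared WordPress root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/uploads/
"""


def crawl_allowed(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "Googlebot") -> bool:
    """Would a compliant crawler fetch this URL under the given robots.txt?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Matching is by URL path prefix only; the host name does not matter here.
    return parser.can_fetch(agent, url)
```

Caveats: robots.txt only stops compliant crawlers from fetching, so a blocked URL can still be listed (without its content) if other pages link to it, and a `noindex` signal such as an `X-Robots-Tag` response header (settable with mod_headers' `Header set`) is only seen if the URL is *not* blocked. In a sub-directory network the same files may also answer under a site prefix (`/siteA/wp-content/uploads/...`), which a plain prefix rule does not match. None of this is access control.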
Could I simply place a robots.txt in the root of the WP codebase and tell it to avoid indexing ALL files under /wp-content? Would that cover all the various ways files get accessed: direct-linked files (like graphics), domain masking/mapping, etc.? And to fully prevent opening any uploads from outside the university network (a decent but arbitrary perimeter), can I do the same with .htaccess, or do I have to create dozens of .htaccess files, one in each /wp-content/uploads/sites/X sub-directory?
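On the .htaccess question: directives in an .htaccess file apply to its directory and all of its subdirectories, so a single file in `wp-content/uploads/` should cover every `sites/X` folder without per-site copies, assuming the server's `AllowOverride` setting permits the directives used. In Apache 2.4 the perimeter itself would be `Require ip` lines from mod_authz_host. The Python sketch below only models the decision such a rule makes; the network ranges are hypothetical placeholders.

```python
import ipaddress

# Hypothetical campus ranges: RFC 5737 documentation blocks stand in for the
# university's real address space.
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# One rule at this prefix covers every sites/X sub-directory beneath it,
# mirroring a single .htaccess in wp-content/uploads/.
UPLOADS_PREFIX = "/wp-content/uploads/"


def on_campus(client_ip: str, networks=CAMPUS_NETWORKS) -> bool:
    """True if client_ip (IPv4 or IPv6) falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


def should_block(path: str, client_ip: str) -> bool:
    """Deny upload requests from outside the perimeter; leave other paths alone."""
    return path.startswith(UPLOADS_PREFIX) and not on_campus(client_ip)
```

One side effect to weigh: a blanket rule like this also hides images and other media embedded in public pages from every off-campus visitor.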
BEN BAKELAAR | IT Services
School of Communication and Information
Rutgers, The State University of New Jersey
wp-edu mailing list
wp-edu at lists.automattic.com