[wp-hackers] Blocking SEO robots

Daniel malkir at gmail.com
Thu Aug 7 04:26:22 UTC 2014


Set up a trap: put a link on each page, hidden with CSS, and blacklist any IP
that requests it for a period of time. No human will ever come across the
link unless they're digging, and most bots don't render the whole page before
deciding which links to follow.
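
A rough sketch of the trap idea in Python, just to show the moving parts. The
trap URL, the ban length, and the class and method names are all made up for
illustration, and the in-memory dict stands in for whatever store you would
actually use:

```python
import time

TRAP_PATH = "/trap-do-not-follow/"  # hypothetical trap URL, not a real endpoint
BAN_SECONDS = 3600                  # hypothetical ban length (one hour)

# Markup to drop into every page; hidden with CSS so visitors never see it.
TRAP_LINK_HTML = (
    '<a href="%s" rel="nofollow" style="display:none">&nbsp;</a>' % TRAP_PATH
)


class TrapBlacklist:
    """In-memory, time-limited IP blacklist fed by a hidden trap link."""

    def __init__(self, ban_seconds=BAN_SECONDS, clock=time.time):
        self.ban_seconds = ban_seconds
        self.clock = clock        # injectable so expiry can be tested
        self.banned_until = {}    # ip -> timestamp when the ban ends

    def is_banned(self, ip):
        expiry = self.banned_until.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            # Ban has run out; forget the IP.
            del self.banned_until[ip]
            return False
        return True

    def handle(self, ip, path):
        """Return the HTTP status to send for a request from ip to path."""
        if path == TRAP_PATH:
            # Anything requesting the hidden link gets banned for a while.
            self.banned_until[ip] = self.clock() + self.ban_seconds
            return 403
        if self.is_banned(ip):
            return 403
        return 200
```

Call handle() with the client IP and request path before doing any real work
and send a 403 when it says so. Listing the trap path under Disallow in
robots.txt should keep well-behaved crawlers out of it.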


On Wed, Aug 6, 2014 at 5:31 AM, Jeremy Clarke <jer at simianuprising.com>
wrote:

> On Wednesday, August 6, 2014, David Anderson <david at wordshell.net> wrote:
>
> > The issue's not about how to write blocklist rules; it's about having a
> > reliable, maintained, categorised list of bots such that it's easy to
> > automate the blocklist. Turning the list into .htaccess rules is the easy
> > bit; what I want to avoid is having to spend long churning through log
> > files to obtain the source data, because it feels very much like
> > something there 'ought' to be pre-existing data out there for, given how
> > many watts the world's servers must be wasting on such bots.
>
>
> The best answer is the htaccess-based blacklists from PerishablePress. I
> think this is the latest one:
>
> http://perishablepress.com/5g-blacklist-2013/
>
> He uses a mix of blocked user agents, blocked IPs and blocked requests
> (e.g. /admin.php, intrusion scans for other software). He's been updating it
> for years and it's definitely a WP-centric project.
>
> In the past some good stuff has been blocked by his lists (Facebook spider
> blocked because it had an empty user agent, common spiders used by
> academics were blocked) but that's bound to happen and I'm sure every UA
> was used by a spammer at some point.
>
> I run a ton of sites on my server so I hate the .htaccess format (which is
> a pain to implement alongside wp+super cache rules). If I used multisite it
> would be less of a big deal. Either way, know that you can block UAs for
> all virtual hosts at once in the main server config, if that's relevant.
>
> Note that IP blocking is a lot more effective at the server level because
> blocking with Apache still uses a ton of resources (but at least no MySQL
> etc). On Linux an iptables-based block is much more effective.
>
>
>
>
> --
> Jeremy Clarke
> Code and Design • globalvoicesonline.org
> _______________________________________________
> wp-hackers mailing list
> wp-hackers at lists.automattic.com
> http://lists.automattic.com/mailman/listinfo/wp-hackers
>



-- 
-Dan

