[wp-hackers] Blocking SEO robots

Bart Schouten list at xenhideout.nl
Tue Sep 2 01:20:28 UTC 2014


On Wed, Aug 6, 2014 at 9:26 PM, Daniel <malkir at gmail.com> wrote:

> Set up a trap. A link hidden by CSS on each page that if hit, the IP 
> gets blacklisted for a period of time. No human will ever come across 
> the link unless they're digging. No bot actually renders the entire page 
> out before deciding what to use.


This is awesome stuff.

Personally I am annoyed by the pollution of my page-hit (visitor) 
statistics, so the same trigger or trap could be used to filter those hits 
out. My host probably won't let me block IPs at the Apache level (not even 
that), but it is easy enough to implement in PHP, at least for my purposes.
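A minimal sketch of the "blacklisted for a period of time" idea applied to hit statistics, assuming an in-memory store and a one-day TTL (both my choices; a real PHP site would keep this state in a database, since memory does not survive between requests). `TrapBlacklist` and `count_human_hits` are hypothetical names.

```python
import time
from typing import Dict, Iterable, Optional


class TrapBlacklist:
    """In-memory blacklist: an IP that hit the trap stays listed for ttl seconds."""

    def __init__(self, ttl_seconds: float = 24 * 3600) -> None:
        self.ttl = ttl_seconds
        self._trapped: Dict[str, float] = {}  # ip -> time it hit the trap

    def mark(self, ip: str, now: Optional[float] = None) -> None:
        """Called by the trap script when an IP follows the hidden link."""
        self._trapped[ip] = time.time() if now is None else now

    def is_trapped(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        hit_at = self._trapped.get(ip)
        if hit_at is None:
            return False
        if now - hit_at > self.ttl:
            del self._trapped[ip]  # blacklist period is over; forget the IP
            return False
        return True


def count_human_hits(ips: Iterable[str], blacklist: TrapBlacklist,
                     now: Optional[float] = None) -> int:
    """Count page hits, skipping IPs that are currently blacklisted."""
    return sum(1 for ip in ips if not blacklist.is_trapped(ip, now))
```

Filtering at counting time rather than blocking means nothing changes for the visitor; only the statistics get cleaner.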

I guess it should then just be the first link on every page (currently 
that is a "home" link). It could be something ridiculously funny like 
geteatenalive.php, but that might also tempt some human diggers :P.

Alright, let's see what it does. I have this table:

CREATE TABLE wordpr_trap_victims (
   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
   ip_address VARCHAR(45) NOT NULL,  -- 45 fits an IPv6 address; 15 only fits IPv4
   host_name VARCHAR(255),
   time TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
   user_agent VARCHAR(255),
   referer VARCHAR(2000),
   INDEX(ip_address)
);
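For anyone who wants to try the table without a MySQL server, here is a sketch of the same schema in SQLite through Python's `sqlite3` module. The helper names `open_db`, `record_victim` and `victims_since` are hypothetical, and `time` is declared as TEXT because SQLite stores `CURRENT_TIMESTAMP` as a 'YYYY-MM-DD HH:MM:SS' UTC string.

```python
import sqlite3

# SQLite translation of wordpr_trap_victims (the original is MySQL).
SCHEMA = """
CREATE TABLE IF NOT EXISTS wordpr_trap_victims (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ip_address VARCHAR(45) NOT NULL,
    host_name VARCHAR(255),
    time TEXT DEFAULT CURRENT_TIMESTAMP,
    user_agent VARCHAR(255),
    referer VARCHAR(2000)
);
CREATE INDEX IF NOT EXISTS idx_trap_ip ON wordpr_trap_victims (ip_address);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_victim(conn, ip, host_name=None, user_agent=None, referer=None):
    """Add one row; `time` is filled in by the column default."""
    conn.execute(
        "INSERT INTO wordpr_trap_victims (ip_address, host_name, user_agent, referer) "
        "VALUES (?, ?, ?, ?)",
        (ip, host_name, user_agent, referer),
    )
    conn.commit()


def victims_since(conn, hours: int = 24) -> set:
    """Distinct IPs that fell into the trap within the last `hours` hours."""
    rows = conn.execute(
        "SELECT DISTINCT ip_address FROM wordpr_trap_victims "
        "WHERE time >= datetime('now', ?)",
        (f"-{int(hours)} hours",),
    ).fetchall()
    return {row[0] for row in rows}
```

The index on `ip_address` is what makes the per-request "is this IP trapped?" lookup cheap.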

My site has a /norobot/deathtrap.php script that takes a base64-encoded 
parameter "r" containing the referer field of the page that generated the 
link to this script. Then, if following this link is the first thing the 
spider does after arriving at my site, I should get the original referer, 
pointing to the originating crawl script. There is one crawler (semalt.com) 
that constantly hits my site. I'm not sure what it actually does, but it is 
some kind of search-rankings scheme. It uses a zillion different aliases 
like 905.semalt.com, 512.semalt.com and so on.
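A sketch of how the "r" parameter could be built and read back. I use the URL-safe base64 variant so the value survives in a query string; that is my choice, the post only says "base64". `trap_url` and `decode_r` are hypothetical helpers, and `page_referer` stands for the Referer header of the request that served the page carrying the link.

```python
import base64
from urllib.parse import parse_qs, urlencode, urlparse

TRAP_PATH = "/norobot/deathtrap.php"


def trap_url(page_referer):
    """Trap link carrying the page's own referer, base64-encoded, in "r"."""
    raw = (page_referer or "").encode("utf-8")
    encoded = base64.urlsafe_b64encode(raw).decode("ascii")
    return TRAP_PATH + "?" + urlencode({"r": encoded})


def decode_r(trap_link):
    """Recover the original referer from a trap link ("" if missing or broken)."""
    value = parse_qs(urlparse(trap_link).query).get("r", [""])[0]
    try:
        return base64.urlsafe_b64decode(value.encode("ascii")).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return ""
```

If the crawler jumps straight from its entry page to the trap, the decoded value is the referer it presented on arrival, which is exactly what the post wants to capture.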

The script just silently adds a row to the table and then redirects to the 
front page.
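A framework-free sketch of what the trap script does: decode "r", add a row, redirect to "/". It returns a (status, headers) pair instead of talking to a real web server, the reverse DNS lookup for `host_name` is my guess at what that column holds (off by default), and `ensure_table`, `decode_r` and `handle_trap` are hypothetical names.

```python
import base64
import socket
import sqlite3
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs


def ensure_table(conn: sqlite3.Connection) -> None:
    # Trimmed SQLite version of wordpr_trap_victims; `time` fills itself in.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wordpr_trap_victims ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " ip_address TEXT NOT NULL, host_name TEXT,"
        " time TEXT DEFAULT CURRENT_TIMESTAMP,"
        " user_agent TEXT, referer TEXT)"
    )


def decode_r(query_string: str) -> Optional[str]:
    """Base64-decode the "r" parameter; None if it cannot be decoded."""
    value = parse_qs(query_string).get("r", [""])[0]
    try:
        return base64.urlsafe_b64decode(value.encode("ascii")).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None


def handle_trap(conn: sqlite3.Connection, remote_addr: str, query_string: str,
                user_agent: Optional[str] = None,
                resolve_host: bool = False) -> Tuple[int, Dict[str, str]]:
    """Silently log the visitor, then send it to the front page."""
    host_name = None
    if resolve_host:
        try:
            host_name = socket.gethostbyaddr(remote_addr)[0]
        except OSError:
            host_name = None  # no PTR record or lookup failure
    conn.execute(
        "INSERT INTO wordpr_trap_victims (ip_address, host_name, user_agent, referer) "
        "VALUES (?, ?, ?, ?)",
        (remote_addr, host_name, user_agent, decode_r(query_string)),
    )
    conn.commit()
    return 302, {"Location": "/"}
```

Redirecting instead of returning an error keeps the trap invisible: a bot just sees an ordinary bounce to the home page.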

Within a few days I should know if any crawler actually follows that link.

This will be the motto:

//You may think you are a spider, but to me, you're just a fly. This is my 
web... and *I* am the spider.//

I just have the link hidden with inline CSS, but that shouldn't make much 
of a difference...
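A sketch of the hidden link itself, using inline `display:none` as described. The `rel="nofollow"` attribute is my addition, meant to make well-behaved search engine crawlers less likely to walk into the trap; `hidden_trap_link` is a hypothetical helper, and the href is escaped so a query string with `&` stays valid HTML.

```python
from html import escape


def hidden_trap_link(href: str, label: str = "home") -> str:
    """Anchor hidden with inline CSS; a normal browser never shows it."""
    return '<a href="%s" style="display:none" rel="nofollow">%s</a>' % (
        escape(href, quote=True),
        escape(label),
    )
```

Since `display:none` also removes the element from the tab order, keyboard users won't stumble onto it either; only something reading the raw HTML will.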

Let's see if this will be any fun :D.

Kudos, Bart

