[spam-stopper] User-agent required when submitting spam?

Sat Mar 3 23:10:32 UTC 2012

On 02/03/12 20:48, james young wrote:
> Yes.
>
> That question reminds me to ask about proxies, when they report the IP
> making the request; should we pass that along?  It's possibly a useful
> signal, but it's also not verifiable.  But this is perhaps a tangent.
>
> -James
>
As far as MediaWiki is concerned, there are two or three kinds of proxies:

1) Squid or Varnish reverse cache as part of the site itself. These are
deployed to store complete, rendered pages for anon-IP visitors to
high-volume MediaWiki sites so that the CPU-intensive wikitext->HTML
conversion process isn't repeated every time someone views a page. These
will be listed in $wgSquidServers or $wgSquidServersNoCache - they
normally keep the cached content until the upstream MediaWiki/Apache
server sends a PURGE message to indicate a page has been changed.
Loopback addresses like ::1 or 127.0.0.1 may implicitly be treated as
$wgSquidServersNoCache entries in current MediaWiki; this keeps the
proxy's own address out of [[special:recentchanges]] and uses the
X-Forwarded-For: header as the user's IP instead. There is documentation
on [[mw:manual:squid caching]] and [[mw:manual:varnish caching]] to
explain the reverse-proxy as web accelerator approach. You do *not* want
to blacklist 127.0.0.1 at any time for any reason.

2) Rarely, known/trusted proxies which are run by individual ISPs for
use by that ISP's users. AOL in particular is infamous for having all of
its users appear, at random, behind a limited number of proxy-server IP
addresses, but the unique address of the user is in the X-Forwarded-For
header and the [[mw:Extension:TrustedXFF]] extension on MediaWiki.org
recognises these proxies. This is not part of the standard MediaWiki
core code but is deployed to the Wikipedia (WMF) servers. The rest of us
are quite content to block all of AOL or any other provider which allows
abuse and hides it behind a proxy IP.

3) Every other proxy on the outside Internet, many of which may be
misconfigured or open in such a way as to allow spammers to hide behind
them. If a proxy (of which the local wiki's siteadmin is not aware) is
used to submit spam, standard procedure is to block the open proxy's IP
on-wiki or globally on the assumption that any spammer will continue to
abuse these servers to submit linkspam. In other words, the mere use of
an open proxy is itself reason enough to bitbucket a submission as
probable spam - either by admins manually blacklisting these after they
are abused or by extensions such as [[mw:extension:AbuseFilter]]
(autoblocks IPs that blank pages or commit other abuse) or
[[mw:extension:Check Spambots]] (imports blacklist data from
stopforumspam, honeypot and a few DNSBLs). While a
[[mw:extension:CheckUser]] lookup by an
administrator will pull all header addresses (including X-Forwarded-For)
from the logs, just about anything else in MediaWiki handles these by
presuming the open proxy *is* the anon-IP spam submitter and banning
everything behind it.

If you're writing an extension wrapper like
[[mw:extension:AkismetKlik]], the IP reported to you will already have
been processed so that proxies (1) and maybe (2) above are replaced
with the end-user's IP address while (3) is left as-is. An untrusted
proxy may have multiple spam users, may report incorrect
X-Forwarded-For headers or may not report end-user IPv4 or IPv6
addresses at all. By design, if your open proxy sends spam, it and
everything behind it gets banned in MediaWiki.
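
To make the three cases concrete, here is a rough sketch (in Python, not
MediaWiki's actual PHP) of how a trusted-proxy chain can be unwound. The
addresses, the two sets and the function names are all made up for
illustration; a real site would take its list from $wgSquidServers,
$wgSquidServersNoCache and, if installed, TrustedXFF. The idea is to
start at the connecting address and step back through X-Forwarded-For
only for as long as the current hop is one we trust.

```python
import ipaddress

# Hypothetical example values, not taken from any real configuration.
SITE_CACHES = {ipaddress.ip_address("10.0.0.5")}          # kind (1)
TRUSTED_ISP_PROXIES = {ipaddress.ip_address("198.51.100.7")}  # kind (2)


def is_trusted(ip):
    """True for loopback, the site's own caches, or a known ISP proxy."""
    addr = ipaddress.ip_address(ip)
    return addr.is_loopback or addr in SITE_CACHES or addr in TRUSTED_ISP_PROXIES


def effective_ip(remote_addr, x_forwarded_for=""):
    """Return the address to hold responsible for a request.

    remote_addr is the address that actually connected; x_forwarded_for
    is the raw header value ("client, proxy1, proxy2").
    """
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    current = remote_addr
    # Walk from the nearest hop outwards. Stop as soon as the current
    # address is untrusted: anything it claims about hops behind it is
    # unverifiable, so that address itself takes the blame (kind 3).
    while hops and is_trusted(current):
        candidate = hops.pop()
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            break  # malformed entry: keep the last address we had
        current = candidate
    return current


if __name__ == "__main__":
    print(effective_ip("127.0.0.1", "203.0.113.9"))
    print(effective_ip("192.0.2.50", "203.0.113.9"))
```

In this sketch a request arriving via the local cache is attributed to
the address in the header, while a request from an unknown proxy is
attributed to the proxy itself, whatever its header claims.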