
Passive Web-Site Defense


Overview

To put up a decent "passive" web-site defense against address harvesting, you will have to make slight changes to your web server, your DNS, and your mailer. The general idea is to:

  1. re-direct web harvesters to lists of bogus addresses that are generated on-the-fly (web server configuration changes required);
  2. ensure that the bogus addresses look just like good ones, at least as far as the web-harvester and spam-mailing programs can tell (DNS configuration changes required); and
  3. slow down the spammer's mailing operation when they do try to mail to the bogus addresses (SMTP server or mailer configuration changes required).

All of this may be fairly difficult or even impossible with some operating systems, web servers, or mail servers. However, it turns out to be quite easy with the configuration I have (Apache web server, SMTPD mailer front-end).

Note: I can't claim ownership of these ideas; almost all of them were originated by others. I'm just publishing a sample implementation.

Important Note: If you do make use of these ideas, please customize your scripts and names (e.g., use something other than "/laughing-place", re-word the HTML output of the bogus address generator, etc.). If lots of people use these scripts, and the output is essentially identical, then spamware writers will change their code to recognize this stuff and bypass it. If there are no consistent patterns to the output, then they won't be able to avoid it easily.
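Even choosing the visible names and wording at random goes a long way. Here is a purely hypothetical Perl sketch of what I mean; every name below is invented, and you would pick (or generate) your own:

#!/usr/bin/perl
# Hypothetical sketch of the kind of customization meant above.  Every
# name here is made up.  Choose your own once, when you install the
# scripts, so your site's output matches no one else's.
use strict;

my @trap_dirs = qw(briar-patch rabbit-hole fly-paper);
my @headings  = ('Contact list', 'Member directory', 'Mailing roster');

print 'Trap directory: /', $trap_dirs[ int rand @trap_dirs ], "\n";
print 'Page heading:   ',  $headings[ int rand @headings ],  "\n";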


Harvester Re-Direction

This site's web-harvester re-direction has two parts:

  1. Many popular web-harvesters are identified by their "HTTP_USER_AGENT" signature.
  2. Other harvesters are taken in by a nearly-invisible link at the bottom of all web pages.

Web harvesters are often smart enough to avoid CGI-BIN programs, so the implementation (which I got from C. Brabec) makes a script look like a normal web page. This is done with Apache mod_rewrite rules in the "httpd.conf" file:

# Redirect various web-harvesters to POISON program
# (mod_rewrite must be enabled for the rules below to take effect)
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon   [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro  [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent      [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker  [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO     [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebBandit     [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{REQUEST_URI}     ^/laughing-place
RewriteRule ^.*$ /cgi-bin/killspam.pl [L,T=application/x-httpd-cgi]

The rewriting rule matches the User-Agent signatures of spammers' web-harvesters (at least those signatures which would not also match legitimate browsers); no matter what page such a harvester requests, it gets the output of "killspam.pl". The "laughing-place" condition also re-directs any access to a file within that (non-existent) directory.

The code for "killspam.pl" is available for download here. You'll see that it generates a list of bogus addresses, and also includes links to "other" lists in the non-existent "/laughing-place" directory (which will be another run of the same address generator).
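If it helps to see the shape of such a generator, here is a minimal sketch of the same idea. This is not the downloadable script itself; the random-string helper and the loop counts are made up for illustration, and the domain and directory names are this site's (substitute your own):

#!/usr/bin/perl
# Minimal sketch of a bogus-address generator in the spirit of
# "killspam.pl" (the real script is the download above).
use strict;

# Random lowercase string, 5 to 10 characters.
sub junk {
    return join '', map { ('a'..'z')[int rand 26] } 1 .. 5 + int rand 6;
}

print "Content-type: text/html\n\n";
print "<HTML><BODY>\n";

# A page of bogus addresses under the spam-trap domain...
print "<UL>\n";
for (1 .. 20) {
    my $addr = junk() . '@' . junk() . '.bulk.stassen.com';
    print qq{<LI><A HREF="mailto:$addr">$addr</A>\n};
}
print "</UL>\n";

# ...plus links to "other" lists, each of which is just another run of
# this same generator (the /laughing-place directory does not exist).
for (1 .. 3) {
    my $page = junk();
    print qq{<A HREF="/laughing-place/$page.html">$page</A>\n};
}
print "</BODY></HTML>\n";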

On each real web page on my site, I include an invisible link to that directory. This is done with a Server-Side Include placed in every web page; each page on my site has this at the bottom:

<!--#exec cmd="/cgi-bin/tail.pl"-->
</BODY>
</HTML>
Among the code in "tail.pl" is:
print <<END_OF_PRINT;
<P>
<CENTER>
<A HREF="/laughing-place/bait.html"><IMG SRC="/images/tiny.gif"
        ALT="Do not follow this link" WIDTH=1 HEIGHT=1 BORDER=0></A>
</CENTER>
END_OF_PRINT

The advantage of the server-side include is that changes can be made in one central location: you don't have to edit every HTML file on your site to make a global change to the HTML your web server hands out. (For example, if I chose to rename "laughing-place" to something else, I would only have to make the change in "tail.pl", not in every HTML file on the web server.)

The link is essentially invisible because it is a one-pixel-square transparent .GIF file with no border (you're welcome to copy the image from here). Users won't see the link, so they won't click on it, but robot harvesters will follow it.

Just to be nice, I disallow access to "/laughing-place" by adding the following lines to "robots.txt" (in the root HTML-document directory):

User-agent: *
Disallow: /laughing-place

Well-behaved robots pay attention to the restrictions in "/robots.txt" and will not waste time in my spam-trap area. Spammers' address-harvesters are generally poorly behaved, and will pay the price for ignoring those restrictions.


DNS

This is actually the simplest change, but it requires that you be in charge of the DNS for your own domain. If you look at "killspam.pl", you'll see that my version of the script generates addresses of the form "<junk>@bulk.stassen.com" and "<junk>@<garbage>.bulk.stassen.com".

In order for MX-resolving spam-sending programs to think that these randomly-generated domains are valid, they don't have to resolve to an IP address, but they do have to have an MX (Mail eXchanger) record. The following lines, added to the domain's zone file, take care of that:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; BULK and *.BULK are spam-traps
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
bulk        IN  MX  0   mail.stassen.com.
*.bulk      IN  MX  0   mail.stassen.com.

Those lines cause the DNS server to claim that there is an MX record for "bulk.stassen.com" and for any sub-domain of it (e.g., "foo.bulk.stassen.com" or "foo.bar.bulk.stassen.com"). This directs all E-mail sent to the bogus addresses at my real mail server. In the next (and final) section, we'll fix the mail server to properly handle the unwanted E-mail.
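Before moving on, it is worth confirming that the wildcard answers the way an MX-resolving spam program will see it. A minimal sketch, assuming the Net::DNS Perl module is installed (it is not otherwise part of this setup):

#!/usr/bin/perl
# Sketch: look up the spam-trap MX records the way an MX-resolving
# spam program would.  Assumes the Net::DNS module is installed.
use strict;
use Net::DNS;

my $res = Net::DNS::Resolver->new;

# Any label (or labels) under bulk.stassen.com should match "*.bulk".
for my $host ('bulk.stassen.com', 'foo.bulk.stassen.com') {
    my @mx = mx($res, $host);
    if (@mx) {
        printf "%s => MX %d %s\n", $host, $_->preference, $_->exchange
            for @mx;
    } else {
        print "$host => no MX record (check the zone file)\n";
    }
}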


Mailer

The configuration changes described above have forced spammers' address-harvesters to swallow a large number of randomly-generated bogus E-mail addresses, and we have modified DNS so that the spammers' programs can't tell that the addresses are bad. Unfortunately, a side effect of those changes is to direct a load of unwanted spam at our own mail server. (That is, however, the only ethical way to go; it wouldn't be fair to direct the unwanted spam at others' mail servers.)

In the final step, we deal with the unwanted spam. With SMTPD, it is trivial to reject it during the SMTP conversation (meaning that the message never actually reaches this site). The bogus addresses are of the form "<junk>@bulk.stassen.com" and "<junk>@<garbage>.bulk.stassen.com", so we add the following text to the SMTPD configuration file:

##############################################################################
#                       TAR-PITTING FOR SPAMMERS
##############################################################################
noto_delay:ALL:ALL:*@bulk.stassen.com *@*.bulk.stassen.com:552 Bad spammer! %F (%H [%I])

We use the "noto_delay" directive so that each individual recipient is rejected -- and so that there is a 30-second pause (configurable at compile time) before SMTPD delivers each response. Our site and users are never bothered with the spam that is generated.
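The tar-pit logic itself is simple enough to express in a few lines, for mailers that lack SMTPD's built-in support but can call out to a recipient check. This is an illustrative sketch only -- the "check_rcpt" helper is made up and is not part of SMTPD or any particular mailer:

#!/usr/bin/perl
# Illustrative sketch of the tar-pit logic only.  SMTPD implements this
# natively; nothing below is part of the SMTPD distribution.  The
# 30-second sleep mirrors SMTPD's compile-time delay.
use strict;

sub check_rcpt {
    my ($rcpt) = @_;

    # Match <junk>@bulk.stassen.com and <junk>@<garbage>.bulk.stassen.com.
    if ($rcpt =~ /\@(?:[\w-]+\.)*bulk\.stassen\.com$/i) {
        sleep 30;              # the tar-pit: make each rejection slow
        return '552 Bad spammer!';
    }
    return '250 OK';
}

# Each RCPT TO in the SMTP conversation would be run through the check:
print check_rcpt('xyzzy@bulk.stassen.com'), "\n";   # delayed, then 552
print check_rcpt('chris@stassen.com'),      "\n";   # accepted, 250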

